
eginhard/monotonic_alignment_search


Monotonic Alignment Search (MAS)


Implementation of MAS from Glow-TTS for easy reuse in other projects.

Installation

pip install monotonic-alignment-search

Wheels are provided for Linux, macOS, and Windows. PyTorch is not installed by default: either install it yourself first, or install one of the following extras with uv:

uv add monotonic-alignment-search[cpu]
uv add monotonic-alignment-search[cuda]

Usage

MAS finds the most probable monotonic alignment between a text sequence of length t_x and a speech sequence of length t_y.

from monotonic_alignment_search import maximum_path

# value (torch.Tensor): [batch_size, t_x, t_y]
# mask  (torch.Tensor): [batch_size, t_x, t_y]
path = maximum_path(value, mask, implementation="cython")

The implementation argument selects one of the following implementations:

  • cython (default): Cython-optimised
  • numpy: pure NumPy
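
For intuition about what either implementation computes, here is a minimal pure-Python sketch of the dynamic programme described in the Glow-TTS paper, for a single unbatched item. It is not this package's code: the function name mas_single is hypothetical, it takes plain nested lists instead of tensors, and it assumes t_x <= t_y with no masking.

```python
def mas_single(value):
    """Illustrative MAS for one item (hypothetical helper, not the package API).

    value: list of t_x rows, each holding t_y log-likelihoods.
    Returns a 0/1 matrix of the same shape with exactly one 1 per column.
    """
    t_x, t_y = len(value), len(value[0])
    if t_x > t_y:
        raise ValueError("a monotonic alignment needs t_x <= t_y")
    neg_inf = float("-inf")

    # q[i][j]: best cumulative score of a monotonic path ending at (i, j).
    q = [[neg_inf] * t_y for _ in range(t_x)]
    for j in range(t_y):
        # Only cells that can still reach (t_x - 1, t_y - 1) are filled.
        for i in range(max(0, t_x + j - t_y), min(t_x, j + 1)):
            if i == 0 and j == 0:
                best = 0.0
            else:
                stay = q[i][j - 1] if j > 0 else neg_inf  # same text token
                move = q[i - 1][j - 1] if i > 0 and j > 0 else neg_inf  # next token
                best = max(stay, move)
            q[i][j] = best + value[i][j]

    # Backtrack from the last text token and the last speech frame.
    path = [[0] * t_y for _ in range(t_x)]
    i = t_x - 1
    for j in range(t_y - 1, -1, -1):
        path[i][j] = 1
        if i > 0 and (i == j or q[i][j - 1] < q[i - 1][j - 1]):
            i -= 1
    return path


if __name__ == "__main__":
    scores = [
        [-1, -1, -9, -9, -9],
        [-9, -9, -1, -9, -9],
        [-9, -9, -9, -1, -1],
    ]
    for row in mas_single(scores):
        print(row)
```

Each speech frame is assigned to exactly one text token, and the token index never decreases from one frame to the next. For real workloads, use maximum_path, which operates on batched tensors and takes a mask.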

References

This implementation is taken from the original Glow-TTS repository. Consider citing the Glow-TTS paper when using this project:

@inproceedings{kim2020_glowtts,
    title={Glow-{TTS}: A Generative Flow for Text-to-Speech via Monotonic Alignment Search},
    author={Jaehyeon Kim and Sungwon Kim and Jungil Kong and Sungroh Yoon},
    booktitle={Proceedings of Neur{IPS}},
    year={2020},
}

