Implementation of Monotonic Alignment Search (MAS) from Glow-TTS, packaged for easy reuse in other projects.
pip install monotonic-alignment-search
Wheels are provided for Linux, macOS, and Windows. PyTorch is not installed by default: either install it yourself first, or install one of the following extras with uv:
uv add monotonic-alignment-search[cpu]
uv add monotonic-alignment-search[cuda]
MAS finds the most probable monotonic alignment between a text sequence of length t_x and a speech sequence of length t_y.
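import torch
from monotonic_alignment_search import maximum_path

# Illustrative inputs only (batch_size=2, t_x=5, t_y=10); use your model's tensors instead
value = torch.randn(2, 5, 10)  # value (torch.Tensor): [batch_size, t_x, t_y]
mask = torch.ones_like(value)  # mask (torch.Tensor): [batch_size, t_x, t_y]
path = maximum_path(value, mask, implementation="cython")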
The implementation argument selects one of the following implementations:
- cython (default): Cython-optimised
- numpy: pure NumPy
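
To make the search itself concrete, here is a minimal pure-Python sketch of the dynamic program described in the Glow-TTS paper, for a single unpadded example. It is not this package's code and is far slower than either implementation above. The function name `monotonic_alignment` and the example scores are made up for illustration, and the sketch assumes t_x <= t_y, that the path starts at the first token and ends at the last, and that no token is skipped.

```python
import math


def monotonic_alignment(value):
    """Return the best monotonic path through a t_x-by-t_y score grid.

    value[x][y] is the log-likelihood of aligning text token x to speech
    frame y. The result is a 0/1 grid of the same shape with exactly one
    1 per frame. Hypothetical helper, for illustration only.
    """
    t_x, t_y = len(value), len(value[0])
    if t_x > t_y:
        raise ValueError("need t_x <= t_y so that every token gets a frame")

    neg_inf = -math.inf
    # q[x][y]: best cumulative score of any monotonic path ending at (x, y).
    q = [[neg_inf] * t_y for _ in range(t_x)]
    q[0][0] = value[0][0]
    for y in range(1, t_y):
        # Token x cannot be reached before frame x, hence the y + 1 bound.
        for x in range(min(t_x, y + 1)):
            stay = q[x][y - 1]  # same token as the previous frame
            advance = q[x - 1][y - 1] if x > 0 else neg_inf  # next token
            q[x][y] = value[x][y] + max(stay, advance)

    # Backtrack from the last token at the last frame.
    path = [[0] * t_y for _ in range(t_x)]
    x = t_x - 1
    for y in range(t_y - 1, -1, -1):
        path[x][y] = 1
        if x > 0 and (x == y or q[x][y - 1] < q[x - 1][y - 1]):
            x -= 1
    return path


# Toy scores: 3 text tokens, 5 speech frames (values closer to 0 are more likely).
scores = [
    [-0.1, -0.2, -3.0, -3.0, -3.0],
    [-3.0, -3.0, -0.1, -0.3, -3.0],
    [-3.0, -3.0, -3.0, -3.0, -0.1],
]
alignment = monotonic_alignment(scores)
for row in alignment:
    print(row)
# [1, 1, 0, 0, 0]
# [0, 0, 1, 1, 0]
# [0, 0, 0, 0, 1]
```

Each column of the result contains a single 1, so every speech frame is assigned to exactly one text token, and the token index never decreases from one frame to the next.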
This implementation is taken from the original Glow-TTS repository. Consider citing the Glow-TTS paper when using this project:
@inproceedings{kim2020_glowtts,
title={Glow-{TTS}: A Generative Flow for Text-to-Speech via Monotonic Alignment Search},
author={Jaehyeon Kim and Sungwon Kim and Jungil Kong and Sungroh Yoon},
booktitle={Proceedings of Neur{IPS}},
year={2020},
}