Are sent and word duration loss necessary for unsupervised alignment ? #11

xiaoyangnihao · 2022-04-27T13:46:28Z

Are the sentence-level and word-level duration losses necessary for unsupervised alignment, in order to get robust duration prediction?

keonlee9420 · 2022-05-01T04:09:14Z

Hi @xiaoyangnihao, I’m not sure about robustness, but they do help with the correctness (accuracy) of pauses, and hence with naturalness. The effect is greatest when your dataset has complex punctuation rules.

xiaoyangnihao · 2022-05-11T14:06:14Z

Thanks for your reply. By the way, in the paper "One TTS Alignment To Rule Them All", the alignment module takes the encoder outputs and the mel as inputs, but in your repo the alignment model takes text_embedding and the mel as inputs. Have you run an experiment to compare the two?

keonlee9420 · 2022-05-27T01:47:17Z

I just followed NeMo's implementation, and I guess there is no specific reason for that.
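
For anyone comparing the two options discussed above, here is a minimal NumPy sketch of a distance-based soft alignment between text features and mel frames. It is not the code from this repo or from NeMo: the function name `soft_alignment`, the linear projections `w_key` and `w_query`, the tensor sizes, and the toy "encoder" are all assumptions made for illustration. The only point is that the same alignment computation can be fed either the raw text embedding or the encoder outputs.

```python
import numpy as np


def soft_alignment(text_features, mel, w_key, w_query, temperature=1.0):
    """Soft alignment from negative squared L2 distances (illustrative only).

    text_features: (T_text, d_text) text embedding OR encoder outputs
    mel:           (T_mel, n_mels)  mel-spectrogram frames
    returns:       (T_mel, T_text)  each row is a distribution over tokens
    """
    keys = text_features @ w_key    # (T_text, d_attn)
    queries = mel @ w_query         # (T_mel, d_attn)
    # Squared L2 distance between every mel frame and every text token.
    diff = queries[:, None, :] - keys[None, :, :]
    dist = (diff ** 2).sum(axis=-1)  # (T_mel, T_text)
    logits = -temperature * dist
    # Numerically stable softmax over the text axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
T_text, T_mel, d_text, n_mels, d_attn = 6, 40, 16, 80, 8

# Hypothetical inputs: random stand-ins for real features.
text_embedding = rng.normal(size=(T_text, d_text))
# Toy "encoder": one random linear map plus tanh (assumption, not a real encoder).
encoder_outputs = np.tanh(text_embedding @ rng.normal(size=(d_text, d_text)))
mel = rng.normal(size=(T_mel, n_mels))

w_key = rng.normal(size=(d_text, d_attn)) * 0.1
w_query = rng.normal(size=(n_mels, d_attn)) * 0.1

# Variant used in this repo, per the discussion: text embedding + mel.
attn_from_embedding = soft_alignment(text_embedding, mel, w_key, w_query)
# Variant described in the question: encoder outputs + mel.
attn_from_encoder = soft_alignment(encoder_outputs, mel, w_key, w_query)

print(attn_from_embedding.shape, attn_from_encoder.shape)
```

Both calls return a matrix with one row per mel frame and one column per text token, so the two variants could be swapped in an experiment without touching the rest of the pipeline. Whether one of them gives better alignments is an open question in this thread.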
