add timestamps for each word #113

merouanezouaid · 2025-01-31T12:52:53Z

I would like to have timestamps for each word in the generated text-to-speech output. This would improve the accuracy of syncing the audio with other media.

I could also submit this as a PR if I get some guidance.

remsky · 2025-01-31T14:00:05Z

Maintainer

For sure! I was planning on jumping on it once I finished the v1_0 integrations (the structure may change somewhat for those models anyhow). But you can take a look at the stale branch I was using to experiment with it a bit.

You can get the pred_dur from the PyTorch versions (not sure how you'd do it with ONNX tbh), and then match that through the phonemes/tokens back to words. It was a bit tricky with the sampling and scaling etc., which is where I left it.

https://github.com/remsky/Kokoro-FastAPI/tree/v0.1.2-pre-experimental-subs
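
A minimal sketch of the duration-to-word mapping described above, not the code from the linked branch. It assumes you already have `pred_dur` as a plain list with one predicted duration (in frames) per token, and that you know how many tokens each word produced. The names `WordTiming`, `word_timestamps`, and `frame_seconds` are hypothetical, and the seconds-per-frame value depends on the model's sampling and scaling, so it is left as a parameter here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def word_timestamps(
    words: Sequence[Tuple[str, int]],
    pred_dur: Sequence[float],
    frame_seconds: float,
) -> List[WordTiming]:
    """Accumulate per-token durations into per-word time spans.

    words: (text, n_tokens) pairs in spoken order. n_tokens is the number
        of tokens attributed to that word (assumption: any padding or
        punctuation tokens are either stripped or folded into a word).
    pred_dur: predicted duration of each token, in frames.
    frame_seconds: seconds per frame (assumption: model-specific).
    """
    if sum(n for _, n in words) != len(pred_dur):
        raise ValueError("token counts do not match the length of pred_dur")

    timings: List[WordTiming] = []
    cursor = 0      # index of the next unconsumed token
    elapsed = 0.0   # running time in seconds
    for text, n_tokens in words:
        duration = sum(pred_dur[cursor:cursor + n_tokens]) * frame_seconds
        timings.append(WordTiming(text, elapsed, elapsed + duration))
        cursor += n_tokens
        elapsed += duration
    return timings
```

If `pred_dur` comes out of the model as a tensor, converting it to a list first (for example with `.tolist()`) should be enough for this sketch. The hard part mentioned above, getting correct token counts per word and the right frame scaling, is not solved here.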
