add timestamps for each word #113
merouanezouaid
started this conversation in
General
Replies: 1 comment
-
For sure! I was planning on jumping on it once I've finished the v1_0 integrations (the structure may change somewhat for those models anyhow). In the meantime, you can take a look at the stale branch I was using to experiment with it a bit. You can get the pred_dur from the PyTorch versions (not sure how you'd do it with ONNX, tbh) and then match that back through the phonemes/tokens to words. It was a bit tricky with the sampling, scaling, etc., which is where I left it: https://github.com/remsky/Kokoro-FastAPI/tree/v0.1.2-pre-experimental-subs
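To make the "pred_dur back to words" idea concrete, here is a minimal sketch in plain Python. It is not the code from the experimental branch. It assumes that pred_dur is a list with one predicted frame count per phoneme/token, that you already know how many tokens each word produced, and that one frame corresponds to a fixed number of seconds. The names `word_timestamps`, `tokens_per_word`, and `seconds_per_frame` are hypothetical, and the real frame duration depends on the model's hop size and sample rate, so it is left as a parameter.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def word_timestamps(words, tokens_per_word, pred_dur, seconds_per_frame):
    """Sum per-token frame counts into per-word start/end times.

    Assumptions (not taken from the Kokoro code base):
    - pred_dur[i] is the predicted number of frames for token i
    - tokens_per_word[j] is how many consecutive tokens belong to words[j];
      tokens for spaces or punctuation are counted with a neighbouring word
    - seconds_per_frame is constant for the whole utterance
    """
    if len(words) != len(tokens_per_word):
        raise ValueError("need exactly one token count per word")
    if sum(tokens_per_word) != len(pred_dur):
        raise ValueError("token counts do not add up to len(pred_dur)")

    timings = []
    token_index = 0   # position in pred_dur
    frame_cursor = 0  # frames consumed so far
    for word, n_tokens in zip(words, tokens_per_word):
        start = frame_cursor * seconds_per_frame
        frame_cursor += sum(pred_dur[token_index:token_index + n_tokens])
        token_index += n_tokens
        timings.append(WordTiming(word, start, frame_cursor * seconds_per_frame))
    return timings


if __name__ == "__main__":
    # Made-up durations purely for illustration.
    for timing in word_timestamps(
        words=["hello", "world"],
        tokens_per_word=[4, 5],
        pred_dur=[2, 3, 1, 4, 2, 2, 2, 2, 2],
        seconds_per_frame=0.025,  # hypothetical value
    ):
        print(timing)
```

The sampling and scaling issues mentioned above are not handled here: if the audio is resampled or the speed is changed after the durations are predicted, the times would need to be rescaled by the same factor.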
0 replies
-
I would like to have timestamps for each word in the generated text-to-speech output. This would improve the accuracy of syncing the audio with other media.
I could also submit this as a PR if I get some guidance.