I'm attaching an audio file that reproduces the issue (it is also reproducible with longer files split into chunks).
Disabling VAD helps, but VAD detection itself does not explain the issue, because VAD correctly identifies where speech starts (around 2.5 seconds).
It affects both batch and non-batch methods.
With VAD:
chunks_metadata [{'start_time': 2.416, 'end_time': 12.72}]
duration_after_vad 10.304
Sentence: [0 7.83s -> 12.13s] It's important that that first piece can't be misinterpreted as a decimal.
Without VAD:
chunks_metadata [{'start_time': 0.0, 'end_time': 13.11925}]
duration_after_vad 13.11925
Sentence: [0 3.42s -> 12.14s] 8892. It's important that that first piece can't be misinterpreted as a decimal.
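To make the discrepancy concrete, here is a small arithmetic check using only the numbers from the logs above. It assumes the sentence timestamps in both runs are expressed on the original audio timeline (the near-identical end times, 12.13s and 12.14s, suggest this, but I have not confirmed it); the variable names are mine.

```python
# Values copied from the logs above; variable names are illustrative only.
vad_chunk = {"start_time": 2.416, "end_time": 12.72}
sentence_with_vad = (7.83, 12.13)      # "It's important that ..."
sentence_without_vad = (3.42, 12.14)   # "8892. It's important that ..."

# Sanity check: matches the logged duration_after_vad of 10.304.
duration_after_vad = vad_chunk["end_time"] - vad_chunk["start_time"]

# Delay between the VAD speech start and the first transcribed sentence,
# assuming sentence times are on the original audio timeline.
gap_with_vad = sentence_with_vad[0] - vad_chunk["start_time"]
gap_without_vad = sentence_without_vad[0] - vad_chunk["start_time"]

print(f"duration_after_vad:       {duration_after_vad:.3f}s")
print(f"speech start -> sentence: {gap_with_vad:.3f}s with VAD")
print(f"speech start -> sentence: {gap_without_vad:.3f}s without VAD")
```

Under that assumption, the VAD run leaves roughly 5.4 seconds of detected speech without any text, which is where "8892." falls in the run without VAD.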
digit-speech.zip