Does it represent the minimum length of each speech clip? Why do you feed segments of speech into training rather than whole utterances? Is it because of memory constraints? If my audio is about 10-15 s long, will that cause my model to generate meaningless audio?
--Does it represent the minimum length of each speech clip?
--No. All input waves are cropped to a length of 7680. In the input queue, a segment of length 7680 is randomly cropped from each longer input wave.
--Why do you feed segments of speech into training rather than whole utterances? Is it because of memory constraints?
--Yes, longer wave segments consume much more GPU memory.
--If my audio is about 10-15 s long, will that cause my model to generate meaningless audio?
--No, a model trained on waves of length 7680 generalizes well to longer sequences.
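
For readers who want to see what the random cropping described above looks like, here is a minimal NumPy sketch. It is not the repository's input queue code: the function name `random_crop`, the zero-padding of short inputs, and the 16 kHz sampling rate in the usage lines are all assumptions made for illustration. Only the segment length of 7680 comes from the answer.

```python
import numpy as np

SEGMENT_LENGTH = 7680  # crop length stated in the answer above


def random_crop(wave, length=SEGMENT_LENGTH, rng=None):
    """Return a randomly positioned, contiguous segment of `length` samples.

    Hypothetical helper for illustration; not the project's actual code.
    """
    rng = rng if rng is not None else np.random.default_rng()
    wave = np.asarray(wave)
    if len(wave) < length:
        # Assumption: inputs shorter than the crop are zero-padded at the end.
        # The thread does not say how short inputs are handled.
        return np.pad(wave, (0, length - len(wave)))
    # Pick a start index so that the whole segment fits inside the wave.
    start = int(rng.integers(0, len(wave) - length + 1))
    return wave[start:start + length]


# Usage: a 12 s clip at an assumed 16 kHz sampling rate.
wave = np.arange(12 * 16000, dtype=np.float32)
segment = random_crop(wave, rng=np.random.default_rng(0))
print(segment.shape)  # (7680,)
```

Because a new offset is drawn every time a clip is sampled, a 10-15 s recording contributes many different segments over the course of training, while each batch stays at a fixed, small size.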