I suspect the current max(freq) strategy is somewhat unstable, since it just takes the maximum value and ignores which frequency it occurs at.
getByteTimeDomainData may let us calculate the root mean square (RMS) level instead, as described in http://tools.ietf.org/html/rfc6465#appendix-A.1
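A minimal sketch of the RMS idea, assuming the frame comes from `AnalyserNode.getByteTimeDomainData()` (unsigned bytes with 128 as the zero line). The conversion to a 0..127 level follows one reading of RFC 6465 Appendix A.1, where the level is carried as -dBov and 127 means silence. `frameLevel` and `FrameLevel` are hypothetical names, not part of the existing code.

```typescript
// Convert one frame of AnalyserNode.getByteTimeDomainData() output into a
// linear RMS value and an RFC 6465-style level (0 = loudest, 127 = silence).
// Assumption: byte 128 is the zero line, so samples are rescaled to [-1, 1).

interface FrameLevel {
  rms: number;   // linear RMS, 0..1
  level: number; // -dBov, rounded and clamped to 0..127
}

function frameLevel(bytes: Uint8Array): FrameLevel {
  if (bytes.length === 0) {
    return { rms: 0, level: 127 }; // no samples: treat as silence
  }
  let sumSquares = 0;
  for (let i = 0; i < bytes.length; i++) {
    const x = (bytes[i] - 128) / 128; // centre and normalise the byte sample
    sumSquares += x * x;
  }
  const rms = Math.sqrt(sumSquares / bytes.length);
  // 20*log10(rms) is <= 0 dBov; the level is its negation, capped at 127.
  const dbov = rms > 0 ? 20 * Math.log10(rms) : -Infinity;
  const level = Math.min(127, Math.max(0, Math.round(-dbov)));
  return { rms, level };
}
```

In the browser, a `buf = new Uint8Array(analyser.fftSize)` filled by `analyser.getByteTimeDomainData(buf)` would be passed to `frameLevel(buf)`; a lower `level` means a louder frame. Unlike taking the maximum, RMS averages over the whole frame, so a single spike moves it much less.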
If that doesn't work... we'll have to figure out something with the FFT data.
I agree. Do you have anything new on a more effective VAD technique involving frequency ranges? I've read that the human voice is mostly between 100 Hz and 1000 Hz.
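One hedged way to use the FFT data along these lines is to measure how much of a frame's power falls inside the voice band. This sketch assumes the input comes from `AnalyserNode.getFloatFrequencyData()` (one dB value per bin, `fftSize / 2` bins) and that bin `i` corresponds to `i * sampleRate / fftSize`. `voiceBandRatio` is a hypothetical helper, and the 100 Hz to 1000 Hz defaults are simply the range quoted above, not a standard.

```typescript
// Fraction of spectral power inside [lowHz, highHz], computed from one frame
// of AnalyserNode.getFloatFrequencyData() output (values in dB).
// Assumption: bin i corresponds to frequency i * sampleRate / fftSize.

function voiceBandRatio(
  spectrumDb: Float32Array,
  sampleRate: number,
  fftSize: number,
  lowHz = 100,
  highHz = 1000,
): number {
  const binHz = sampleRate / fftSize;
  let bandPower = 0;
  let totalPower = 0;
  for (let i = 0; i < spectrumDb.length; i++) {
    const db = spectrumDb[i];
    if (!Number.isFinite(db)) continue; // silent bins can come back as -Infinity
    const power = Math.pow(10, db / 10); // dB magnitude back to linear power
    totalPower += power;
    const freq = i * binHz;
    if (freq >= lowHz && freq <= highHz) bandPower += power;
  }
  return totalPower > 0 ? bandPower / totalPower : 0; // 0 for an all-silent frame
}
```

A frame could then be treated as speech-like when this ratio is high and the overall level is above a noise floor; both cut-offs would have to be tuned on real recordings.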
That seems too complicated for me to help with. I'm a developer, not a PhD researcher in voice recognition. However, if you already know of a good VAD implementation, in whatever language, I can give it a try.
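Not a researched VAD, but for reference, a common simple scheme is an energy threshold with a hangover period, so that short pauses between words don't toggle the output. This sketch assumes one loudness value per frame in -dBov (0 loudest, 127 silence, as in RFC 6465); `ThresholdVad`, `threshold` and `hangoverFrames` are hypothetical names, and the defaults are untuned placeholders.

```typescript
// Minimal energy-threshold VAD with hangover (a generic simple scheme, not a
// specific published algorithm). Input: one level per frame in -dBov.

class ThresholdVad {
  private readonly threshold: number;      // hypothetical: frames at or below this -dBov count as speech
  private readonly hangoverFrames: number; // hypothetical: quiet frames still reported as speech
  private remaining = 0;                   // hangover frames left before reporting silence

  constructor(threshold = 50, hangoverFrames = 8) {
    this.threshold = threshold;
    this.hangoverFrames = hangoverFrames;
  }

  /** Feed one frame level; returns true while speech is considered active. */
  push(levelDbov: number): boolean {
    if (levelDbov <= this.threshold) {
      // Loud enough: speech, and restart the hangover countdown.
      this.remaining = this.hangoverFrames;
      return true;
    }
    if (this.remaining > 0) {
      // Quiet frame shortly after speech: keep reporting speech for a while.
      this.remaining--;
      return true;
    }
    return false;
  }
}
```

With `new ThresholdVad(50, 2)`, a loud frame followed by quiet ones reports speech for that frame plus two more, then silence.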