
investigate using RFC 6465 algorithm for audio level calculation / speaking events #6

Open
fippo opened this issue Feb 9, 2014 · 3 comments

Comments

@fippo
Member

fippo commented Feb 9, 2014

I suspect the current max(freq) strategy is somewhat unstable, since it just takes the maximum and ignores the frequency.

getByteTimeDomainData may enable us to calculate the root mean square according to http://tools.ietf.org/html/rfc6465#appendix-A.1.
If that doesn't work... we'll have to figure out something with the FFT data.
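
As a rough illustration (not the RFC's reference code), an RMS level on the 0..127 -dBov scale used by RFC 6465 could be computed from the analyser's byte time-domain data roughly like this; the function name and the assumption of an existing AnalyserNode are mine:

```js
// Sketch: RMS level from an AnalyserNode's byte time-domain data,
// expressed on the 0..127 -dBov scale used by RFC 6465 / RFC 6464.
function rmsLevelDbov(analyser) {
  var buf = new Uint8Array(analyser.fftSize);
  analyser.getByteTimeDomainData(buf);

  var sum = 0;
  for (var i = 0; i < buf.length; i++) {
    var sample = (buf[i] - 128) / 128; // byte data is centered on 128; map to [-1, 1]
    sum += sample * sample;
  }
  var rms = Math.sqrt(sum / buf.length);

  if (rms === 0) return 127; // 127 = silence, 0 = full scale
  var dbov = -20 * Math.log(rms) / Math.LN10;
  return Math.min(127, Math.max(0, Math.round(dbov)));
}
```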

@fippo fippo self-assigned this Feb 9, 2014
@jokesterfr

I agree with this. Do you have anything new regarding a more efficient VAD technique involving frequency ranges? I've read that the human voice usually falls roughly between 100 Hz and 1000 Hz.
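
For what it's worth, a band-limited energy could be read from the same AnalyserNode's FFT output. This is only a sketch, assuming an analyser and the AudioContext sample rate are available; the function name is illustrative and the 100-1000 Hz bounds are just the range mentioned above:

```js
// Sketch: average byte magnitude of the FFT bins covering [lowHz, highHz].
// Bin i spans roughly i * (sampleRate / 2) / frequencyBinCount Hz.
function bandEnergy(analyser, sampleRate, lowHz, highHz) {
  var bins = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bins); // scaled dB values, 0..255

  var hzPerBin = (sampleRate / 2) / bins.length;
  var low = Math.max(0, Math.floor(lowHz / hzPerBin));
  var high = Math.min(bins.length - 1, Math.ceil(highHz / hzPerBin));

  var sum = 0;
  for (var i = low; i <= high; i++) {
    sum += bins[i];
  }
  return sum / (high - low + 1);
}

// e.g. bandEnergy(analyser, audioContext.sampleRate, 100, 1000)
```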

@fippo
Member Author

fippo commented Jun 4, 2014

http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_June2013.pdf is what I would currently prefer (not the dominant-speaker aspect, but the other parts). But time...

@fippo fippo removed their assignment Jun 11, 2014
@jokesterfr

That seems too complicated for me to help with. I mean, I'm a developer, not a PhD researcher in voice recognition. However, if you already know of a good VAD implementation, in whatever language it is, I can give it a try.
