
investigate using RFC 6465 algorithm for audio level calculation / speaking events #6

Open
fippo opened this issue Feb 9, 2014 · 3 comments

Comments

@fippo
Member

fippo commented Feb 9, 2014

I suspect the current max(freq) strategy is somewhat unstable, since it just takes the maximum and ignores the frequency.

getByteTimeDomainData may enable us to calculate the root mean square according to http://tools.ietf.org/html/rfc6465#appendix-A.1.
If that doesn't work... we'll have to figure out something with the FFT data.
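
As a rough illustration (not the RFC's reference code), an RMS level on the 0..127 -dBov scale used by RFC 6465 could be computed from the analyser's byte time-domain data roughly like this; the function name and the assumption of an existing AnalyserNode are mine:

```js
// Sketch: RMS level from an AnalyserNode's byte time-domain data,
// expressed on the 0..127 -dBov scale used by RFC 6465 / RFC 6464.
function rmsLevelDbov(analyser) {
  var buf = new Uint8Array(analyser.fftSize);
  analyser.getByteTimeDomainData(buf);

  var sum = 0;
  for (var i = 0; i < buf.length; i++) {
    var sample = (buf[i] - 128) / 128; // byte data is centered on 128; map to [-1, 1]
    sum += sample * sample;
  }
  var rms = Math.sqrt(sum / buf.length);

  if (rms === 0) return 127; // 127 = silence, 0 = full scale
  var dbov = -20 * Math.log(rms) / Math.LN10;
  return Math.min(127, Math.max(0, Math.round(dbov)));
}
```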

@fippo fippo self-assigned this Feb 9, 2014
@jokesterfr

I agree with this. Do you have anything new regarding a more efficient VAD technique involving frequency ranges? I've read that the human voice usually falls roughly between 100 Hz and 1000 Hz.
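
For what it's worth, a band-limited energy could be read from the same AnalyserNode's FFT output. This is only a sketch, assuming an analyser and the AudioContext sample rate are available; the function name is illustrative and the 100-1000 Hz bounds are just the range mentioned above:

```js
// Sketch: average byte magnitude of the FFT bins covering [lowHz, highHz].
// Bin i spans roughly i * (sampleRate / 2) / frequencyBinCount Hz.
function bandEnergy(analyser, sampleRate, lowHz, highHz) {
  var bins = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bins); // scaled dB values, 0..255

  var hzPerBin = (sampleRate / 2) / bins.length;
  var low = Math.max(0, Math.floor(lowHz / hzPerBin));
  var high = Math.min(bins.length - 1, Math.ceil(highHz / hzPerBin));

  var sum = 0;
  for (var i = low; i <= high; i++) {
    sum += bins[i];
  }
  return sum / (high - low + 1);
}

// e.g. bandEnergy(analyser, audioContext.sampleRate, 100, 1000)
```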

@fippo
Member Author

fippo commented Jun 4, 2014

http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_June2013.pdf is what I would currently prefer (not the dominant-speaker aspect, but the other parts). But time...

@fippo fippo removed their assignment Jun 11, 2014
@jokesterfr

That seems too complicated for me to help with. I mean, I'm a developer, not a PhD researcher in voice recognition. However, if you already know of a good VAD implementation, in whatever language it is, I can give it a try.
