Voice-Activity-Detection

We train a neural network to detect human speech activity in an audio frame.

The audio dataset is available at http://www.openslr.org/resources.php

For each frame of speech data, a soft VAD value is computed from the speech signal and from random mixtures of that signal with chunks of noise signals.
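One plausible way to turn this into per-frame training targets is sketched below. It assumes, which the README does not state, that the soft VAD value of a frame is the fraction of that frame's energy contributed by speech, E_speech / (E_speech + E_noise), so it lies in [0, 1]. The frame length of 400 samples and hop of 160 samples (25 ms and 10 ms at 16 kHz) are also assumed values, and `frame_energies` and `soft_vad` are hypothetical helpers, not functions from this repository.

```python
from typing import List, Sequence


def frame_energies(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Sum of squared samples per frame; a trailing partial frame is dropped."""
    return [sum(s * s for s in signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def soft_vad(speech: Sequence[float], noise: Sequence[float],
             frame_len: int = 400, hop: int = 160, eps: float = 1e-12) -> List[float]:
    """Hypothetical soft VAD target: share of each frame's energy that comes from speech."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    e_speech = frame_energies(speech, frame_len, hop)
    e_noise = frame_energies(noise, frame_len, hop)
    # eps keeps the ratio defined when both speech and noise are silent
    return [es / (es + en + eps) for es, en in zip(e_speech, e_noise)]
```

Under this assumption the network would see features of the mixture (speech plus noise) as input and learn to predict these values, which stay near 0 in noise-only frames and approach 1 where speech dominates.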

Speech frames are converted into log spectrograms or MFCCs, which serve as the input features for the neural network.
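A minimal log-spectrogram sketch follows, in pure Python so it is self-contained. It applies a periodic Hann window and a naive DFT to each frame and takes the log of the power; the window choice, frame sizes, and function names are assumptions for illustration. A real pipeline would use an FFT (for example `numpy.fft.rfft`) and, for MFCCs, would add a mel filterbank and a DCT, which are omitted here.

```python
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def log_power_spectrum(frame: Sequence[float], eps: float = 1e-10) -> List[float]:
    """log(|DFT|^2) of a Hann-windowed frame for bins 0..n//2 (naive O(n^2) DFT)."""
    n = len(frame)
    xw = [x * w for x, w in zip(frame, hann(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(xw))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(xw))
        spectrum.append(math.log(re * re + im * im + eps))  # eps avoids log(0)
    return spectrum


def log_spectrogram(signal: Sequence[float], frame_len: int = 400,
                    hop: int = 160) -> List[List[float]]:
    """Slice the signal into overlapping frames and take each frame's log power spectrum."""
    return [log_power_spectrum(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]
```

With a 400-sample frame each feature vector has 201 log-power bins; these vectors (or MFCCs derived from them) are what the network consumes frame by frame.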

Two model types are built: (a) a feedforward model and (b) a sequence model (bidirectional LSTMs).
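To illustrate what the bidirectional sequence model adds, the sketch below runs a recurrent pass over the frames in both directions and combines the two states at each frame through a ReLU output layer. It uses a scalar-state tanh RNN cell as a simplified stand-in for an LSTM cell, and all weights and names are hypothetical. What it shows is that the prediction for frame t depends on both earlier and later frames, whereas a feedforward model sees only the features it is given for that frame.

```python
import math
from typing import List, Sequence


def rnn_pass(frames: Sequence[Sequence[float]], w_in: Sequence[float],
             w_rec: float) -> List[float]:
    """Run a scalar-state tanh RNN over `frames` in order; return the state after each frame."""
    h = 0.0
    states = []
    for x in frames:
        h = math.tanh(sum(wi * xi for wi, xi in zip(w_in, x)) + w_rec * h)
        states.append(h)
    return states


def bidirectional_vad(frames: Sequence[Sequence[float]],
                      w_in_fwd: Sequence[float], w_rec_fwd: float,
                      w_in_bwd: Sequence[float], w_rec_bwd: float,
                      w_out: Sequence[float], b_out: float) -> List[float]:
    """Per-frame ReLU output combining a forward and a backward recurrent state."""
    fwd = rnn_pass(frames, w_in_fwd, w_rec_fwd)  # state t has seen frames 0..t
    # run over the reversed sequence, then flip back so index t lines up again
    bwd = rnn_pass(list(frames)[::-1], w_in_bwd, w_rec_bwd)[::-1]  # has seen t..T-1
    return [max(0.0, w_out[0] * hf + w_out[1] * hb + b_out)
            for hf, hb in zip(fwd, bwd)]
```

Changing only the last frame changes the output for the first frame through the backward pass, while the forward states before that frame stay identical.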

The model was trained with an MSE cost function, a ReLU output layer, and RMSProp optimization for approximately 20 to 50 epochs, using dropout and a decaying learning rate, with a validation error of 10%.
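The training recipe can be illustrated at toy scale with a single ReLU unit: MSE loss, full-batch gradients, RMSProp updates (a running average of squared gradients with decay `rho`), and an exponentially decaying learning rate `lr0 * decay**epoch`. The single unit, the initial values, `rho = 0.9`, and the decay schedule are assumptions; the README does not say how its learning rate was decayed, and dropout is left out to keep the example short.

```python
import math
from typing import List, Sequence, Tuple


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def mse(xs: Sequence[Sequence[float]], ys: Sequence[float],
        w: Sequence[float], b: float) -> float:
    """Mean squared error of relu(w.x + b) against the targets."""
    return sum((relu(sum(wi * xi for wi, xi in zip(w, x)) + b) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def train_relu_unit(xs: Sequence[Sequence[float]], ys: Sequence[float],
                    epochs: int = 50, lr0: float = 0.01, decay: float = 0.95,
                    rho: float = 0.9, eps: float = 1e-8) -> Tuple[List[float], float]:
    """Fit y ~ relu(w.x + b) with MSE loss, RMSProp, and per-epoch exponential lr decay."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.1] * dim, 0.1
    cache_w, cache_b = [0.0] * dim, 0.0
    for epoch in range(epochs):
        lr = lr0 * decay ** epoch
        # full-batch gradient of mean((pred - y)^2)
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            dz = (2.0 / n) * (relu(z) - y) * (1.0 if z > 0.0 else 0.0)  # ReLU derivative
            for i in range(dim):
                gw[i] += dz * x[i]
            gb += dz
        # RMSProp: divide each gradient by the root of its running mean square
        for i in range(dim):
            cache_w[i] = rho * cache_w[i] + (1 - rho) * gw[i] ** 2
            w[i] -= lr * gw[i] / (math.sqrt(cache_w[i]) + eps)
        cache_b = rho * cache_b + (1 - rho) * gb ** 2
        b -= lr * gb / (math.sqrt(cache_b) + eps)
    return w, b
```

The ReLU output keeps predictions non-negative, which suits soft VAD targets in [0, 1]; a sigmoid output would additionally cap them at 1.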

Preprocessing steps:

Download the '.flac' audio files and extract their time-series waveforms.
Extract features file by file and store them as binary files (.mfcc).
Mix random frames of noise (downloaded and extracted from https://zenodo.org/record/1227121#.W23BI9VKiUm) with the clean audio signals to obtain time-series vectors.
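The noise-mixing and feature-storage steps might look roughly like the sketch below. The README does not say how the noise was scaled, so `mix_at_snr` assumes mixing a randomly positioned noise chunk at a chosen signal-to-noise ratio. The raw native-endian float32 layout used by `features_to_bytes` is likewise a guess, not the documented format of the repository's `.mfcc` files. All function names here are hypothetical.

```python
import array
import math
import random
from typing import List, Sequence


def mix_at_snr(clean: Sequence[float], noise: Sequence[float],
               snr_db: float, rng: random.Random) -> List[float]:
    """Add a randomly chosen chunk of `noise` to `clean`, scaled to reach `snr_db`."""
    if len(noise) < len(clean):
        raise ValueError("noise recording is shorter than the clean signal")
    start = rng.randrange(len(noise) - len(clean) + 1)
    chunk = noise[start:start + len(clean)]
    p_clean = sum(s * s for s in clean) / len(clean)
    p_noise = sum(v * v for v in chunk) / len(chunk)
    if p_noise == 0.0:
        return list(clean)  # silent chunk: nothing to add
    # SNR_dB = 10*log10(p_clean / (gain**2 * p_noise)), solved for gain
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(clean, chunk)]


def features_to_bytes(frames: Sequence[Sequence[float]]) -> bytes:
    """Flatten frames row by row into raw float32 bytes (native byte order)."""
    return array.array("f", (v for row in frames for v in row)).tobytes()


def features_from_bytes(data: bytes, n_coeffs: int) -> List[List[float]]:
    """Inverse of features_to_bytes: split the float32 stream into rows of n_coeffs."""
    buf = array.array("f")
    buf.frombytes(data)
    return [list(buf[i:i + n_coeffs]) for i in range(0, len(buf), n_coeffs)]


def save_features(path: str, frames: Sequence[Sequence[float]]) -> None:
    with open(path, "wb") as f:
        f.write(features_to_bytes(frames))


def load_features(path: str, n_coeffs: int) -> List[List[float]]:
    with open(path, "rb") as f:
        return features_from_bytes(f.read(), n_coeffs)
```

Storing features as headerless float32 keeps files compact and fast to load, but the reader must know the number of coefficients per frame (`n_coeffs`) in advance.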

TODO:

Better hyperparameter optimization.
Explore more complex topologies.
Find strategies that improve generalization.

Files in the repository:

NN_main.py
README.md
feature_extract.py
how_to_VAD_linux.ipynb
io.py
main.py
math_helper.py
process_noise.py
Repository: kulka193/Voice-Activity-Detection