This is the official repository for the Interspeech 2024 paper Text-aware Speech Separation for Multi-talker Keyword Spotting. The implementation of the front-end model is based on ESPnet. All unused examples in `egs` and `egs2` have been removed. For the KWS backend, we directly apply the default MDTC setup from the WeKws `examples/hey_snips/s0` recipe.
I apologize that the email address given for the primary author is wrong: it should be [email protected] instead of [email protected]. Feel free to email me if you have any questions!
- Clone this repository.
- Install the ESPnet dependencies; please refer to the official ESPnet repository.
- Change directory to `espnet/egs2/librimix/enh1`.
- Generate the Libri2Mix scp data by running `bash run.sh --stage 1 --stop_stage 4`.
- Generate the Snips2Mix data by following the instructions in `local/Snips2Mix`.
- Train and run inference with `bash run.sh --stage 5 --stop_stage 6` and `bash run.sh --stage 7 --stop_stage 8`, respectively.
- If you wish to run KWS inference, please refer to the snips recipe in WeKws.
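
The stage ranges in the steps above can be wrapped in a small helper so that each phase is invoked by name. This is only a sketch: `run_phase` and `DRY_RUN` are hypothetical names that are not part of this repository, and the helper assumes it is sourced or run from `espnet/egs2/librimix/enh1`. By default it only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch only: maps a phase name to the run.sh stage range from the steps above.
# run_phase and DRY_RUN are hypothetical names, not part of the recipe.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 (default): print the command; 0: execute it

run_phase() {
  local start stop
  case "$1" in
    data)  start=1; stop=4 ;;   # generate Libri2Mix scp data
    train) start=5; stop=6 ;;   # train the front-end model
    infer) start=7; stop=8 ;;   # run inference
    *) echo "unknown phase: $1" >&2; return 1 ;;
  esac
  if [ "$DRY_RUN" = "1" ]; then
    echo "bash run.sh --stage ${start} --stop_stage ${stop}"
  else
    # Assumes the current directory contains the recipe's run.sh.
    bash run.sh --stage "${start}" --stop_stage "${stop}"
  fi
}
```

For example, `run_phase data` prints the Libri2Mix command, and setting `DRY_RUN=0` before calling it executes the command instead. The Snips2Mix data generation described in `local/Snips2Mix` is not covered by this helper and still has to be done between the `data` and `train` phases.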