This speech separation based framework is for multi-talker keyword spotting tasks and is implemented in the ESPnet2 toolkit.

GnafiY/TPDT-SS-KWS

Overview

This is the official repository for the Interspeech 2024 paper Text-aware Speech Separation for Multi-talker Keyword Spotting. The implementation of the front-end model is based on ESPnet. All unused examples in egs and egs2 have been removed. For the KWS back-end, we directly apply the default setup of MDTC from the WeKws recipe examples/hey_snips/s0.

I apologize that the primary author's email address was given incorrectly; it should be [email protected] instead of [email protected]. Feel free to email me if you have any questions!

Setup

  1. Clone this repository.
  2. Install the ESPnet dependencies; please refer to the official ESPnet repository.
  3. Change directory to espnet/egs2/librimix/enh1.
  4. Generate the Libri2Mix scp data by running bash run.sh --stage 1 --stop_stage 4.
  5. Generate the Snips2Mix data by following the instructions in local/Snips2Mix.
  6. Train the model with bash run.sh --stage 5 --stop_stage 6, then run inference with bash run.sh --stage 7 --stop_stage 8.
  7. If you wish to run KWS inference, please refer to the snips recipe in WeKws.
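
The recipe stages in steps 3 to 6 can be chained in a small wrapper script. The following is only a sketch, not part of the repository: it assumes the repository is already cloned and the ESPnet dependencies are installed, that the script is started from the clone root so the relative recipe path resolves, and that the Snips2Mix data (step 5) has been prepared by hand. The names DRY_RUN, RECIPE_DIR, run, and run_pipeline are hypothetical helpers introduced here for illustration. By default the script only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch of Setup steps 3-6 (hypothetical wrapper, not shipped with the repo).
set -euo pipefail

# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: the recipe path from step 3 is relative to the current directory.
RECIPE_DIR="${RECIPE_DIR:-espnet/egs2/librimix/enh1}"

# Print a command and, unless in dry-run mode, run it in the current shell.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

run_pipeline() {
  run cd "$RECIPE_DIR"                       # step 3
  run bash run.sh --stage 1 --stop_stage 4   # step 4: Libri2Mix scp data
  # Step 5 is manual: follow the instructions in local/Snips2Mix before training.
  run bash run.sh --stage 5 --stop_stage 6   # step 6: training
  run bash run.sh --stage 7 --stop_stage 8   # step 6: inference
}

run_pipeline
```

Running the script as-is lists the commands without touching any data; once the Snips2Mix data is in place, setting DRY_RUN=0 executes the same sequence and stops at the first failing stage because of set -e.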
