The collection of scripts in this repository represents a template for training neural networks via Multi-Task Learning in Kaldi. This repo is heavily based on the existing Kaldi multilingual Babel example directory.
multi-task-kaldi allows similar functionality to the multilingual Babel scripts, but with more easily extendable code. Adding a new language with multi-task-kaldi is as easy as creating a new input_lang dir. Running multiple tasks on the same corpus is not possible in the multilingual Babel setup, but in multi-task-kaldi it is possible by creating a new input_task dir. The code here aims to be easily readable and extensible, and makes few assumptions about the kind of data you have and where it's located on disk.
To get started, multi-task-kaldi should be cloned and moved into the egs dir of your local version of the latest Kaldi branch.
If you're used to typical Kaldi egs, you should know that all the scripts here in utils/, local/ and steps/ exist in this repo. That is, they do not link back to the wsj example. This was done so that the scripts could be customized and made more readable.
In order to run multi-task-kaldi, you need to make a new input_task dir. This is the only place you need to make changes for your new task (or new language).
This directory contains information about the location of your data, lexicon, and language model.
Here is an example of the structure of my input_task directory for the task called my-task.
input_my-task/
├── lexicon_nosil.txt -> /data/my-task/lexicon/lexicon_nosil.txt
├── lexicon.txt -> /data/my-task/lexicon/lexicon.txt
├── task.arpabo -> /data/my-task/lm/task.arpabo
├── test_audio_path -> /data/my-task/audio/test_audio_path
├── train_audio_path -> /data/my-task/audio/train_audio_path
├── transcripts.test -> /data/my-task/audio/transcripts.test
└── transcripts.train -> /data/my-task/audio/transcripts.train
0 directories, 7 files
Most of these files are standard Kaldi format, and more detailed descriptions of them can be found on the official docs.
lexicon_nosil.txt // Standard Kaldi // phonetic dictionary without silence phonemes
lexicon.txt // Standard Kaldi // phonetic dictionary with silence phonemes
task.arpabo // Standard Kaldi // language model in ARPA back-off format
test_audio_path // Custom file! // one-line text file containing absolute path to dir of audio files (eg. WAV) for testing
train_audio_path // Custom file! // one-line text file containing absolute path to dir of audio files (eg. WAV) for training
transcripts.test // Custom file! // A typical Kaldi transcript file, but with only the test utterances
transcripts.train // Custom file! // A typical Kaldi transcript file, but with only the train utterances
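Before running anything, it can help to confirm that an input dir matches the layout described above. The following is a minimal sketch, not part of multi-task-kaldi: check_input_dir is a hypothetical helper name, and it assumes only what the list above states (seven entries, with the two audio path files each holding one absolute path to a dir).

```shell
# Hypothetical helper (not part of this repo): sanity-check an input_<task> dir.
check_input_dir() {
    local dir="$1" f audio_dir status=0

    # The seven entries shown in the example layout.
    for f in lexicon_nosil.txt lexicon.txt task.arpabo \
             test_audio_path train_audio_path \
             transcripts.test transcripts.train; do
        if [ ! -f "$dir/$f" ]; then
            # Also triggers for a symlink whose target is missing.
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done

    # The two custom files should each hold one absolute path to a dir of audio.
    for f in test_audio_path train_audio_path; do
        [ -f "$dir/$f" ] || continue
        audio_dir="$(head -n 1 "$dir/$f")"
        case "$audio_dir" in
            /*) ;;
            *) echo "not an absolute path in $f: $audio_dir" >&2; status=1 ;;
        esac
        if [ ! -d "$audio_dir" ]; then
            echo "audio dir not found: $audio_dir" >&2
            status=1
        fi
    done

    return "$status"
}
```

Calling check_input_dir input_my-task returns 0 when everything is in place and prints one line per problem to stderr otherwise.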
The scripts will name files and directories dynamically. You will define the name of your input data (ie. task or language) in the initial input_ dir, and then the rest of the generated dirs and files will be named accordingly. For instance, if you have input_your-task, then the GMM alignment stage will create data_your-task, plp_your-task and exp_your-task.
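The naming convention can be sketched as a small function. task_dirs is a hypothetical name used only for illustration; the four prefixes are the ones mentioned above (input_ is created by you, the other three by the GMM alignment stage).

```shell
# Hypothetical helper: list the dirs associated with one task name.
task_dirs() {
    local task="$1" prefix
    # input_<task> is made by hand; data_, plp_ and exp_ are generated.
    for prefix in input data plp exp; do
        echo "${prefix}_${task}"
    done
}
```

For example, task_dirs your-task prints input_your-task, data_your-task, plp_your-task and exp_your-task, one per line.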
$ ./run_gmm.sh your-task test001
- your-task should correspond exactly to input_your-task. In multilingual training, this will be input_lang1, input_lang2, etc. In monolingual Multi-Task Learning, this will be input_task1, input_task2, etc.
- test001 is any character string, and is written to the name of the WER file: WER_nnet3_your-corpus_test001.txt
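With several tasks or languages, the GMM stage is run once per input dir. The sketch below only prints the commands (a dry run) so they can be reviewed first; gmm_commands is a hypothetical helper, and using the same run label for every task is an assumption made for brevity.

```shell
# Hypothetical dry-run helper: print one run_gmm.sh call per task.
gmm_commands() {
    local run_id="$1" task
    shift
    for task in "$@"; do
        # Each task name must have a matching input_<task> dir.
        if [ ! -d "input_${task}" ]; then
            echo "warning: input_${task} not found" >&2
        fi
        echo "./run_gmm.sh ${task} ${run_id}"
    done
}
```

For example, gmm_commands test001 lang1 lang2 prints two run_gmm.sh lines, one for lang1 and one for lang2.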
$ ./utils/setup_multitask.sh to_dir from_dir "your-task1 your-task2 your-task3"
- all nnet3 log files and experimental data will be written to to_dir (absolute path). This dir must exist already.
- the output dirs from GMM alignment should exist at from_dir (absolute path)
- the task names "your-task1 your-task2 your-task3" must correspond to input dir names as such: input_your-task1, input_your-task2, etc. However, do not include the initial input_ here.
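Since the task-name argument is just the input dir names without the leading input_, it can be derived from the dirs themselves. This is a sketch under that assumption; task_names is a hypothetical helper and is not provided by the repo.

```shell
# Hypothetical helper: turn input dir names into the space-delimited task list.
task_names() {
    local dir name names=""
    for dir in "$@"; do
        name="$(basename "$dir")"   # drop any leading path or trailing slash
        name="${name#input_}"       # drop the initial input_
        names="${names:+$names }$name"
    done
    echo "$names"
}
```

The result would then be passed as the quoted third argument, e.g. "$(task_names input_your-task1 input_your-task2)" expands to "your-task1 your-task2".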
$ ./run_nnet3_multitask.sh "your-task1 your-task2" "gmm-typo1 gmm-typo2" "weight-task1,weight-task2" hidden-dim num-epochs main-dir
- first argument is a space-delimited string of task names (must correspond to input_your-task1)
- second argument is a space-delimited string of GMM model typologies. These are either "mono" or "tri", and determine whether you want to use monophone alignments or triphone alignments for each task.
- third argument is a comma-delimited list of weights for each task. Each weight should probably be equal to or less than 1.0.
- hidden-dim is the number of nodes in your hidden layer
- num-epochs is the number of epochs for each task. This is not task-specific.
- main-dir is the dir you moved your GMM alignments into. Above we used to_dir.
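The first three arguments have to line up: one typology and one weight per task. The check below is a sketch of that reading of the argument list, not something run_nnet3_multitask.sh is known to do itself; check_multitask_args is a hypothetical helper.

```shell
# Hypothetical pre-flight check for the first three arguments.
check_multitask_args() {
    local tasks="$1" typos="$2" weights="$3"
    local n_tasks n_typos n_weights typo

    # Unquoted expansions are intentional: they split on the delimiters.
    set -- $tasks
    n_tasks=$#
    set -- $(printf '%s' "$weights" | tr ',' ' ')
    n_weights=$#
    set -- $typos
    n_typos=$#

    if [ "$n_tasks" -eq 0 ]; then
        echo "no task names given" >&2
        return 1
    fi
    if [ "$n_tasks" -ne "$n_typos" ] || [ "$n_tasks" -ne "$n_weights" ]; then
        echo "expected one typology and one weight per task" >&2
        return 1
    fi

    # "$@" now holds the typologies; each must be mono or tri.
    for typo in "$@"; do
        case "$typo" in
            mono|tri) ;;
            *) echo "unknown GMM typology: $typo" >&2; return 1 ;;
        esac
    done
}
```

For example, check_multitask_args "your-task1 your-task2" "mono tri" "1.0,0.5" succeeds, while dropping one typology or passing anything other than mono or tri makes it return 1.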