65 commits
a4f38bd
add tokenizer_interface
poonehmousavi Nov 5, 2024
0c2b751
add reactored version of ASR
poonehmousavi Nov 6, 2024
17898c3
fix precommit
poonehmousavi Nov 8, 2024
db1590e
fix flake
poonehmousavi Nov 8, 2024
3361ac6
fix blank index
poonehmousavi Nov 8, 2024
2678a24
add tokens extraction / loading
Chaanks Nov 9, 2024
0694249
update tokens extraction script
Chaanks Nov 9, 2024
2c30ade
update tokens extraction script
Chaanks Nov 11, 2024
336dd64
update LibriSpeech ASR recipe
Chaanks Nov 12, 2024
cf40412
update LibriSpeech ASR recipe
Chaanks Dec 2, 2024
973e12b
change name
poonehmousavi Dec 20, 2024
8dca49d
add discrete_ssl, reorgnaize folder
poonehmousavi Dec 23, 2024
e317d3a
clean code and fix speechtokenzier bug
poonehmousavi Dec 23, 2024
fcb5209
fix discrete_ssl bug
poonehmousavi Dec 23, 2024
0d575d4
fix bug
poonehmousavi Dec 23, 2024
447844c
fix bug
poonehmousavi Dec 24, 2024
8aeaeb9
fix discrete_ssl train.py for specifiying which layer to use
poonehmousavi Dec 24, 2024
c831e60
fix discrete_ssl
poonehmousavi Dec 24, 2024
ecf761a
fix bug introduced in last commit
poonehmousavi Dec 24, 2024
0d2e309
fix bug in saving pretrained embedding
poonehmousavi Dec 24, 2024
4729007
fix
poonehmousavi Dec 24, 2024
7a0ecc2
fix bug intriduced in prev commit
poonehmousavi Dec 24, 2024
73dfa4d
fix bug for saveing embeedng
poonehmousavi Dec 24, 2024
a9e8f3b
add vocab_size to encodec
poonehmousavi Dec 24, 2024
4237bac
fix bug
poonehmousavi Dec 24, 2024
867228e
fix embedding loading for train.py
poonehmousavi Dec 24, 2024
3570b63
fix precommit
poonehmousavi Dec 24, 2024
3ef9964
move tokenizer_interface to util
poonehmousavi Dec 24, 2024
ca05ac6
update extract doc and comments and set to highest bitrate
poonehmousavi Dec 24, 2024
a08891e
add run_script.sh
poonehmousavi Dec 24, 2024
d41c6e4
fix run_experiments.sh bug
poonehmousavi Dec 24, 2024
04ea1e6
add bash script for token extraction
poonehmousavi Dec 24, 2024
95333cf
fix bug
poonehmousavi Dec 24, 2024
096fc43
add hyperparam tuning
poonehmousavi Dec 25, 2024
8dc0161
fix precommit
poonehmousavi Dec 25, 2024
c0f4fee
modify hparams.sh input order
poonehmousavi Dec 25, 2024
a595cf6
only applying testing for final run HT
poonehmousavi Dec 25, 2024
78da6c1
fix bug
poonehmousavi Dec 26, 2024
6a3a7a5
fix bug
poonehmousavi Dec 26, 2024
e9ff250
add hupertun for contextnet
poonehmousavi Dec 26, 2024
3e2fe0c
add etsting to average run
poonehmousavi Dec 26, 2024
f378aec
add lr for HT for contextnet
poonehmousavi Dec 26, 2024
80238bc
Merge branch 'DASB-refactor' of https://github.com/Chaanks/benchmarks…
poonehmousavi Dec 26, 2024
b2bd316
add measuring time
poonehmousavi Dec 26, 2024
9de6934
add time measure
poonehmousavi Dec 26, 2024
c4e2738
update readme + minor changes
poonehmousavi Dec 28, 2024
279e48b
fix link in readme
poonehmousavi Dec 28, 2024
7f32f1b
update table of contnet
poonehmousavi Dec 28, 2024
30fc2d6
fix
poonehmousavi Dec 28, 2024
a576ba7
fix
poonehmousavi Dec 28, 2024
7c75515
Merge pull request #47 from Chaanks/DASB-refactor
poonehmousavi Dec 28, 2024
c96eefb
Add Common Voice tokenization
ana-kuznetsova Jan 10, 2025
bd9e953
Add linear model implementation
ana-kuznetsova Jan 15, 2025
9426cea
upd .gitignore
ana-kuznetsova Jan 15, 2025
5835ade
adapt CV recipe
ana-kuznetsova Jan 16, 2025
e767ebc
adapt cv for offline extraction
ana-kuznetsova Jan 16, 2025
eb2e692
adapt cv for offline extraction
ana-kuznetsova Jan 16, 2025
c163db8
Merge branch 'speechbrain:main' into DASB-CV
ana-kuznetsova Jan 18, 2025
b1d08dd
add lstm config
ana-kuznetsova Jan 19, 2025
8edf06c
Merge branch 'DASB-CV' of github.com:ana-kuznetsova/benchmarks into D…
ana-kuznetsova Jan 19, 2025
6a094d2
fix lr in lstm config
ana-kuznetsova Jan 19, 2025
6b14316
fix lr in lstm config
ana-kuznetsova Jan 19, 2025
768b9a0
upd .gitignore
ana-kuznetsova Jan 20, 2025
a1ca0ca
fix filtering in CV extract
ana-kuznetsova Jan 20, 2025
8662a2a
adjust cv prepare pipeline
ana-kuznetsova Jan 21, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -44,6 +44,8 @@ htmlcov/
.coverage
.coverage.*
.cache
cache/
ASR-cv*
nosetests.xml
coverage.xml
*.cover
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ The SpeechBrain Benchmarks currently include the following:

- [MOABB](https://github.com/speechbrain/benchmarks/tree/main/benchmarks/MOABB) - A benchmark designed for evaluating neural models in well-known EEG tasks like motor imagery, P300, and SSVEP.

- [DASB](https://github.com/speechbrain/benchmarks/tree/main/benchmarks/DASB) - A benchmark designed for evaluating discrete audio tokens across a wide range of discriminative
- [DASB](https://github.com/speechbrain/benchmarks/tree/DASB/benchmarks/DASB) - A benchmark designed for evaluating discrete audio tokens across a wide range of discriminative
and generative tasks.



This file was deleted.

@@ -1,6 +1,8 @@
# ################################
# Recipe for training a discrete-input CTC ASR system with LibriSpeech.
# Decoding is performed with a CTC greedy or LM-rescored decoder.
# Script for training an ASR model evaluating an SSL representation
# model on one language from the CommonVoice dataset. A SentencePiece tokenizer
# with number of tokens equal to <output_neurons> is learned in a first phase
# on the considered language.
#
# Authors
# * Pooneh Mousavi 2024
@@ -9,69 +11,100 @@
# Seed needs to be set at top of yaml, before objects with parameters are made
seed: 1986
__set_seed: !apply:torch.manual_seed [!ref <seed>]
output_folder: !ref results/MP3S-LSTM/speech_tokenizer/<seed>
output_wer_folder: !ref <output_folder>/
language: cy # use 'cy' for Welsh and 'eu' for Basque
output_folder: !ref results/CommonVoice/speech_tokenizer/<language>/<seed>
test_wer_file: !ref <output_folder>/wer_test.txt
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

cached_data_folder: !ref cache/CommonVoice/<language>/LSTM/speech_tokenizer/<seed>
run_name: !PLACEHOLDER

# Data files
data_folder: !PLACEHOLDER # e.g., /path/to/LibriSpeech
# noise/RIR dataset will automatically be downloaded
# data_folder_rirs: !ref <data_folder>
train_splits: ["train-clean-100"]
dev_splits: ["dev-clean"]
test_splits: ["test-clean", "test-other"]

skip_prep: False
ckpt_interval_minutes: 25 # save checkpoint every N min
train_csv: !ref <output_folder>/train-clean-100.csv
valid_csv: !ref <output_folder>/dev-clean.csv
test_csv:
- !ref <output_folder>/test-clean.csv
- !ref <output_folder>/test-other.csv

data_folder: !PLACEHOLDER # e.g., /local/cv-corpus-11.0-2022-09-21/<language>
train_tsv_file: !ref <data_folder>/train.tsv # Standard CommonVoice .tsv files
dev_tsv_file: !ref <data_folder>/dev.tsv # Standard CommonVoice .tsv files
test_tsv_file: !ref <data_folder>/test.tsv # Standard CommonVoice .tsv files
accented_letters: True
train_csv: !ref <save_folder>/train.csv
valid_csv: !ref <save_folder>/dev.csv
test_csv: !ref <save_folder>/test.csv
skip_prep: False # Skip data preparation
testing: True # If set to True, the test evaluation is run; otherwise it is skipped.

tokens_folder: !PLACEHOLDER # Path to the folder where extracted tokens are saved.
pretrain_embeddings_folder: none

avoid_if_longer_than: 10.0

# Training parameters
number_of_epochs: 20
lr: 0.0002
sorting: ascending
precision: fp32

# With data_parallel batch_size is split into N jobs
# With DDP batch_size is multiplied by N jobs
# Must be 3 per GPU to fit 32GB of VRAM
batch_size: 4

batch_size_exponent: 4 # @orion_step1: --batch_size_exponent~"uniform(2, 4,discrete=True)"
batch_size: !ref 2 ** <batch_size_exponent>
test_batch_size: 1
grad_accumulation_factor: 2
max_grad_norm: 5.0


### Config for Tokenizer
vocab_size: 1024
num_codebooks: 2
sample_rate: 16000
sorting: descending #random
num_workers: 8
loss_reduction: batchmean
precision: fp32 # bf16, fp16 or fp32
valid_search_interval: 1
avg_checkpoints: 10 # Number of checkpoints to average for evaluation
cache_size: 1.e+10
token_type: bpe # ["unigram", "bpe", "char"]
character_coverage: 1.0

# Feature parameters
lr_model: 0.0002 # @orion_step1: --lr_model~"loguniform(0.00001,0.5)"

# Training parameters
dynamic_batching: True
max_batch_length_train: 850
max_batch_len_val: 100
num_bucket: 200
shuffle: False # if true re-creates batches at each epoch shuffling examples.
max_batch_ex: 128
batch_ordering: random

dynamic_batch_sampler_train:
    max_batch_length: !ref <max_batch_length_train>
    num_buckets: !ref <num_bucket>
    shuffle: !ref <shuffle>
    batch_ordering: !ref <batch_ordering>
    max_batch_ex: !ref <max_batch_ex>

dynamic_batch_sampler_val:
    max_batch_length: !ref <max_batch_len_val>
    num_buckets: !ref <num_bucket>
    shuffle: !ref <shuffle>
    batch_ordering: !ref <batch_ordering>
    max_batch_ex: !ref <max_batch_ex>

encoder_dim: 1024

# Dataloader options
train_dataloader_opts:
    batch_size: !ref <batch_size>
dataloader_options:
    batch_size: !ref <batch_size>
    num_workers: 4
test_dataloader_options:
    batch_size: !ref <test_batch_size>
    num_workers: 4


valid_dataloader_opts:
    batch_size: !ref <batch_size>

test_dataloader_opts:
    batch_size: !ref <test_batch_size>
# Model parameters

activation: !name:torch.nn.Sigmoid
dnn_layers: 1
dnn_neurons: 1024
dnn_neurons: 768
freeze_encoder: True

# Outputs
output_neurons: 30 # BPE size, index(blank/eos/bos) = 0
output_neurons: 100 # BPE size, index(blank/eos/bos) = 0

# Decoding parameters
blank_index: 0
@@ -92,16 +125,20 @@ test_beam_search:
# If you don't want to use an LM, comment it out or set it to null
kenlm_model_path: null

### Config for Tokenizer
vocab_size: 1024
num_codebooks: 2
sample_rate: 16000

# Feature parameters
encoder_dim: 1024

# Functions and classes
#
epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
    limit: !ref <number_of_epochs>

# EnCodec model (see https://huggingface.co/docs/transformers/v4.31.0/en/model_doc/encodec)
codec: !new:speechbrain.lobes.models.discrete.speechtokenizer_interface.SpeechTokenizer_interface
    source: fnlp/SpeechTokenizer # Only the 24kHz version supports mono audio
    save_path: !ref <save_folder>
# Modules
discrete_embedding_layer: !new:custom_model.Discrete_EmbeddingLayer
    num_codebooks: !ref <num_codebooks>
    vocab_size: !ref <vocab_size>
@@ -111,6 +148,7 @@ attention_mlp: !new:custom_model.AttentionMLP
    input_dim: !ref <encoder_dim>
    hidden_dim: !ref <encoder_dim>


enc: !new:speechbrain.nnet.RNN.LSTM
    input_shape: [Null, Null, !ref <encoder_dim>]
    num_layers: 2
@@ -132,17 +170,16 @@ modules:
    enc: !ref <enc>
    ctc_lin: !ref <ctc_lin>
    attention_mlp: !ref <attention_mlp>
    codec: !ref <codec>
    discrete_embedding_layer: !ref <discrete_embedding_layer>

model: !new:torch.nn.ModuleList
    - [!ref <enc>, !ref <ctc_lin>, !ref <discrete_embedding_layer>, !ref <attention_mlp>]

model_opt_class: !name:torch.optim.Adam
    lr: !ref <lr>
    lr: !ref <lr_model>

lr_annealing_model: !new:speechbrain.nnet.schedulers.NewBobScheduler
    initial_value: !ref <lr>
    initial_value: !ref <lr_model>
    improvement_threshold: 0.0025
    annealing_factor: 0.8
    patient: 0
@@ -155,7 +192,6 @@ checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
        model: !ref <model>
        scheduler_model: !ref <lr_annealing_model>
        attention_mlp: !ref <attention_mlp>
        codec: !ref <codec>
        discrete_embedding_layer: !ref <discrete_embedding_layer>
        counter: !ref <epoch_counter>
        tokenizer: !ref <label_encoder>
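
The `lr_annealing_model` entry in the hunk above configures a NewBob-style scheduler with `improvement_threshold: 0.0025`, `annealing_factor: 0.8`, and `patient: 0`. The sketch below illustrates the general idea of that rule: shrink the learning rate whenever the relative improvement of the validation metric falls below the threshold. It is a simplified illustration, not SpeechBrain's implementation; the function name `newbob_step` and the loss values are hypothetical, and patience is ignored because the config sets it to 0.

```python
def newbob_step(lr, prev_metric, metric,
                improvement_threshold=0.0025, annealing_factor=0.8):
    """Return the learning rate to use after observing `metric`.

    Hypothetical helper: assumes lower metric values are better and
    that patience is 0 (anneal immediately on a weak improvement).
    """
    if prev_metric is None:
        # Nothing to compare against on the first epoch.
        return lr
    if prev_metric == 0:
        improvement = 0.0
    else:
        # Relative improvement with respect to the previous epoch.
        improvement = (prev_metric - metric) / prev_metric
    if improvement < improvement_threshold:
        return lr * annealing_factor
    return lr


# Made-up validation losses, starting from lr_model = 0.0002.
lr = 0.0002
prev = None
for loss in [1.00, 0.80, 0.799, 0.70]:
    lr = newbob_step(lr, prev, loss)
    prev = loss

print(lr)
```

In this made-up run only the third epoch improves by less than 0.25% relative to the previous one, so the rate is multiplied by 0.8 exactly once.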
6 changes: 6 additions & 0 deletions benchmarks/DASB/CommonVoice/ASR/LSTM/hparams/train_dac.yaml
@@ -19,6 +19,7 @@ train_log: !ref <output_folder>/train_log.txt

# Data files
data_folder: !PLACEHOLDER # e.g., /local/cv-corpus-11.0-2022-09-21/<language>
cached_data_folder: !PLACEHOLDER # e.g., path/to/cache
train_tsv_file: !ref <data_folder>/train.tsv # Standard CommonVoice .tsv files
dev_tsv_file: !ref <data_folder>/dev.tsv # Standard CommonVoice .tsv files
test_tsv_file: !ref <data_folder>/test.tsv # Standard CommonVoice .tsv files
@@ -28,6 +29,9 @@ valid_csv: !ref <save_folder>/dev.csv
test_csv: !ref <save_folder>/test.csv
skip_prep: False # Skip data preparation

tokens_folder: !PLACEHOLDER # Path to the folder where extracted tokens are saved.
pretrain_embeddings_folder: none # Optional: If pretrain_embeddings is True, this should be set to the path where the pretrained embeddings are saved.

avoid_if_longer_than: 10.0

# Training parameters
@@ -97,6 +101,8 @@ vocab_size: 1024
model_bitrate: 8kbps
num_codebooks: 2 # NOTE: must be smaller than or equal to the maximum number of codebooks for the given model type
sample_rate: 24000
pretrain_embeddings: False
freeze_embedding: False

# Feature parameters
encoder_dim: 1024
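
The tokenizer settings above (`vocab_size: 1024`, `num_codebooks: 2`) mean each frame carries one token id per codebook. One common way to embed such tokens with a single table is to shift each codebook's ids into its own slice of `num_codebooks * vocab_size` rows. The sketch below shows only that indexing step in plain Python. It is an assumption about how such a layer could be indexed, not the PR's `Discrete_EmbeddingLayer`; the helper name `flatten_codebook_tokens` and the sample frames are hypothetical.

```python
def flatten_codebook_tokens(frames, vocab_size):
    """Shift per-codebook token ids into one shared index space.

    frames: list of frames, each a list with one token id per codebook.
    Hypothetical helper for illustration only.
    """
    shifted_frames = []
    for frame in frames:
        shifted = []
        for codebook_idx, token in enumerate(frame):
            if not 0 <= token < vocab_size:
                raise ValueError(f"token {token} is outside [0, {vocab_size})")
            # Codebook k owns rows [k * vocab_size, (k + 1) * vocab_size).
            shifted.append(codebook_idx * vocab_size + token)
        shifted_frames.append(shifted)
    return shifted_frames


vocab_size = 1024
num_codebooks = 2
table_rows = num_codebooks * vocab_size  # rows a shared table would need

frames = [[5, 17], [1023, 0]]  # made-up tokens: two frames, two codebooks
shifted = flatten_codebook_tokens(frames, vocab_size)
print(table_rows, shifted)  # 2048 [[5, 1041], [1023, 1024]]
```

With this layout, identical ids from different codebooks never collide, so raising `num_codebooks` only grows the table rather than changing how earlier codebooks are indexed.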