
SpeechLM Update #12430

Draft
wants to merge 161 commits into
base: main
Conversation

stevehuang52
Collaborator

@stevehuang52 stevehuang52 commented Feb 28, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Several important updates to SpeechLM.

Collection: [speechlm]

Changelog

  • Unified the various input formats into a single multimodal conversation format, in which audio and text turns are interleaved.
  • Added Whisper encoder support.
  • Fixed PEFT not saving batch-norm statistics.
  • Fixed loading a PEFT checkpoint for inference.
  • Added context-parallelism support for the LLM; the speech encoder does not support it yet.
  • Various improvements and refactoring.

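The unified multimodal conversation format described in the changelog can be sketched roughly as follows. This is a minimal illustration of the idea (interleaved audio and text turns in one sample); the class and field names are hypothetical and are not the PR's actual API.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextTurn:
    """A text segment of the conversation (prompt, answer, etc.)."""
    text: str


@dataclass
class AudioTurn:
    """An audio segment of the conversation, referenced by file path."""
    audio_path: str


@dataclass
class MultimodalConversation:
    """One training sample: audio and text turns interleaved in order."""
    turns: List[Union[TextTurn, AudioTurn]]

    def modality_pattern(self) -> List[str]:
        # e.g. ["text", "audio", "text"] for a prompt-audio-answer sample
        return ["audio" if isinstance(t, AudioTurn) else "text" for t in self.turns]


conv = MultimodalConversation(
    turns=[TextTurn("Transcribe:"), AudioTurn("utt1.wav"), TextTurn("hello world")]
)
print(conv.modality_pattern())  # ['text', 'audio', 'text']
```

A dataloader over such samples can then tokenize text turns and extract features for audio turns in the order they appear, which is what "audio and text are interleaved" amounts to.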
stevehuang52 and others added 30 commits September 16, 2024 10:38
Signed-off-by: stevehuang52 <[email protected]>
add type hint

Signed-off-by: He Huang (Steve) <[email protected]>
input_ids = [sample.input_ids for sample in samples]
context_ids = [sample.context_ids for sample in samples]
context_lengths = [sample.context_length for sample in samples]
answer_ids = [sample.answer_ids for sample in samples]

Check warning

Code scanning / CodeQL

Variable defined multiple times Warning

This assignment to 'answer_ids' is unnecessary, as it is redefined before this value is used.
if self.target_module is not None:
model = get_nested_attr(asr_model, self.target_module)

model = HFWrappedEncoder(model)

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable Error

Local variable 'model' may be used before it is initialized.
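The flagged snippet assigns `model` only inside the `if` branch, so when `self.target_module` is `None` the later `HFWrappedEncoder(model)` call reads an uninitialized name. One possible fix is to give `model` a value on every path. The sketch below uses stand-ins for `get_nested_attr` and `HFWrappedEncoder` (the real ones live in the NeMo speechlm collection), and the `else` fallback is an assumption about the intended behavior:

```python
def get_nested_attr(obj, path):
    # Stand-in helper: resolve a dotted attribute path like "encoder.layers".
    for name in path.split("."):
        obj = getattr(obj, name)
    return obj


class HFWrappedEncoder:
    # Stand-in wrapper class.
    def __init__(self, model):
        self.model = model


def build_encoder(asr_model, target_module):
    # Fix: assign `model` on both branches so it is always initialized
    # before HFWrappedEncoder(model) is called.
    if target_module is not None:
        model = get_nested_attr(asr_model, target_module)
    else:
        model = asr_model  # assumed fallback: wrap the whole model
    return HFWrappedEncoder(model)
```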
elif isinstance(inputs, tuple) and len(inputs) == 4:
context_tokens_tensor, context_length_tensor, audio_signal, audio_signal_length = inputs
elif isinstance(inputs, tuple) and len(inputs) == 6: # multi-audio
has_multi_audios = True

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable has_multi_audios is not used.

Copilot Autofix AI 2 days ago

The fix is to remove the unused variable has_multi_audios: delete the line that assigns it, along with any related code that references it.

Suggested changeset 1
nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py b/nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py
--- a/nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py
+++ b/nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py
@@ -195,3 +195,3 @@
     context_start_idx = None
-    if has_multi_audios:
+    if num_audios is not None:
         num_audios = torch.empty(batch_size, dtype=torch.int64, device=torch.cuda.current_device())
@@ -367,3 +367,2 @@
     tokenizer = model.tokenizer
-    has_multi_audios = False
     num_audios = None
@@ -377,3 +376,2 @@
     elif isinstance(inputs, tuple) and len(inputs) == 6:  # multi-audio
-        has_multi_audios = True
         (
EOF