
SpeechLM Update #12430

Draft
wants to merge 161 commits into
base: main
Conversation

stevehuang52
Collaborator

@stevehuang52 stevehuang52 commented Feb 28, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Several important updates to SpeechLM.

Collection: [speechlm]

Changelog

  • Unified the various input formats into a single multimodal conversation format, in which audio and text turns are interleaved.
  • Added Whisper encoder support.
  • Fixed PEFT not saving batch-norm statistics.
  • Fixed loading a PEFT checkpoint for inference.
  • Added context-parallelism support for the LLM; the speech encoder does not support it yet.
  • Various improvements and refactoring.

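The unified multimodal conversation format described in the changelog can be sketched roughly as follows. This is a minimal illustration of the idea (interleaved audio and text turns in one sample); the class and field names are hypothetical and are not the PR's actual API.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextTurn:
    """A text segment of the conversation (prompt, answer, etc.)."""
    text: str


@dataclass
class AudioTurn:
    """An audio segment of the conversation, referenced by file path."""
    audio_path: str


@dataclass
class MultimodalConversation:
    """One training sample: audio and text turns interleaved in order."""
    turns: List[Union[TextTurn, AudioTurn]]

    def modality_pattern(self) -> List[str]:
        # e.g. ["text", "audio", "text"] for a prompt-audio-answer sample
        return ["audio" if isinstance(t, AudioTurn) else "text" for t in self.turns]


conv = MultimodalConversation(
    turns=[TextTurn("Transcribe:"), AudioTurn("utt1.wav"), TextTurn("hello world")]
)
print(conv.modality_pattern())  # ['text', 'audio', 'text']
```

A dataloader over such samples can then tokenize text turns and extract features for audio turns in the order they appear, which is what "audio and text are interleaved" amounts to.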
stevehuang52 and others added 30 commits September 16, 2024 10:38
Signed-off-by: stevehuang52 <[email protected]>
add type hint

Signed-off-by: He Huang (Steve) <[email protected]>
input_ids = [sample.input_ids for sample in samples]
context_ids = [sample.context_ids for sample in samples]
context_lengths = [sample.context_length for sample in samples]
answer_ids = [sample.answer_ids for sample in samples]

Check warning

Code scanning / CodeQL

Variable defined multiple times Warning

This assignment to 'answer_ids' is unnecessary, as it is redefined before this value is used.
if self.target_module is not None:
model = get_nested_attr(asr_model, self.target_module)

model = HFWrappedEncoder(model)

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable Error

Local variable 'model' may be used before it is initialized.
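The flagged snippet assigns `model` only inside the `if` branch, so when `self.target_module` is `None` the later `HFWrappedEncoder(model)` call reads an uninitialized name. One possible fix is to give `model` a value on every path. The sketch below uses stand-ins for `get_nested_attr` and `HFWrappedEncoder` (the real ones live in the NeMo speechlm collection), and the `else` fallback is an assumption about the intended behavior:

```python
def get_nested_attr(obj, path):
    # Stand-in helper: resolve a dotted attribute path like "encoder.layers".
    for name in path.split("."):
        obj = getattr(obj, name)
    return obj


class HFWrappedEncoder:
    # Stand-in wrapper class.
    def __init__(self, model):
        self.model = model


def build_encoder(asr_model, target_module):
    # Fix: assign `model` on both branches so it is always initialized
    # before HFWrappedEncoder(model) is called.
    if target_module is not None:
        model = get_nested_attr(asr_model, target_module)
    else:
        model = asr_model  # assumed fallback: wrap the whole model
    return HFWrappedEncoder(model)
```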
elif isinstance(inputs, tuple) and len(inputs) == 4:
context_tokens_tensor, context_length_tensor, audio_signal, audio_signal_length = inputs
elif isinstance(inputs, tuple) and len(inputs) == 6: # multi-audio
has_multi_audios = True

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable has_multi_audios is not used.

Copilot Autofix AI 2 days ago

The fix is to remove the unused variable has_multi_audios: delete the line that assigns it, along with any related code that references it.

Suggested changeset 1
nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py b/nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py
--- a/nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py
+++ b/nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py
@@ -195,3 +195,3 @@
     context_start_idx = None
-    if has_multi_audios:
+    if num_audios is not None:
         num_audios = torch.empty(batch_size, dtype=torch.int64, device=torch.cuda.current_device())
@@ -367,3 +367,2 @@
     tokenizer = model.tokenizer
-    has_multi_audios = False
     num_audios = None
@@ -377,3 +376,2 @@
     elif isinstance(inputs, tuple) and len(inputs) == 6:  # multi-audio
-        has_multi_audios = True
         (
EOF