Question-Answering Task #99

jrohsc · 2024-11-27T02:57:12Z

Hi,

How can I use Qwen2-Audio model on a speech question-answering task? I tried both providing and not providing the instruction but it seems like the model is only transcribing the audio input.

hsoftxl · 2024-12-10T08:13:36Z

what is you code?

jrohsc · 2025-02-02T18:50:13Z

import torch
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration
import librosa

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

model = Qwen2AudioForConditionalGeneration.from_pretrained("Qwen/Qwen2-Audio-7B", trust_remote_code=True)
model = model.to(device)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-Audio-7B", trust_remote_code=True)

audio, sr = librosa.load(path, sr=processor.feature_extractor.sampling_rate)
inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(device)

with torch.no_grad():
generated_ids = model.generate(**inputs, max_length=256)
generated_ids = generated_ids[:, inputs.input_ids.size(1):]
response = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

print(response)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question-Answering Task #99

Question-Answering Task #99

jrohsc commented Nov 27, 2024

hsoftxl commented Dec 10, 2024

jrohsc commented Feb 2, 2025 •

edited

Loading

Question-Answering Task #99

Question-Answering Task #99

Comments

jrohsc commented Nov 27, 2024

hsoftxl commented Dec 10, 2024

jrohsc commented Feb 2, 2025 • edited Loading

jrohsc commented Feb 2, 2025 •

edited

Loading