
Transcriptions have no spaces - wav2vec2-xls-r-1b-spanish #51

Open
santideleon opened this issue Aug 3, 2022 · 4 comments

Comments

@santideleon

santideleon commented Aug 3, 2022

I am working on speech-to-text for Spanish audio clips of up to ~135 seconds, recorded with lapel microphones or VR goggles. I am using wav2vec2-xls-r-1b-spanish together with the provided language model files lm.binary and unigrams.txt. They are the ones downloaded from jonatasgrosman/wav2vec2-large-xlsr-53-spanish, but based on the file sizes they seem to be exactly the same as the ones for the 1b model. I originally started with the large version, but I opted for 1b for better performance.

My plan is to work on the text with the pysentimiento pre-trained Spanish sentiment and emotion analyzer. The problem I have is that the transcribed text has no spaces separating the words.

Is there a quick fix for this or any suggestions?

Example:
alesundíamanormalparamímelevantosobrelasochodelamañana desayunasepredesayunoalomismodeayunosquirconceriales yfrutameduchomeevistoacosasenchilavoycaminandosube lacuestahastaelaparadadelautobustyietesperoquevenga autobusesestallevaalaparadadesanlorenzocojoelmetro

code:

```python
# Assumed imports for the snippet below; huggingsound provides both classes.
from huggingsound import SpeechRecognitionModel, KenshoLMDecoder

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-xls-r-1b-spanish")
lm_path = "language_model/lm.binary"
unigrams_path = "language_model/unigrams.txt"
decoder = KenshoLMDecoder(model.token_set, lm_path=lm_path, unigrams_path=unigrams_path)


def process_single_audio(correct_path, sr=16000):
    # sr is only used by the commented-out librosa load below.
    # y, sr = librosa.load(str(path + correct_path), sr=sr)
    transcriptions = model.transcribe([str(correct_path)[1:]], decoder=decoder)
    print(transcriptions[0]['transcription'])
    return transcriptions[0]['transcription']
```
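One way to narrow this down is to compare the LM-decoded output against huggingsound's default greedy decoding (calling transcribe without a decoder). A minimal sketch, reusing the `model` and `decoder` objects from above; "audio.wav" is a placeholder path:

```python
# Sketch: greedy CTC decoding vs. the KenshoLMDecoder, to see which
# stage drops the spaces. "audio.wav" is a placeholder path.
greedy = model.transcribe(["audio.wav"])                    # no decoder -> greedy decoding
with_lm = model.transcribe(["audio.wav"], decoder=decoder)  # same audio through the LM decoder

print("greedy :", greedy[0]["transcription"])
print("with LM:", with_lm[0]["transcription"])
```

If only the LM-decoded text lacks spaces, the lm.binary/unigrams.txt pair is the likely culprit rather than the acoustic model.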
@santideleon
Author

This problem seems to be fixed by using the automatic-speech-recognition pipeline, both with and without chunking. I'm not really sure what is happening.

code:

```python
# Assumed import; `decoder` is the KenshoLMDecoder created above.
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition",
                model="jonatasgrosman/wav2vec2-xls-r-1b-spanish",
                tokenizer="jonatasgrosman/wav2vec2-xls-r-1b-spanish",
                feature_extractor="jonatasgrosman/wav2vec2-xls-r-1b-spanish",
                decoder=decoder)

transcriptions = pipe(str(correct_path)[1:])
```

Additionally, I tested chunking in the pipeline. My first thought was that there was a problem with the length of the audios, but after testing different chunking parameters, and then without chunking, it worked perfectly either way. The only thing I would note is that chunking significantly increases processing time: I saw runs take from twice as long up to seven times longer. In terms of transcription accuracy, the slowest setting (10 s chunks) seemed to work best, but it is not worth the computation time, since 30 s chunks, which only doubled the processing time, were almost as good.
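For reference, chunked long-form inference is controlled through the pipeline's chunk_length_s and stride_length_s arguments; a minimal sketch of the 10 s vs. 30 s settings compared above, reusing the `pipe` object from the previous snippet (the stride values are illustrative):

```python
# Sketch: chunked inference with the transformers ASR pipeline.
# Only the chunk lengths (10 s vs. 30 s) were compared above;
# the stride values are illustrative.
out_10s = pipe(str(correct_path)[1:], chunk_length_s=10, stride_length_s=2)
out_30s = pipe(str(correct_path)[1:], chunk_length_s=30, stride_length_s=5)
print(out_30s["text"])
```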

@iljab

iljab commented Aug 16, 2022

Same issue using the jonatasgrosman/wav2vec2-large-xlsr-53-german model.

@arikhalperin

You should try to add a language model. See here:
https://huggingface.co/blog/wav2vec2-with-ngram
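For anyone following that link: the post builds a pyctcdecode decoder from a KenLM n-gram and wraps it in Wav2Vec2ProcessorWithLM. A minimal sketch along those lines, assuming a KenLM file you have built or downloaded ("path/to/5gram.arpa" is a placeholder):

```python
# Sketch, following the linked blog post: attach a KenLM n-gram to a
# wav2vec2 processor. "path/to/5gram.arpa" is a placeholder path.
from transformers import Wav2Vec2Processor, Wav2Vec2ProcessorWithLM
from pyctcdecode import build_ctcdecoder

processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-xls-r-1b-spanish")
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]  # sort by token id

decoder = build_ctcdecoder(labels=labels, kenlm_model_path="path/to/5gram.arpa")

processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder,
)
```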

@detongz

detongz commented Jul 27, 2023

@santideleon Hi, I have the same issue using the wbbbbb/wav2vec2-large-chinese-zh-cn model.

Have you solved this problem?
