
Possible issue when using HuggingFace portuguese language model #62
lfcnassif opened this issue Sep 2, 2022 · 0 comments

First, thank you very much for this great project; it makes ASR very easy!

Your models are awesome, too! I ran some accuracy tests with the https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-portuguese model (sepinf-inc/IPED#1214 (comment)), and it is comparable to Microsoft's and Google's pt-BR models, actually a bit better!

Now I'm trying to use a language model as described in the README.md. I'm using the LM from the language_model folder of the HuggingFace model card above, but it prints some warnings in the console:

09/02/2022 12:10:19 - WARNING - pyctcdecode.alphabet - Found entries of length > 1 in alphabet. This is unusual unless style is BPE, but the alphabet was not recognized as BPE type. Is this correct?
09/02/2022 12:10:19 - WARNING - pyctcdecode.alphabet - Unigrams and labels don't seem to agree.
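For reference, here is roughly what I'm running. This is a minimal sketch based on my reading of the README's LM decoding example; the file paths are placeholders for the files I downloaded from the model card's language_model folder:

```python
from huggingsound import SpeechRecognitionModel, KenshoLMDecoder

# Load the Portuguese model from the Hugging Face Hub
model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-xls-r-1b-portuguese")

# Placeholder paths to the LM files downloaded from the model card's language_model folder
lm_path = "language_model/lm.binary"
unigrams_path = "language_model/unigrams.txt"

# Build the pyctcdecode-based decoder and transcribe with LM boosting
decoder = KenshoLMDecoder(model.token_set, lm_path=lm_path, unigrams_path=unigrams_path)
transcriptions = model.transcribe(["/path/to/audio.wav"], decoder=decoder)
```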

Accuracy also dropped a lot (the WER increased). Am I doing something wrong? Which language model is compatible with the Portuguese model above?

Thanks in advance
