Super cool model!

by patrickvonplaten

Thanks for pinging me.
Definitely looks interesting; I'll try this for my multilingual approach.

@flozi00 The Jupyter notebook for extracting tokens is at https://huggingface.co/fxtentacle/tevr-token-entropy-predictor-de. Bilingual DE+EN worked OK for me too, but I conditioned the acoustic decoder with a different BOS token for each language, similar to how mBART is trained for translation.

BTW, we cited your https://huggingface.co/flozi00/wav2vec2-xls-r-1b-5gram-german as a reference in the paper. You're the "wav2vec 2.0 XLS-R 1B 5-gram 4.38% Zimmermeister (2022)*" in the results table.

FYI the C++ code for the Linux CLI tool based on this model is now on GitHub: https://github.com/DeutscheKI/tevr-asr-tool

And BTW @flozi00 I got this comment: https://news.ycombinator.com/item?id=32413636 from nshmyrev, one of the main developers of the Vosk speech recognition toolkit. He believes that both of our models are overtrained on CommonVoice German, and he suggests a method I hadn't heard of before, based on "perplexity", to check for and prevent that, so that the models work better for general everyday recognition.

Hmm.
That seems interesting, because the discussion is about the language model, and I did not train it on CommonVoice.
The LM is trained on parts of Wikipedia and OSCAR instead.

I think what nshmyrev meant is that perhaps the LM training data contained sentences that are also in the CV test set (e.g., if the sentences used in the CV test set were sourced from Project Gutenberg texts, and those texts were also included in OSCAR), which would bias the results. I have no idea whether that's true, but it would explain the large improvement after adding the LM.
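The leakage check nshmyrev alludes to can be sketched roughly like this: score the LM's perplexity on the test sentences and on a genuinely unseen control set; if the test sentences get markedly lower perplexity, they (or near-duplicates) probably appeared in the LM training data. A minimal self-contained illustration with a toy add-one-smoothed unigram LM (the real check would use the actual n-gram LM, e.g. via KenLM's scoring API; all sentences below are made up):

```python
import math
from collections import Counter

def train_unigram(corpus_tokens, vocab_size):
    """Maximum-likelihood unigram LM with add-one smoothing."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    return lambda w: (counts[w] + 1) / (total + vocab_size)

def perplexity(prob, tokens):
    """exp of the average negative log-probability per token."""
    nll = -sum(math.log(prob(w)) for w in tokens)
    return math.exp(nll / len(tokens))

# Toy "LM training data" that accidentally contains a test sentence.
train = "die katze sitzt auf der matte der hund schläft im garten".split()
leaked_test = "die katze sitzt auf der matte".split()        # appears in training data
fresh_test = "das wetter in berlin ist heute schön".split()  # genuinely unseen

vocab = len(set(train + leaked_test + fresh_test))
lm = train_unigram(train, vocab)

# The leaked sentence scores a markedly lower perplexity than the unseen one,
# which is the signal the overlap check looks for.
print(perplexity(lm, leaked_test) < perplexity(lm, fresh_test))  # True
```

In practice one would compare the LM's perplexity on the CV test transcripts against a control corpus of similar domain and length, since absolute perplexity values are not comparable across vocabularies.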

I disagree, as the LM only lowers the WER while the CER barely changes.
The LM fixes spelling, but it doesn't improve the actual understanding in the transcription.

So I don't think the LM is overfit and biased towards CV.
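The WER/CER argument can be made concrete: when an LM fixes a one-character spelling mistake, it removes an entire word error but only a single character error, so WER drops sharply while CER barely moves. A small self-contained sketch (the example sentences are made up, not from either model's output):

```python
def levenshtein(a, b):
    """Edit distance between two sequences via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: word-level edit distance over reference length."""
    r, h = ref.split(), hyp.split()
    return levenshtein(r, h) / len(r)

def cer(ref, hyp):
    """Character error rate: character-level edit distance over reference length."""
    return levenshtein(ref, hyp) / len(ref)

ref = "das ist ein test"
acoustic = "das ist ein tesd"  # raw acoustic output with a one-character typo
with_lm = "das ist ein test"   # LM corrects the spelling

# One fixed character removes a whole word error (WER: 0.25 -> 0.0) ...
print(wer(ref, acoustic), wer(ref, with_lm))
# ... but only one of 16 character errors (CER: 0.0625 -> 0.0).
print(cer(ref, acoustic), cer(ref, with_lm))
```

So if the LM were leaking whole test sentences, you would expect both WER and CER to collapse together; a large WER gain with nearly unchanged CER is consistent with spelling correction only.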
