RASMUS/Whisper-youtube-crosslingual-subtitles

Dec 23, 2022

The large Whisper model performs substantially better than the other sizes. It’s a lot slower, sure, but it would be nice to have the option 😊

RASMUS

Owner Dec 24, 2022

I will add it.
I will also gradually try to add some better fine-tuned models for each language from this list: https://huggingface.co/spaces/whisper-event/winners?dataset=mozilla-foundation%2Fcommon_voice_11_0
I will need to convert those models to @ggerganov's whisper.cpp implementation. I'm not sure yet whether my conversion works for HF models, but this looks promising: https://github.com/ggerganov/whisper.cpp/issues/325
For example, there is a Danish medium model with a WER of 13.71 on CV11, compared to 14.4 for Whisper-large-v2 on CV9 as reported in the paper.
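For readers comparing these numbers: WER (word error rate) counts the word substitutions, deletions and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. Below is a minimal sketch of that definition using a word-level Levenshtein distance; the function name `word_error_rate` is hypothetical, and real benchmarks usually also normalize text (casing, punctuation) before scoring, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration; no text normalization is applied,
    words are simply split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (dynamic programming, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER is not capped at 1.0: a hypothesis with many inserted words can score above 100%.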

RASMUS

Owner Dec 24, 2022

Large model added

saattrupdan

Dec 24, 2022


edited Dec 24, 2022

Great, thanks!

As a potential alternative to the Whisper models, there are more lightweight models that perform (at least) as well. For Danish, for instance, we have this Wav2Vec2 model, which achieves a WER of 10.8% without a language model:
https://huggingface.co/chcaa/xls-r-300m-danish-nst-cv9
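"Without a language model" here means the Wav2Vec2 output is decoded directly from the CTC frame predictions, typically by greedy decoding: pick the most likely label for each audio frame, merge consecutive repeats, then drop the CTC blank token. The sketch below illustrates only that decoding step under stated assumptions: the vocabulary, the blank index 0 and the `|` word delimiter are hypothetical choices (the `|` delimiter mirrors a convention common in Hugging Face Wav2Vec2 CTC vocabularies), and the logits would in practice come from the model's forward pass, which is not shown.

```python
from typing import Optional, Sequence


def ctc_greedy_decode(
    frame_logits: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,            # assumed index of the CTC blank token
    word_delimiter: str = "|",    # assumed word-boundary symbol in the vocabulary
) -> str:
    """Greedy (best-path) CTC decoding without any language model."""
    # 1. Most likely label for each frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_logits]

    # 2. Collapse runs of the same label, then drop blanks. A blank between two
    #    identical labels keeps them separate, which is how CTC encodes double letters.
    chars = []
    previous: Optional[int] = None
    for idx in best_ids:
        if idx != previous and idx != blank_id:
            chars.append(vocab[idx])
        previous = idx

    # 3. Turn word delimiters into spaces.
    return "".join(chars).replace(word_delimiter, " ").strip()


if __name__ == "__main__":
    vocab = ["<pad>", "h", "e", "j", "|", "d", "u"]  # hypothetical toy vocabulary

    def one_hot(i: int) -> list:
        return [1.0 if k == i else 0.0 for k in range(len(vocab))]

    frames = [one_hot(i) for i in [1, 1, 0, 2, 3, 3, 4, 0, 5, 6, 6]]
    print(ctc_greedy_decode(frames, vocab))
```

A decoder with an n-gram language model would instead run a beam search over these frame probabilities, rescoring candidate transcripts with the language model; greedy decoding skips that step entirely.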

saattrupdan changed discussion status to closed Dec 24, 2022

Allow large whisper model?