Facebook's Wav2Vec2 large model pretrained on the 100k-hour unlabeled subset of the VoxPopuli corpus.
Note: This model does not have a tokenizer, as it was pretrained on audio alone. To use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out this blog for a more detailed explanation of how to fine-tune the model.
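Because the pretrained checkpoint has no tokenizer or CTC head, it can only be used as a speech feature extractor out of the box. The sketch below shows the basic call pattern with a tiny, randomly initialized `Wav2Vec2Model` standing in for the checkpoint (in practice you would load it with `Wav2Vec2Model.from_pretrained` and this model's repo id); the small config values are illustrative assumptions, not the checkpoint's real dimensions.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Tiny stand-in config so the example runs without downloading weights.
# The real checkpoint would be loaded with Wav2Vec2Model.from_pretrained(...).
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32),
    conv_stride=(5, 2),
    conv_kernel=(10, 3),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=2,
)
model = Wav2Vec2Model(config)
model.eval()

# One second of 16 kHz mono audio (Wav2Vec2 expects 16 kHz input).
waveform = torch.randn(1, 16000)
with torch.no_grad():
    # last_hidden_state: (batch, downsampled_frames, hidden_size)
    hidden_states = model(waveform).last_hidden_state
print(hidden_states.shape)
```

These contextual representations are what a downstream ASR head is trained on top of during fine-tuning.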
Paper: VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI
See the official VoxPopuli website for more information.
Please refer to this blog post for instructions on fine-tuning this model on a specific language. Note that you should replace "facebook/wav2vec2-large-xlsr-53" with this checkpoint when fine-tuning.
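Fine-tuning for ASR means putting a CTC head with your tokenizer's vocabulary on top of the pretrained encoder. A minimal sketch of that setup, again with a tiny randomly initialized config standing in for the checkpoint (in practice: `Wav2Vec2ForCTC.from_pretrained(<this checkpoint>, vocab_size=...)`); the config values and the 30-token vocabulary are illustrative assumptions:

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForCTC

# Tiny stand-in config; the real checkpoint would be loaded with
# Wav2Vec2ForCTC.from_pretrained(...) and the vocab size of your tokenizer.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    vocab_size=30,  # size of the custom character vocabulary (assumed)
    conv_dim=(32, 32),
    conv_stride=(5, 2),
    conv_kernel=(10, 3),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=2,
)
model = Wav2Vec2ForCTC(config)

waveform = torch.randn(1, 16000)        # 1 s of 16 kHz audio
labels = torch.randint(1, 30, (1, 20))  # dummy token ids (0 is the CTC blank)

# Passing labels makes the model return the CTC loss to minimize
# during fine-tuning on labeled speech.
loss = model(waveform, labels=labels).loss
```

A real training loop would feed `loss.backward()` into an optimizer over batches of transcribed audio, as the fine-tuning blog post describes.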