language:
  - multilingual
  - bg
  - cs
  - hr
  - da
  - nl
  - en
  - et
  - fi
  - fr
  - de
  - el
  - hu
  - it
  - lv
  - lt
  - mt
  - pl
  - pt
  - ro
  - sk
  - sl
  - es
  - sv
tags:
  - audio
  - automatic-speech-recognition
  - voxpopuli
license: cc-by-nc-4.0

Wav2Vec2-Large-VoxPopuli

Facebook's Wav2Vec2 large model, pretrained on the 100k-hour unlabeled subset of the VoxPopuli corpus.

Note: This model does not have a tokenizer because it was pretrained on audio alone. To use it for speech recognition, you need to create a tokenizer and fine-tune the model on labeled data (audio paired with transcriptions). See this blog for a more detailed explanation of how to fine-tune the model.
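A common way to create the missing tokenizer for CTC fine-tuning is a character-level vocabulary built from the training transcripts. The sketch below is a minimal, stdlib-only illustration of that idea, assuming the conventions used in Hugging Face's XLSR fine-tuning walkthrough: spaces become the word delimiter `|`, and `[UNK]` and `[PAD]` tokens are appended. The punctuation set in `CHARS_TO_IGNORE` and the helper names are hypothetical and should be adapted to the target language.

```python
import json
import re

# Hypothetical punctuation set removed from transcripts; adapt per language.
CHARS_TO_IGNORE = re.compile(r"[,?.!\-;:\"%']")


def normalize(text: str) -> str:
    """Lowercase a transcript and drop punctuation the model should not predict."""
    return CHARS_TO_IGNORE.sub("", text).lower()


def build_vocab(transcripts: list[str]) -> dict[str, int]:
    """Map each distinct character to a contiguous integer id for a CTC tokenizer."""
    # Spaces are represented by "|" instead; "|" is reserved, so drop it from the text set.
    chars = sorted(set("".join(normalize(t) for t in transcripts)) - {" ", "|"})
    tokens = ["|"] + chars + ["[UNK]", "[PAD]"]
    return {tok: i for i, tok in enumerate(tokens)}


def save_vocab(vocab: dict[str, int], path: str) -> None:
    """Write the vocabulary as JSON, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
```

With `transformers` installed, a saved `vocab.json` can then be loaded with `Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")`.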

Paper: VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI

See the official website for more information.

Fine-Tuning

Refer to this blog for how to fine-tune this model on a specific language. When following it, replace "facebook/wav2vec2-large-xlsr-53" with this checkpoint.
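Because this checkpoint was pretrained without any text, it has no character output layer; when it is loaded for CTC fine-tuning, that layer has to be sized to the new tokenizer's vocabulary. The sketch below, a hypothetical helper rather than part of the blog or the library, assembles the config overrides that do this. `CHECKPOINT` is a placeholder for this model's repository id or local path, and the `[PAD]` token name assumes a vocabulary built with that convention.

```python
# Placeholder (hypothetical): this model's Hub id or local path, used in place of
# "facebook/wav2vec2-large-xlsr-53" from the blog.
CHECKPOINT = "<this-checkpoint>"


def ctc_head_kwargs(vocab: dict[str, int], pad_token: str = "[PAD]") -> dict[str, object]:
    """Config overrides that size the new CTC output layer to a target-language vocabulary."""
    if pad_token not in vocab:
        raise KeyError(f"pad token {pad_token!r} is not in the vocabulary")
    return {
        # Average the per-sample CTC losses instead of the config default "sum".
        "ctc_loss_reduction": "mean",
        # transformers uses pad_token_id as the CTC blank label.
        "pad_token_id": vocab[pad_token],
        # One output logit per vocabulary entry.
        "vocab_size": len(vocab),
    }
```

With `transformers` installed, the overrides would be passed as `Wav2Vec2ForCTC.from_pretrained(CHECKPOINT, **ctc_head_kwargs(vocab))`. Since the checkpoint has no pretrained CTC layer, that layer is newly initialized (transformers warns about the uninitialized weights) and learned during fine-tuning.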