Commit 76dc1b0
Parent(s): daf13ec
add readme
README.md CHANGED

@@ -1,6 +1,26 @@
 ---
+language: mt
 tags:
--
+- audio
+- automatic-speech-recognition
+- voxpopuli
+datasets:
+- voxpopuli
+license: cc-by-nc-4.0
+inference: false
 ---
 
--
+# Wav2Vec2-base-VoxPopuli-V2
+
+[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained only on **mt** speech, using the **9.1k** hours of unlabeled data from the [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).
+
+The model is pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.
+
+**Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data in **mt**. Check out [this blog](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) for a more detailed explanation of how to fine-tune the model.
+
+**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation
+Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
+
+**Authors**: *Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux* from *Facebook AI*.
+
+See the official website for more information, [here](https://github.com/facebookresearch/voxpopuli/).
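The README in this commit stresses the 16 kHz sampling requirement. As a minimal sketch of what that means in practice, assuming the checkpoint is published under the repository id `facebook/wav2vec2-base-mt-voxpopuli-v2` and that it ships a preprocessor config loadable with the 🤗 Transformers `Wav2Vec2FeatureExtractor`, loading the encoder and resampling an arbitrary file before feature extraction could look roughly like this:

```python
# Minimal sketch (not part of the model card): load the pretrained, tokenizer-less
# checkpoint and extract hidden states from audio resampled to the expected 16 kHz.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_ID = "facebook/wav2vec2-base-mt-voxpopuli-v2"  # assumed repository id

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2Model.from_pretrained(MODEL_ID)
model.eval()

# "example.wav" is a placeholder path; any mono speech recording will do.
waveform, sample_rate = torchaudio.load("example.wav")
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=16_000)

inputs = feature_extractor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, frames, hidden_size), e.g. (1, T, 768) for a base model
```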
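The note about the missing tokenizer also has a concrete shape: before fine-tuning for speech recognition, a character-level vocabulary has to be built from labeled Maltese text and a randomly initialised CTC head attached to the pretrained encoder. The sketch below is only an outline under the same repository-id assumption; the toy vocabulary, file names, and the omitted training loop are placeholders, and the fine-tuning blog post linked in the README is the authoritative walkthrough.

```python
# Sketch: build a character-level tokenizer and attach a CTC head for fine-tuning.
# The vocabulary here is a toy placeholder; a real one is derived from the labeled
# mt training text and would include Maltese characters such as ċ, ġ, ħ, ż.
import json
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

vocab = {ch: i for i, ch in enumerate(sorted(set("abcdefghijklmnopqrstuvwxyz' ")))}
vocab["|"] = vocab.pop(" ")          # word-delimiter convention used by Wav2Vec2 CTC tokenizers
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)
with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16_000)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the pretrained encoder and add a CTC output layer sized to the new vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-base-mt-voxpopuli-v2",   # assumed repository id
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()  # common practice in recent transformers versions when labeled data is small
# ...a training loop on labeled mt speech/text pairs would follow, e.g. with the Trainer API.
```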