add files

Files changed (5)

README.md ADDED

+---
+language: et
+tags:
+- audio
+- automatic-speech-recognition
+- voxpopuli
+license: cc-by-nc-4.0
+---
+# Wav2Vec2-Base-VoxPopuli-Finetuned
+[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of the [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed Estonian (`et`) data (refer to Table 1 of the paper for more information).
+**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation
+Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
+**Authors**: *Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux* from *Facebook AI*
+See the [official repository](https://github.com/facebookresearch/voxpopuli/) for more information.

preprocessor_config.json ADDED

+{
+  "do_normalize": true,
+  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
+  "feature_size": 1,
+  "padding_side": "right",
+  "padding_value": 0,
+  "return_attention_mask": false,
+  "sampling_rate": 16000
+}
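The `do_normalize`, `padding_side`, and `padding_value` settings above describe per-utterance normalization followed by right-padding. The following is a minimal pure-Python sketch of that behavior, not the library's actual implementation; the function names `normalize`, `pad_right`, and `preprocess` are hypothetical, and the `1e-7` stabilizer is an assumption chosen for illustration.

```python
import math


def normalize(waveform, eps=1e-7):
    """Scale one utterance to zero mean and (approximately) unit variance."""
    mean = sum(waveform) / len(waveform)
    var = sum((x - mean) ** 2 for x in waveform) / len(waveform)
    scale = math.sqrt(var + eps)  # eps (assumed value) avoids division by zero
    return [(x - mean) / scale for x in waveform]


def pad_right(batch, padding_value=0.0):
    """Pad every utterance on the right to the length of the longest one."""
    longest = max(len(w) for w in batch)
    return [w + [padding_value] * (longest - len(w)) for w in batch]


def preprocess(batch):
    """Normalize each utterance, then right-pad the batch with zeros."""
    return pad_right([normalize(w) for w in batch])
```

Input waveforms are expected as lists of float samples at the configured `sampling_rate` of 16000 Hz; resampling is outside the scope of this sketch.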

special_tokens_map.json ADDED

@@ -0,0 +1 @@
+{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>"}

tokenizer_config.json ADDED

@@ -0,0 +1 @@
+{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "pad_token": "<pad>", "do_lower_case": false, "word_delimiter_token": "|"}

vocab.json ADDED

@@ -0,0 +1 @@
+{"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "|": 4, "e": 5, "a": 6, "o": 7, "s": 8, "n": 9, "r": 10, "i": 11, "l": 12, "d": 13, "c": 14, "t": 15, "u": 16, "p": 17, "m": 18, "b": 19, "q": 20, "y": 21, "g": 22, "v": 23, "h": 24, "ó": 25, "f": 26, "í": 27, "á": 28, "j": 29, "z": 30, "ñ": 31, "é": 32, "x": 33, "ú": 34, "k": 35, "w": 36, "ü": 37, "1": 38}
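To illustrate how this vocabulary and the `word_delimiter_token` turn per-frame predictions into text, here is a minimal greedy CTC decoding sketch. It assumes that `<pad>` (ID 1) serves as the CTC blank, which is not confirmed by the files in this commit, and it uses only an excerpt of the vocabulary with the delimiter written as a plain `|`. The function name `greedy_ctc_decode` and the example frame IDs are hypothetical.

```python
from itertools import groupby

# Excerpt of vocab.json; IDs match the file above.
VOCAB = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "|": 4,
         "e": 5, "a": 6, "r": 10, "i": 11, "t": 15, "m": 18}


def greedy_ctc_decode(frame_ids, vocab=VOCAB, blank="<pad>", delimiter="|"):
    """Collapse repeated frames, drop blanks, and map the delimiter to spaces."""
    id_to_token = {i: t for t, i in vocab.items()}
    # Step 1: merge consecutive duplicate IDs into one.
    collapsed = [i for i, _ in groupby(frame_ids)]
    # Step 2: remove the blank token (assumed here to be <pad>).
    tokens = [id_to_token[i] for i in collapsed if i != vocab[blank]]
    # Step 3: join characters and turn word delimiters into spaces.
    return "".join(tokens).replace(delimiter, " ").strip()


# Hypothetical argmax IDs per audio frame; the blank between the two
# "a" frames keeps the double letter from being collapsed.
frames = [15, 15, 1, 5, 10, 10, 5, 1, 4, 4, 18, 6, 1, 6, 1]
print(greedy_ctc_decode(frames))
```

A blank between two identical characters is what allows doubled letters to survive the collapse step, which matters for languages with frequent long vowels.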