gigant committed
Commit ea5cba6
1 Parent(s): 69de36e

Update README.md

Files changed (1)
  1. README.md +10 -5
README.md CHANGED
@@ -56,18 +56,18 @@ model-index:
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # wav2vec2-ro-300m_01
+ # Romanian Wav2Vec2

- This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) dataset, with extra training data from [Romanian Speech Synthesis](https://huggingface.co/datasets/gigant/romanian_speech_synthesis_0_8_1) dataset.
+ This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) dataset (train + validation + other splits), with extra training data from the [Romanian Speech Synthesis](https://huggingface.co/datasets/gigant/romanian_speech_synthesis_0_8_1) dataset (train + test splits).

- It achieves the following results on the evaluation set:
+ Without the 5-gram language model, it achieves the following results on the evaluation set (Common Voice 8.0, Romanian subset, test split):
  - Loss: 0.1553
  - Wer: 0.1174
  - Cer: 0.0294

  ## Model description

- More information needed
+ The architecture is based on [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) with a speech recognition CTC head and an added 5-gram language model (using [pyctcdecode](https://github.com/kensho-technologies/pyctcdecode) and [kenlm](https://github.com/kpu/kenlm)). Both libraries are required for the language-model-boosted decoder to work.

  ## Intended uses & limitations

@@ -75,7 +75,12 @@ More information needed

  ## Training and evaluation data

- More information needed
+ Training data:
+ - [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0): train + validation + other splits
+ - [Romanian Speech Synthesis](https://huggingface.co/datasets/gigant/romanian_speech_synthesis_0_8_1): train + test splits
+
+ Evaluation data:
+ - [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0): test split

  ## Training procedure
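
The updated model description points to [pyctcdecode](https://github.com/kensho-technologies/pyctcdecode) and [kenlm](https://github.com/kpu/kenlm) for language-model-boosted decoding. A minimal inference sketch follows; the repo id `gigant/romanian-wav2vec2` is an assumption (this diff does not name the final repository), and `transformers` only uses the LM-boosted decoder when both libraries are installed and the repository ships the n-gram files.

```python
# Minimal sketch: Romanian transcription with the fine-tuned checkpoint.
# "gigant/romanian-wav2vec2" is an assumed repo id; substitute the real one.
# pyctcdecode and kenlm must be installed for LM-boosted decoding to be used.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="gigant/romanian-wav2vec2")
print(asr("sample.wav")["text"])  # path to any local audio file
```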
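
The new "Training and evaluation data" section lists the exact splits used. A sketch of loading them with the `datasets` library, assuming access to the gated Common Voice 8.0 dataset (a Hugging Face authentication token is required):

```python
# Sketch: assembling the splits listed in the updated README.
# Common Voice 8.0 is gated, so a Hugging Face auth token is assumed.
from datasets import load_dataset

cv_train = load_dataset("mozilla-foundation/common_voice_8_0", "ro",
                        split="train+validation+other", use_auth_token=True)
rss_train = load_dataset("gigant/romanian_speech_synthesis_0_8_1",
                         split="train+test")
cv_test = load_dataset("mozilla-foundation/common_voice_8_0", "ro",
                       split="test", use_auth_token=True)

print(len(cv_train), len(rss_train), len(cv_test))
```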
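
The reported Wer (0.1174) and Cer (0.0294) are standard word and character error rates on the Common Voice test split. A sketch of recomputing them with the `evaluate` library, using placeholder predictions and references:

```python
# Sketch: recomputing WER/CER from decoded predictions and reference texts.
# The two lists are placeholders; in practice they come from running the model
# over the Common Voice 8.0 Romanian test split.
import evaluate

wer = evaluate.load("wer")
cer = evaluate.load("cer")

predictions = ["un exemplu de transcriere"]  # hypothetical model outputs
references = ["un exemplu de transcriere"]   # hypothetical ground truth

print("WER:", wer.compute(predictions=predictions, references=references))
print("CER:", cer.compute(predictions=predictions, references=references))
```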