lucio committed on
Commit
1906fc0
1 Parent(s): 3dd85cc

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -68,8 +68,11 @@ print("Prediction:", processor.batch_decode(predicted_ids))
 print("Reference:", test_dataset["sentence"][:2])
 ```
 
+Result:
+```
 Prediction: ['yaherukaga gukora igitaramo y iki mu jyiwa na mul mumbiliki', 'ini rero ntibizashoboka ka nibo nkunrabibzi']
 Reference: ['Yaherukaga gukora igitaramo nk’iki mu Mujyi wa Namur mu Bubiligi.', 'Ibi rero, ntibizashoboka, kandi nawe arabizi.']
+```
 
 ## Evaluation
 
@@ -154,6 +157,6 @@ print("WER: {:2f}".format(100 * chunked_wer(result["sentence"], result["pred_str
 
 ## Training
 
-Blocks of examples from the Common Voice training dataset (totaling about 100k examples, 20% of the available data) were used for training for 30k global steps, on 1 V100 GPU provided by OVHcloud. For validation, 2048 examples of the validation dataset were used.
+Blocks of examples from the Common Voice training dataset were used for training, after filtering out utterances that had any `down_vote` or were longer than 9.5 seconds. The data used totals about 100k examples, 20% of the available data. Training proceeded for 30k global steps, on 1 V100 GPU provided by OVHcloud. For validation, 2048 examples of the validation dataset were used.
 
 The [script used for training](https://github.com/serapio/transformers/blob/feature/xlsr-finetune/examples/research_projects/wav2vec2/run_common_voice.py) is adapted from the [example script provided in the transformers repo](https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/run_common_voice.py).
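The filtering the new Training paragraph describes — dropping utterances with any `down_vote` or longer than 9.5 seconds — could be sketched roughly as below. This is an illustration only, not the script actually used; the `down_votes` and `audio` field names follow the Common Voice schema on the Hub, and a 16 kHz sampling rate is assumed.

```python
# Hypothetical sketch of the filtering described above: keep only clips
# with no down-votes and a duration of at most 9.5 seconds.
MAX_SECONDS = 9.5
SAMPLE_RATE = 16_000  # assumed sampling rate after resampling

def keep_utterance(example):
    """Return True if the clip has no down-votes and is <= 9.5 s long."""
    if example.get("down_votes", 0) > 0:
        return False
    duration = len(example["audio"]["array"]) / SAMPLE_RATE
    return duration <= MAX_SECONDS

# With a 🤗 Datasets object this would be applied as, e.g.:
# train_dataset = train_dataset.filter(keep_utterance)
```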