gagan3012 committed on
Commit
9c1bbd1
1 Parent(s): 515e217

Update README.md

Files changed (1)
  1. README.md +24 -18
README.md CHANGED
@@ -1,7 +1,9 @@
 
 
1
  language: ne
2
  datasets:
3
  - OpenSLR
4
- - Common
5
  metrics:
6
  - wer
7
  tags:
@@ -17,18 +19,19 @@ model-index:
17
  name: Speech Recognition
18
  type: automatic-speech-recognition
19
  dataset:
20
- name: Common Voice {lang_id} #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
21
- type: common_voice
22
- args: {lang_id} #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
23
  metrics:
24
  - name: Test WER
25
  type: wer
26
- value: {wer_result_on_test} #TODO (IMPORTANT): replace {wer_result_on_test} with the WER error rate you achieved on the common_voice test set. It should be in the format XX.XX (don't add the % sign here). **Please** remember to fill out this value after you evaluated your model, so that your model appears on the leaderboard. If you fill out this model card before evaluating your model, please remember to edit the model card afterward to fill in your value
27
  ---
28
 
29
- # Wav2Vec2-Large-XLSR-53-{language} #TODO: replace language with your {language}, *e.g.* French
 
 
30
 
31
- Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on {language} using the [Common Voice](https://huggingface.co/datasets/common_voice), ... and ... dataset{s}. #TODO: replace {language} with your language, *e.g.* French and eventually add more datasets that were used and eventually remove common voice if model was not trained on common voice
32
  When using this model, make sure that your speech input is sampled at 16kHz.
33
 
34
  ## Usage
@@ -41,10 +44,10 @@ import torchaudio
41
  from datasets import load_dataset
42
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
43
 
44
- test_dataset = load_dataset("common_voice", "{lang_id}", split="test[:2%]") #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
45
 
46
- processor = Wav2Vec2Processor.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
47
- model = Wav2Vec2ForCTC.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
48
 
49
  resampler = torchaudio.transforms.Resample(48_000, 16_000)
50
 
@@ -65,8 +68,13 @@ predicted_ids = torch.argmax(logits, dim=-1)
65
 
66
  print("Prediction:", processor.batch_decode(predicted_ids))
67
  print("Reference:", test_dataset["sentence"][:2])
 
68
  ```
 
69
 
 
 
 
70
 
71
  ## Evaluation
72
 
@@ -80,11 +88,11 @@ from datasets import load_dataset, load_metric
80
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
81
  import re
82
 
83
- test_dataset = load_dataset("common_voice", "{lang_id}", split="test") #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
84
  wer = load_metric("wer")
85
 
86
- processor = Wav2Vec2Processor.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
87
- model = Wav2Vec2ForCTC.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
88
  model.to("cuda")
89
 
90
  chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“]' # TODO: adapt this list to include all special characters you removed from the data
@@ -115,13 +123,11 @@ def evaluate(batch):
115
  result = test_dataset.map(evaluate, batched=True, batch_size=8)
116
 
117
  print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
118
- ```
119
 
120
- **Test Result**: XX.XX % # TODO: write output of print here. IMPORTANT: Please remember to also replace {wer_result_on_test} at the top of with this value here. tags.
121
 
 
122
 
123
  ## Training
124
 
125
- The Common Voice `train`, `validation`, and ... datasets were used for training as well as ... and ... # TODO: adapt to state all the datasets that were used for training.
126
-
127
- The script used for training can be found [here](...) # TODO: fill in a link to your training script here. If you trained your model in a colab, simply fill in the link here. If you trained the model locally, it would be great if you could upload the training script on github and paste the link here.
 
1
+ ---
2
+
3
  language: ne
4
  datasets:
5
  - OpenSLR
6
+ - common_voice
7
  metrics:
8
  - wer
9
  tags:
 
19
  name: Speech Recognition
20
  type: automatic-speech-recognition
21
  dataset:
22
+ name: OpenSLR ne
23
+ type: OpenSLR
24
+ args: ne
25
  metrics:
26
  - name: Test WER
27
  type: wer
28
+ value: 5.97
29
  ---
30
 
31
+ # Wav2Vec2-Large-XLSR-53-Nepali
32
+
33
+ Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Nepali using the [Common Voice](https://huggingface.co/datasets/common_voice) and OpenSLR ne datasets.
34
 
 
35
  When using this model, make sure that your speech input is sampled at 16kHz.
36
 
37
  ## Usage
 
44
  from datasets import load_dataset
45
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
46
 
47
+ test_dataset = load_dataset('csv', data_files={'test': '/content/ne_np_female/line_index_test.csv'}, split='test')
48
 
49
+ processor = Wav2Vec2Processor.from_pretrained("gagan3012/wav2vec2-xlsr-nepali")
50
+ model = Wav2Vec2ForCTC.from_pretrained("gagan3012/wav2vec2-xlsr-nepali")
51
 
52
  resampler = torchaudio.transforms.Resample(48_000, 16_000)
53
 
 
68
 
69
  print("Prediction:", processor.batch_decode(predicted_ids))
70
  print("Reference:", test_dataset["sentence"][:2])
71
+
72
  ```
73
+ #### Result
74
 
75
+ Prediction: ['पारानाको ब्राजिली राज्यमा रहेको राजधानी', 'देवराज जोशी त्रिभुवन विश्वविद्यालयबाट शिक्षाशास्त्रमा स्नातक हुनुहुन्छ']
76
+
77
+ Reference: ['पारानाको ब्राजिली राज्यमा रहेको राजधानी', 'देवराज जोशी त्रिभुवन विश्वविद्यालयबाट शिक्षाशास्त्रमा स्नातक हुनुहुन्छ']
78
 
79
  ## Evaluation
80
 
 
88
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
89
  import re
90
 
91
+ test_dataset = load_dataset('csv', data_files={'test': '/content/ne_np_female/line_index_test.csv'}, split='test')
92
  wer = load_metric("wer")
93
 
94
+ processor = Wav2Vec2Processor.from_pretrained("gagan3012/wav2vec2-xlsr-nepali")
95
+ model = Wav2Vec2ForCTC.from_pretrained("gagan3012/wav2vec2-xlsr-nepali")
96
  model.to("cuda")
97
 
98
  chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“]' # TODO: adapt this list to include all special characters you removed from the data
 
123
  result = test_dataset.map(evaluate, batched=True, batch_size=8)
124
 
125
  print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
 
126
 
127
+ ```
128
 
129
+ **Test Result**: 5.970952 %
130
 
131
  ## Training
132
 
133
+ The script used for training can be found [here](https://colab.research.google.com/drive/1AHnYWXb5cwfMEa2o4O3TSdasAR3iVBFP?usp=sharing)
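
The Test WER reported in the model card is a word error rate: the word-level edit distance between predictions and references, divided by the number of reference words. The following is a minimal pure-Python sketch of that definition, assuming whitespace tokenization. The function name `word_error_rate` is hypothetical and the toy sentences are illustrative only; this is not the code behind `load_metric("wer")`, just an illustration of what the number measures.

```python
def word_error_rate(references, predictions):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, predictions):
        ref_words, hyp_words = ref.split(), hyp.split()
        # Levenshtein distance over words, keeping only the previous row.
        prev = list(range(len(hyp_words) + 1))
        for i, r in enumerate(ref_words, start=1):
            curr = [i]
            for j, h in enumerate(hyp_words, start=1):
                cost = 0 if r == h else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution or match
            prev = curr
        total_edits += prev[-1]
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


# Toy example (not model output): one inserted word against three reference words.
print("WER: {:.2f}".format(100 * word_error_rate(["the cat sat"], ["the cat sat down"])))
# prints "WER: 33.33"
```

Because edits are summed over the whole test set before dividing, long utterances weigh more than short ones; averaging per-sentence rates would give a different number.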