Update README.md
README.md
CHANGED
@@ -1,7 +1,9 @@
 language: ne
 datasets:
 - OpenSLR
--
 metrics:
 - wer
 tags:
@@ -17,18 +19,19 @@ model-index:
 name: Speech Recognition
 type: automatic-speech-recognition
 dataset:
-name:
-type:
-args:
 metrics:
 - name: Test WER
 type: wer
-value:
 ---
 
-# Wav2Vec2-Large-XLSR-53-
 
-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on {language} using the [Common Voice](https://huggingface.co/datasets/common_voice), ... and ... dataset{s}. #TODO: replace {language} with your language, *e.g.* French and eventually add more datasets that were used and eventually remove common voice if model was not trained on common voice
 When using this model, make sure that your speech input is sampled at 16kHz.
 
 ## Usage
@@ -41,10 +44,10 @@ import torchaudio
 from datasets import load_dataset
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 
-test_dataset = load_dataset(
 
-processor = Wav2Vec2Processor.from_pretrained("
-model = Wav2Vec2ForCTC.from_pretrained("
 
 resampler = torchaudio.transforms.Resample(48_000, 16_000)
 
@@ -65,8 +68,13 @@ predicted_ids = torch.argmax(logits, dim=-1)
 
 print("Prediction:", processor.batch_decode(predicted_ids))
 print("Reference:", test_dataset["sentence"][:2])
 ```
 
 
 ## Evaluation
 
@@ -80,11 +88,11 @@ from datasets import load_dataset, load_metric
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 import re
 
-test_dataset = load_dataset(
 wer = load_metric("wer")
 
-processor = Wav2Vec2Processor.from_pretrained("
-model = Wav2Vec2ForCTC.from_pretrained("
 model.to("cuda")
 
 chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“]' # TODO: adapt this list to include all special characters you removed from the data
@@ -115,13 +123,11 @@ def evaluate(batch):
 result = test_dataset.map(evaluate, batched=True, batch_size=8)
 
 print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
-```
 
-
 
 
 ## Training
 
-The
-
-The script used for training can be found [here](...) # TODO: fill in a link to your training script here. If you trained your model in a colab, simply fill in the link here. If you trained the model locally, it would be great if you could upload the training script on github and paste the link here.
@@ -1,7 +1,9 @@
+---
+
 language: ne
 datasets:
 - OpenSLR
+- common_voice
 metrics:
 - wer
 tags:
@@ -17,18 +19,19 @@ model-index:
 name: Speech Recognition
 type: automatic-speech-recognition
 dataset:
+name: OpenSLR ne
+type: OpenSLR
+args: ne
 metrics:
 - name: Test WER
 type: wer
+value: 05.970952
 ---
 
+# Wav2Vec2-Large-XLSR-53-Nepali
+
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Nepali using the [Common Voice](https://huggingface.co/datasets/common_voice) and OpenSLR ne datasets.
 
 When using this model, make sure that your speech input is sampled at 16kHz.
 
 ## Usage
@@ -41,10 +44,10 @@ import torchaudio
 from datasets import load_dataset
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 
+test_dataset = load_dataset('csv', data_files={'test': '/content/ne_np_female/line_index_test.csv'}, split='test')
 
+processor = Wav2Vec2Processor.from_pretrained("gagan3012/wav2vec2-xlsr-nepali")
+model = Wav2Vec2ForCTC.from_pretrained("gagan3012/wav2vec2-xlsr-nepali")
 
 resampler = torchaudio.transforms.Resample(48_000, 16_000)
 
@@ -65,8 +68,13 @@ predicted_ids = torch.argmax(logits, dim=-1)
 
 print("Prediction:", processor.batch_decode(predicted_ids))
 print("Reference:", test_dataset["sentence"][:2])
+
 ```
+#### Result
 
+Prediction: ['पारानाको ब्राजिली राज्यमा रहेको राजधानी', 'देवराज जोशी त्रिभुवन विश्वविद्यालयबाट शिक्षाशास्त्रमा स्नातक हुनुहुन्छ']
+
+Reference: ['पारानाको ब्राजिली राज्यमा रहेको राजधानी', 'देवराज जोशी त्रिभुवन विश्वविद्यालयबाट शिक्षाशास्त्रमा स्नातक हुनुहुन्छ']
 
 ## Evaluation
 
@@ -80,11 +88,11 @@ from datasets import load_dataset, load_metric
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 import re
 
+test_dataset = load_dataset('csv', data_files={'test': '/content/ne_np_female/line_index_test.csv'}, split='test')
 wer = load_metric("wer")
 
+processor = Wav2Vec2Processor.from_pretrained("gagan3012/wav2vec2-xlsr-nepali")
+model = Wav2Vec2ForCTC.from_pretrained("gagan3012/wav2vec2-xlsr-nepali")
 model.to("cuda")
 
 chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“]' # TODO: adapt this list to include all special characters you removed from the data
@@ -115,13 +123,11 @@ def evaluate(batch):
 result = test_dataset.map(evaluate, batched=True, batch_size=8)
 
 print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
 
+```
 
+**Test Result**: 5.970952 %
 
 ## Training
 
+The script used for training can be found [here](https://colab.research.google.com/drive/1AHnYWXb5cwfMEa2o4O3TSdasAR3iVBFP?usp=sharing)
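
The evaluation snippet in this diff defines `chars_to_ignore_regex`, but the hunks do not show where it is applied. The following is a minimal sketch, assuming the pattern is used to strip punctuation from transcripts before scoring. The helper name `normalize_sentence` is hypothetical and does not appear in the README.

```python
import re

# Same character class as in the README, written as a raw string so the
# backslash escapes reach the regex engine unchanged.
chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\“]'


def normalize_sentence(sentence: str) -> str:
    """Hypothetical helper: remove the ignored characters and trim whitespace."""
    return re.sub(chars_to_ignore_regex, "", sentence).strip()


if __name__ == "__main__":
    print(normalize_sentence("Hello, world! How are you?"))
```

Characters outside the class are left untouched, so any language-specific punctuation would need to be added to the pattern, as the TODO comment in the README suggests.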
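
The Test WER reported in this commit is a word error rate: the word-level edit distance between predictions and references, divided by the number of reference words. The README computes it with `load_metric("wer")`. The dependency-free sketch below illustrates the same idea and is not the library's implementation. The function names are hypothetical, and pooling errors over the whole corpus is an assumption.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(predictions: List[str], references: List[str]) -> float:
    """Total word-level edits divided by total reference words (assumed pooling)."""
    errors = 0
    words = 0
    for pred, ref in zip(predictions, references):
        ref_words = ref.split()
        errors += edit_distance(ref_words, pred.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words
```

For the two sample pairs shown under "Result", where each prediction equals its reference, this function returns 0.0. Multiplying the result by 100 gives a percentage comparable in form to the reported Test Result.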