gchhablani committed c30dc58 (parent: edf773e): Update README.md

README.md (changed)
tags:
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: GChhablani XLSR Wav2Vec2 Large 53 Marathi
  results:
  - task:
      name: Speech Recognition
      value: 14.53
---

# Wav2Vec2-Large-XLSR-53-Marathi

Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Marathi using the [OpenSLR SLR64](http://openslr.org/64/) dataset.
When using this model, make sure that your speech input is sampled at 16 kHz.
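The model expects 16 kHz input, while the SLR64 recordings are 48 kHz, an exact 3:1 ratio. The snippet below is a toy, pure-Python illustration of that ratio only (the function name and values are made up for this sketch); naive decimation aliases high frequencies, so real preprocessing should use a proper resampler such as `torchaudio.transforms.Resample`, as the usage code further down does.

```python
def naive_decimate(samples, src_rate=48_000, dst_rate=16_000):
    """Keep every (src_rate // dst_rate)-th sample.

    Toy illustration of the 48 kHz -> 16 kHz ratio only: a real
    resampler low-pass filters first to avoid aliasing.
    """
    assert src_rate % dst_rate == 0, "only integer ratios handled here"
    step = src_rate // dst_rate  # 3 for 48 kHz -> 16 kHz
    return samples[::step]

one_second = list(range(48_000))        # stand-in for 1 s of 48 kHz audio
print(len(naive_decimate(one_second)))  # 16,000 samples = 1 s at 16 kHz
```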
model = Wav2Vec2ForCTC.from_pretrained("gchhablani/wav2vec2-large-xlsr-mr")

resampler = torchaudio.transforms.Resample(48_000, 16_000)  # The original data is sampled at 48,000 Hz. Change this according to your input.

# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
## Training

90% of the OpenSLR Marathi dataset was used for training.

The colab notebook used for training can be found [here](https://colab.research.google.com/drive/1_BbLyLqDUsXG3RpSULfLRjC6UY3RjwME?usp=sharing).
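The reported score (14.53) is a word error rate. As a minimal pure-Python sketch of what that metric measures, not the `wer` metric object used in the evaluation code, WER is the word-level edit distance divided by the reference length (the example sentences below are made up):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Rolling-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # deletion
                      d[j - 1] + 1,      # insertion
                      prev + (r != h))   # substitution (free if words match)
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

# 1 substituted word out of 4 reference words -> WER 0.25 (i.e. 25%)
print(wer("to he baghat aahe", "to te baghat aahe"))
```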