gchhablani committed c30dc58 (parent: edf773e): Update README.md

README.md (changed)
tags:
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: GChhablani XLSR Wav2Vec2 Large 53 Marathi
  results:
  - task:
      name: Speech Recognition
      value: 14.53
---

# Wav2Vec2-Large-XLSR-53-Marathi

Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Marathi using the [OpenSLR SLR64](http://openslr.org/64/) dataset.
When using this model, make sure that your speech input is sampled at 16 kHz.
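The model expects 16 kHz input, while the SLR64 recordings are 48 kHz, an exact 3:1 ratio. The snippet below is a toy, pure-Python illustration of that ratio only (the function name and values are made up for this sketch); naive decimation aliases high frequencies, so real preprocessing should use a proper resampler such as `torchaudio.transforms.Resample`, as the usage code further down does.

```python
def naive_decimate(samples, src_rate=48_000, dst_rate=16_000):
    """Keep every (src_rate // dst_rate)-th sample.

    Toy illustration of the 48 kHz -> 16 kHz ratio only: a real
    resampler low-pass filters first to avoid aliasing.
    """
    assert src_rate % dst_rate == 0, "only integer ratios handled here"
    step = src_rate // dst_rate  # 3 for 48 kHz -> 16 kHz
    return samples[::step]

one_second = list(range(48_000))        # stand-in for 1 s of 48 kHz audio
print(len(naive_decimate(one_second)))  # 16,000 samples = 1 s at 16 kHz
```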
model = Wav2Vec2ForCTC.from_pretrained("gchhablani/wav2vec2-large-xlsr-mr")

resampler = torchaudio.transforms.Resample(48_000, 16_000)  # The original data is sampled at 48,000 Hz. Change this according to your input.

# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
## Training

90% of the OpenSLR Marathi dataset was used for training.

The colab notebook used for training can be found [here](https://colab.research.google.com/drive/1_BbLyLqDUsXG3RpSULfLRjC6UY3RjwME?usp=sharing).
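The reported score (14.53) is a word error rate. As a minimal pure-Python sketch of what that metric measures, not the `wer` metric object used in the evaluation code, WER is the word-level edit distance divided by the reference length (the example sentences below are made up):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Rolling-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # deletion
                      d[j - 1] + 1,      # insertion
                      prev + (r != h))   # substitution (free if words match)
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

# 1 substituted word out of 4 reference words -> WER 0.25 (i.e. 25%)
print(wer("to he baghat aahe", "to te baghat aahe"))
```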