PereLluis13/wav2vec2-large-xlsr-53-greek

Automatic Speech Recognition Transformers PyTorch JAX

Greek wav2vec2 audio speech xlsr-fine-tuning-week Eval Results Inference Endpoints


PereLluis13 committed on Mar 24, 2021

Commit 63c7357 • 1 Parent(s): 56de3dd

Update README.md

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -87,7 +87,7 @@ processor = Wav2Vec2Processor.from_pretrained("PereLluis13/wav2vec2-large-xlsr-5
 model = Wav2Vec2ForCTC.from_pretrained("PereLluis13/wav2vec2-large-xlsr-53-greek")
 model.to("cuda")
-chars_to_ignore_regex = '[\\\\\\\\,\\\\\\\\?\\\\\\\\.\\\\\\\\!\\\\\\\\-\\\\\\\\;\\\\\\\\:\\\\\\\\"\\\\\\\\“\\\\\\\\%\\\\\\\\‘\\\\\\\\”\\\\\\\\�]'
+chars_to_ignore_regex = '[\\\\\\\\\\\\\\\\,\\\\\\\\\\\\\\\\?\\\\\\\\\\\\\\\\.\\\\\\\\\\\\\\\\!\\\\\\\\\\\\\\\\-\\\\\\\\\\\\\\\\;\\\\\\\\\\\\\\\\:\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\“\\\\\\\\\\\\\\\\%\\\\\\\\\\\\\\\\‘\\\\\\\\\\\\\\\\”\\\\\\\\\\\\\\\\�]'
 resampler = torchaudio.transforms.Resample(48_000, 16_000)
@@ -137,6 +137,6 @@ The Common Voice `train`, `validation`, and CSS10 datasets were used for trainin
         return batch
 ```
-As suggested by Florian Zimmermeister.
+As suggested by [Florian Zimmermeister](https://github.com/flozi00).
 The script used for training can be found in [run_common_voice.py](examples/research_projects/wav2vec2/run_common_voice.py), still pending of PR. The only changes are to `speech_file_to_array_fn`. Batch size was kept at 32 (using `gradient_accumulation_steps`) using one of the [OVH](https://www.ovh.com/) machines, with a V100 GPU (thank you very much [OVH](https://www.ovh.com/)). The model trained for 40 epochs, the first 20 with the `train+validation` splits, and then `extra` split was added with the data from CSS10 at the 20th epoch.
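
The `chars_to_ignore_regex` line touched by this commit carries long runs of backslashes that appear to be an escaping artifact rather than part of the intended pattern. The following is a minimal sketch of what such a pattern is typically used for, assuming the intent is a plain character class of punctuation to strip from transcripts before scoring. The simplified pattern and the `normalize` helper are illustrative assumptions, not code taken from the README.

```python
import re

# Assumed intent: a character class listing the punctuation to drop.
# This is a simplified stand-in, not the escaped pattern shown in the diff.
chars_to_ignore_regex = r'[,?.!\-;:"“%‘”�]'


def normalize(sentence: str) -> str:
    """Remove every ignored character from a transcript (hypothetical helper)."""
    return re.sub(chars_to_ignore_regex, "", sentence)


print(normalize("Καλημέρα, κόσμε!"))  # → Καλημέρα κόσμε
```

Written as a raw string, the pattern needs only one escape (for the hyphen), which avoids backslashes multiplying each time the file is re-saved or re-rendered.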