PereLluis13 committed on
Commit 56de3dd
1 Parent(s): ed38cca

Update README.md

Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -87,7 +87,7 @@ processor = Wav2Vec2Processor.from_pretrained("PereLluis13/wav2vec2-large-xlsr-5
 model = Wav2Vec2ForCTC.from_pretrained("PereLluis13/wav2vec2-large-xlsr-53-greek")
 model.to("cuda")
 
-chars_to_ignore_regex = '[\\\\,\\\\?\\\\.\\\\!\\\\-\\\\;\\\\:\\\\"\\\\“\\\\%\\\\‘\\\\”\\\\�]'
+chars_to_ignore_regex = '[\\\\\\\\,\\\\\\\\?\\\\\\\\.\\\\\\\\!\\\\\\\\-\\\\\\\\;\\\\\\\\:\\\\\\\\"\\\\\\\\“\\\\\\\\%\\\\\\\\‘\\\\\\\\”\\\\\\\\�]'
 
 
 resampler = torchaudio.transforms.Resample(48_000, 16_000)
@@ -123,7 +123,7 @@ print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"],
 
 ## Training
 
-The Common Voice `train`, `validation`, and CSS10 datasets were used for training, added as `extra` split to the dataset. The sampling rate and format of the CSS10 files is different, hence the function `speech_file_to_array_fn` was changed to: # TODO: adapt to state all the datasets that were used for training.
+The Common Voice `train` and `validation` splits and the CSS10 dataset were used for training, with CSS10 added as an `extra` split. The sampling rate and format of the CSS10 files are different, so the function `speech_file_to_array_fn` was changed to:
 ```
 def speech_file_to_array_fn(batch):
     try:
@@ -139,4 +139,4 @@ The Common Voice `train`, `validation`, and CSS10 datasets were used for trainin
 
 As suggested by Florian Zimmermeister.
 
-The script used for training can be found in [run_common_voice.py](examples/research_projects/wav2vec2/run_common_voice.py), still pending of PR. The only changes are to `speech_file_to_array_fn`. Batch size was kept at 32 (using `gradient_accumulation_steps`) using one of the [OVH](https://www.ovh.com/) machines, with a V100 GPU (thank you very much [OVH](https://www.ovh.com/)). The model trained for 40 epochs, the first 20 with the `train+validation` splits, and then `extra` split was added with the data from CSS10 at the 20th epoch. # TODO: fill in a link to your training script here. If you trained your model in a colab, simply fill in the link here. If you trained the model locally, it would be great if you could upload the training script on github and paste the link here.
+The script used for training can be found in [run_common_voice.py](examples/research_projects/wav2vec2/run_common_voice.py), still pending a PR. The only changes are to `speech_file_to_array_fn`. The batch size was kept at 32 (via `gradient_accumulation_steps`) on one of the [OVH](https://www.ovh.com/) machines with a V100 GPU (thank you very much [OVH](https://www.ovh.com/)). The model trained for 40 epochs: the first 20 used the `train+validation` splits, and the `extra` split with the CSS10 data was added at the 20th epoch.
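For context, the `chars_to_ignore_regex` touched by this commit is the usual punctuation-stripping step in Common Voice fine-tuning. Here is a minimal sketch of how such a regex is typically applied; the character class is shortened for illustration and the `remove_special_characters` helper name is an assumption modeled on common fine-tuning scripts, not taken from this commit:

```python
import re

# Shortened version of the model card's chars_to_ignore_regex; in Python
# source each regex escape is written with a doubled backslash.
chars_to_ignore_regex = '[\\,\\?\\.\\!\\-\\;\\:\\"]'

def remove_special_characters(batch):
    # Drop punctuation and lowercase the transcript so the CTC vocabulary
    # only needs to cover letters and spaces.
    batch["sentence"] = re.sub(chars_to_ignore_regex, "", batch["sentence"]).lower() + " "
    return batch

cleaned = remove_special_characters({"sentence": "Γεια σου, κόσμε!"})
print(cleaned["sentence"])  # "γεια σου κόσμε "
```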
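The batch-size note can be made concrete with a small sketch. The per-device batch size and accumulation step count below are assumed values for illustration; the model card only states the effective size of 32:

```python
# Assumed split; only the product (32) is stated in the model card.
per_device_train_batch_size = 8
gradient_accumulation_steps = 4

# Gradients accumulate over 4 small forward/backward passes before each
# optimizer step, matching the effective batch size of a single pass of 32.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 32
```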