---
license: apache-2.0
tags:
- generated_from_trainer
base_model: facebook/wav2vec2-xls-r-300m
datasets:
- ml-superb-subset
metrics:
- wer
model-index:
- name: wav2vec2-large-xls-r-300m-ml-superb-xty
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: ml-superb-subset
      type: ml-superb-subset
      config: xty
      split: test
      args: xty
    metrics:
    - type: wer
      value: 0.8114393463230672
      name: Wer
---

# wav2vec2-large-xls-r-300m-ml-superb-xty

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the ml-superb-subset dataset.
It achieves the following results on the evaluation set:
- Loss: 1.6099
- Wer: 0.8114

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 30
- num_epochs: 10

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Wer    |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 3.1825        | 0.8219 | 30   | 3.2071          | 1.0    |
| 3.0491        | 1.6438 | 60   | 3.0508          | 1.0    |
| 2.9717        | 2.4658 | 90   | 3.0385          | 1.0    |
| 2.93          | 3.2877 | 120  | 2.9222          | 1.0    |
| 2.6444        | 4.1096 | 150  | 2.3753          | 0.9931 |
| 2.05          | 4.9315 | 180  | 1.9591          | 0.9868 |
| 1.6856        | 5.7534 | 210  | 1.7810          | 0.9478 |
| 1.4182        | 6.5753 | 240  | 1.6843          | 0.8843 |
| 1.1773        | 7.3973 | 270  | 1.6370          | 0.8554 |
| 1.0521        | 8.2192 | 300  | 1.5868          | 0.8215 |
| 0.881         | 9.0411 | 330  | 1.5935          | 0.8202 |
| 0.7605        | 9.8630 | 360  | 1.6099          | 0.8114 |

### Framework versions

- Transformers 4.40.2
- Pytorch 2.2.1+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
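
For readers who want to reproduce the setup, the following is a minimal `TrainingArguments` sketch matching the hyperparameters listed above. The output directory and the evaluation/logging cadence (every 30 steps, inferred from the results table) are assumptions, not values confirmed by this card; the model, dataset, and data collator wiring are omitted.

```python
# Sketch only: reproduces the listed hyperparameters with transformers.Trainer defaults.
# The default optimizer (AdamW, betas=(0.9, 0.999), eps=1e-8) matches the card's
# "Adam with betas=(0.9,0.999) and epsilon=1e-08" line.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-ml-superb-xty",  # assumed output path
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size of 16
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_steps=30,
    seed=42,
    evaluation_strategy="steps",     # assumed: the results table reports eval every 30 steps
    eval_steps=30,
    logging_steps=30,
)
```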
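
To transcribe audio with the fine-tuned checkpoint, something like the sketch below should work. The Hub repository id is a placeholder, and the dummy waveform stands in for real audio, which must be sampled at 16 kHz (the rate XLS-R expects).

```python
# Minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub.
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "your-username/wav2vec2-large-xls-r-300m-ml-superb-xty"  # placeholder repo id

processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Stand-in for one second of real 16 kHz audio (a 1-D float32 array).
speech = np.zeros(16000, dtype=np.float32)

inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)
print(transcription)
```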