metadata
tags:
  - generated_from_trainer
base_model: batoula187/wav2vec2-large-xls-r-300m-arabic-colab
datasets:
  - common_voice_12_0
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xls-r-300m-arabic-colab
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: common_voice_12_0
          type: common_voice_12_0
          config: ar
          split: test[:10%]
          args: ar
        metrics:
          - type: wer
            value: 0.7661710754972002
            name: Wer

wav2vec2-large-xls-r-300m-arabic-colab

This model is a fine-tuned version of batoula187/wav2vec2-large-xls-r-300m-arabic-colab on the Arabic (ar) configuration of the common_voice_12_0 dataset. On the evaluation set (the first 10% of the test split, test[:10%]), it achieves the following results:

  • Loss: 2.0728
  • Wer: 0.7662
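
For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The card does not state which implementation produced the figure above, so the following is only a minimal sketch of the standard definition; the function name `word_error_rate` and the whitespace tokenization are assumptions, not the evaluation code used here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: splits on whitespace and applies no text
    normalization, which a real evaluation pipeline may do differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a WER of 0.7662 corresponds to roughly 77 word errors per 100 reference words.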

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 30
  • mixed_precision_training: Native AMP
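
Two of the values above follow from the others: the total train batch size is the per-device batch size times the gradient accumulation steps, and the linear scheduler ramps the learning rate up over the warmup steps and then decays it to zero. The sketch below reproduces that arithmetic in plain Python. It assumes a single training device (consistent with 16 × 2 = 32) and takes the total number of optimizer steps as a parameter, because the card does not report it; the helper names are hypothetical.

```python
LEARNING_RATE = 3e-4   # learning_rate from the list above
TRAIN_BATCH_SIZE = 16  # train_batch_size
GRAD_ACCUM_STEPS = 2   # gradient_accumulation_steps
WARMUP_STEPS = 500     # lr_scheduler_warmup_steps


def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update.

    num_devices=1 is an assumption; the card does not list the hardware.
    """
    return per_device_batch * grad_accum_steps * num_devices


def linear_lr(step: int, total_steps: int,
              peak_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    # total_steps=2600 is a placeholder for illustration only.
    for step in (0, 250, 500, 1550, 2600):
        print(step, linear_lr(step, total_steps=2600))
```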

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer |
|---|---|---|---|---|
| 0.1061 | 2.2599 | 200 | 1.8297 | 0.8034 |
| 0.1496 | 4.5198 | 400 | 1.6173 | 0.7955 |
| 0.2105 | 6.7797 | 600 | 1.6220 | 0.8040 |
| 0.1798 | 9.0395 | 800 | 2.2087 | 0.8405 |
| 0.1389 | 11.2994 | 1000 | 1.7900 | 0.7868 |
| 0.1143 | 13.5593 | 1200 | 1.7566 | 0.7886 |
| 0.103 | 15.8192 | 1400 | 1.8148 | 0.7689 |
| 0.0904 | 18.0791 | 1600 | 1.8059 | 0.7627 |
| 0.0766 | 20.3390 | 1800 | 2.1398 | 0.7907 |
| 0.0682 | 22.5989 | 2000 | 2.0384 | 0.7779 |
| 0.0583 | 24.8588 | 2200 | 2.0727 | 0.7658 |
| 0.0575 | 27.1186 | 2400 | 2.1649 | 0.7758 |
| 0.0582 | 29.3785 | 2600 | 2.0728 | 0.7662 |
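
The logged evaluations can be scanned for the best checkpoint by either metric. The snippet below is a small sketch that copies the step, validation loss, and WER columns from the training results and picks the minimum of each; the variable names are illustrative, and the card does not say whether any checkpoint other than the final one was saved.

```python
# (step, validation_loss, wer) copied from the training results above.
EVAL_LOG = [
    (200, 1.8297, 0.8034),
    (400, 1.6173, 0.7955),
    (600, 1.6220, 0.8040),
    (800, 2.2087, 0.8405),
    (1000, 1.7900, 0.7868),
    (1200, 1.7566, 0.7886),
    (1400, 1.8148, 0.7689),
    (1600, 1.8059, 0.7627),
    (1800, 2.1398, 0.7907),
    (2000, 2.0384, 0.7779),
    (2200, 2.0727, 0.7658),
    (2400, 2.1649, 0.7758),
    (2600, 2.0728, 0.7662),
]

# Lowest WER and lowest validation loss need not occur at the same step.
best_by_wer = min(EVAL_LOG, key=lambda row: row[2])
best_by_loss = min(EVAL_LOG, key=lambda row: row[1])

if __name__ == "__main__":
    print("lowest WER:", best_by_wer)
    print("lowest validation loss:", best_by_loss)
```

In these logs the lowest WER (0.7627) occurs at step 1600 and the lowest validation loss (1.6173) at step 400, while the reported final results come from step 2600.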

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1