---
language:
  - ur
license: apache-2.0
tags:
  - automatic-speech-recognition
  - robust-speech-event
datasets:
  - common_voice
metrics:
  - wer
  - cer
model-index:
  - name: wav2vec2-large-xlsr-53-urdu
    results:
      - task:
          type: automatic-speech-recognition
          name: Urdu Speech Recognition
        dataset:
          type: common_voice
          name: Urdu
          args: ur
        metrics:
          - type: wer
            value: 57.7
            name: Test WER
            args:
              - learning_rate: 0.0003
              - train_batch_size: 16
              - eval_batch_size: 8
              - seed: 42
              - gradient_accumulation_steps: 2
              - total_train_batch_size: 32
              - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
              - lr_scheduler_type: linear
              - lr_scheduler_warmup_steps: 200
              - num_epochs: 50
              - mixed_precision_training: Native AMP
          - type: cer
            value: 33.8
            name: Test CER
            args:
              - learning_rate: 0.0003
              - train_batch_size: 16
              - eval_batch_size: 8
              - seed: 42
              - gradient_accumulation_steps: 2
              - total_train_batch_size: 32
              - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
              - lr_scheduler_type: linear
              - lr_scheduler_warmup_steps: 200
              - num_epochs: 50
              - mixed_precision_training: Native AMP
---

wav2vec2-large-xlsr-53-urdu

This model is a fine-tuned version of Harveenchadha/vakyansh-wav2vec2-urdu-urm-60 on the common_voice dataset. It achieves the following results on the evaluation set:

  • Loss: 11.4593
  • WER: 0.5772
  • CER: 0.3384
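
As a quick usage illustration (not part of the original card), a checkpoint like this can be loaded for inference with the standard transformers Wav2Vec2 classes. The repo id and audio path below are assumptions, and the audio is expected as 16 kHz mono.

```python
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Assumed repo id for this card; adjust if the model lives elsewhere.
model_id = "kingabzpro/wav2vec2-60-urdu"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load a mono clip resampled to the 16 kHz rate wav2vec 2.0 expects
# ("sample.wav" is a placeholder path).
speech, _ = librosa.load("sample.wav", sr=16_000)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: pick the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```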

Model description

The combined training and validation data is only 0.58 hours of audio. It was hard to train any model on such a small amount of data, so I decided to take an Urdu checkpoint and fine-tune the XLSR model.

Training and evaluation data

Training started from Harveenchadha/vakyansh-wav2vec2-urdu-urm-60 because of the small number of Common Voice Urdu samples. Persian and Urdu are quite similar.
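
A rough data-loading sketch (not from the original card): the Common Voice Urdu split can be pulled in with the datasets library and resampled to 16 kHz. The "common_voice" loader name matches the Datasets 1.x release listed under framework versions (newer releases use the mozilla-foundation dataset ids), and the audio/sentence column names are assumed from the standard Common Voice layout.

```python
from datasets import load_dataset, Audio

# Load the Urdu subset of Common Voice (loader name from the Datasets 1.x era).
train = load_dataset("common_voice", "ur", split="train+validation")
test = load_dataset("common_voice", "ur", split="test")

# Decode audio at the 16 kHz sampling rate that wav2vec 2.0 expects.
train = train.cast_column("audio", Audio(sampling_rate=16_000))
test = test.cast_column("audio", Audio(sampling_rate=16_000))

print(train)  # only ~0.58 hours of speech across train + validation
```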

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 50
  • mixed_precision_training: Native AMP
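
For reference, here is a hedged sketch (not the author's actual training script) of how the hyperparameters above map onto transformers TrainingArguments; output_dir and the evaluation/save cadence are illustrative assumptions.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir and the eval/save cadence
# are illustrative assumptions, not values taken from the original run.
training_args = TrainingArguments(
    output_dir="./wav2vec2-60-urdu",
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 16 * 2 = 32
    num_train_epochs=50,
    lr_scheduler_type="linear",
    warmup_steps=200,
    fp16=True,                       # native AMP mixed-precision training
    evaluation_strategy="steps",
    eval_steps=100,                  # evaluation every 100 steps, as in the results table
    save_steps=100,
    logging_steps=100,
)
```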

Training results

| Training Loss | Epoch | Step | Validation Loss | WER    | CER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|
| 13.2136       | 8.33  | 100  | 9.5424          | 0.7672 | 0.4381 |
| 2.6996        | 16.67 | 200  | 8.4317          | 0.6661 | 0.3620 |
| 1.371         | 25.0  | 300  | 9.5518          | 0.6443 | 0.3701 |
| 0.639         | 33.33 | 400  | 9.4132          | 0.6129 | 0.3609 |
| 0.4452        | 41.67 | 500  | 10.8330         | 0.5920 | 0.3473 |
| 0.3233        | 50.0  | 600  | 11.4593         | 0.5772 | 0.3384 |
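
The final WER/CER above can be checked with a short evaluation loop. This is a sketch only: the repo id is assumed, and both metrics rely on the jiwer package being installed.

```python
import torch
from datasets import load_dataset, load_metric, Audio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "kingabzpro/wav2vec2-60-urdu"  # assumed repo id
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

wer_metric = load_metric("wer")
cer_metric = load_metric("cer")

test = load_dataset("common_voice", "ur", split="test")
test = test.cast_column("audio", Audio(sampling_rate=16_000))

def transcribe(batch):
    # Greedy CTC decoding for a single utterance.
    inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    batch["prediction"] = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
    return batch

results = test.map(transcribe)
print("WER:", wer_metric.compute(predictions=results["prediction"], references=results["sentence"]))
print("CER:", cer_metric.compute(predictions=results["prediction"], references=results["sentence"]))
```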

Framework versions

  • Transformers 4.15.0
  • Pytorch 1.10.0+cu111
  • Datasets 1.17.0
  • Tokenizers 0.10.3