metadata
library_name: transformers
language:
  - ug
license: apache-2.0
base_model: openai/whisper-small
tags:
  - generated_from_trainer
metrics:
  - cer
  - wer
model-index:
  - name: Whisper Small Fine-tuned with Uyghur Common Voice
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 15
          type: mozilla-foundation/common_voice_15_0
        metrics:
          - name: Wer
            type: wer
            value: 28.29947071879802
          - name: Cer
            type: cer
            value: 10.896777936451267

Whisper Small Fine-tuned with Uyghur Common Voice

This model is a fine-tuned version of openai/whisper-small on the Uyghur Common Voice dataset.

This model achieves the following results on the evaluation set:

  • Loss: 1.5920
  • Wer Ortho: 42.9701
  • Wer: 28.2995
  • Cer: 10.8968
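The Wer and Cer figures above appear to be percentages. As a rough illustration of how such error rates are commonly derived, the sketch below computes corpus-level word and character error rates from a Levenshtein edit distance. The function names are hypothetical, and the evaluation script behind this model may apply text normalization or use a metrics library rather than anything like this code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level word error rate in percent (total errors / total reference words)."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return 100.0 * errors / total


def cer(references, hypotheses):
    """Corpus-level character error rate in percent (spaces count as characters)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 100.0 * errors / total
```

For example, `wer(["a b c d"], ["a x c d"])` counts one substitution against four reference words.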

Training and evaluation data

Training used the combined train and dev splits of common_voice_15_0 (11,215 recordings, about 20 hours of audio).

Testing used the THUYG20 test set, which serves as the standard benchmark for Uyghur speech models.

Training procedure

Fine-tuning code is available at https://github.com/ixxan/ug-speech

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 300
  • training_steps: 4000
  • mixed_precision_training: Native AMP
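To make the arithmetic behind these settings concrete, here is a minimal sketch of a linear schedule with warmup and of the effective batch size. It assumes a single device, a dataloader that keeps the last partial batch, and the 11,215-recording training set described above. The constant and function names are hypothetical, and the actual training script may differ in detail.

```python
import math

# Values taken from the hyperparameter list; names are illustrative only.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 300
TRAINING_STEPS = 4000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
NUM_RECORDINGS = 11215  # assumed size of the combined train + dev set


def linear_lr(step):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


def optimizer_steps_per_epoch():
    """Optimizer updates per pass over the data (single device assumed)."""
    batches = math.ceil(NUM_RECORDINGS / TRAIN_BATCH_SIZE)
    return max(1, batches // GRAD_ACCUM_STEPS)


# 8 samples per batch * 2 accumulation steps = 16 samples per update.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Number of passes over the training data completed after 4000 updates.
epochs_at_end = TRAINING_STEPS / optimizer_steps_per_epoch()
```

Under these assumptions the learning rate peaks at 0.0001 at step 300 and reaches zero at step 4000, and 4000 updates correspond to roughly 5.7 passes over the training data.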

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer | Cer |
|---|---|---|---|---|---|---|
| 0.574400 | 0.7133 | 500 | 1.413890 | 59.765522 | 48.561550 | 17.639905 |
| 0.299600 | 1.4256 | 1000 | 1.283326 | 52.819004 | 41.377838 | 14.717958 |
| 0.130600 | 2.1398 | 1500 | 1.379338 | 52.265742 | 38.953389 | 16.260934 |
| 0.122500 | 2.8531 | 2000 | 1.313730 | 50.245894 | 36.494793 | 14.762585 |
| 0.060500 | 3.5663 | 2500 | 1.434626 | 47.589356 | 32.998976 | 12.185938 |
| 0.019500 | 4.2796 | 3000 | 1.526625 | 45.345570 | 30.975756 | 11.307346 |
| 0.015300 | 4.9929 | 3500 | 1.531676 | 44.120488 | 29.285470 | 11.690021 |
| 0.003300 | 5.7061 | 4000 | 1.592020 | 42.970054 | 28.299471 | 10.896778 |

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3