metadata

library_name: transformers
language:
  - ug
license: apache-2.0
base_model: openai/whisper-small
tags:
  - generated_from_trainer
metrics:
  - cer
  - wer
model-index:
  - name: Whisper Small Fine-tuned with Uyghur Common Voice
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 15
          type: mozilla-foundation/common_voice_15_0
        metrics:
          - name: Wer
            type: wer
            value: 28.29947071879802
          - name: Cer
            type: cer
            value: 10.896777936451267

Whisper Small Fine-tuned with Uyghur Common Voice

This model is a fine-tuned version of openai/whisper-small on the Uyghur subset of the Common Voice 15.0 dataset (mozilla-foundation/common_voice_15_0).
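For reference, a minimal inference sketch using the transformers `pipeline` API. This card does not state the model's Hub identifier, so `model_id` is a placeholder argument that you must supply, and the helper name `transcribe` is illustrative only.

```python
def transcribe(audio_path: str, model_id: str) -> str:
    """Transcribe one audio file with a Whisper checkpoint.

    model_id is a placeholder: pass the Hub repository name (or a local
    directory) of this fine-tuned model, which the card does not state.
    """
    # Imported lazily so that defining the function does not load torch.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline decodes the file and resamples it to the sampling rate
    # the feature extractor expects (16 kHz for Whisper).
    result = asr(audio_path)
    return result["text"]
```

A call would look like `transcribe("sample.wav", "<your-model-id>")`, where both arguments are hypothetical stand-ins for a real recording and the real repository name.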

This model achieves the following results on the evaluation set:

Loss: 1.5920
Wer Ortho: 42.9701
Wer: 28.2995
Cer: 10.8968
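The card does not say which tool produced these scores, so the following is only a sketch of the standard definitions: WER and CER are the edit distance between reference and hypothesis, counted over words or characters, divided by the reference length and expressed in percent. "Wer Ortho" is assumed here to be the same score on raw text, with "Wer" computed after text normalization; that reading is an assumption, not something the card confirms.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent (reference must be non-empty)."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, spaces included (reference must be non-empty)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

With the reference "the cat sat" and the hypothesis "the cat sit", one of three words is wrong (WER of about 33.3) and one of eleven characters is wrong (CER of about 9.1).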

Training and evaluation data

Training used the combined train and dev splits of common_voice_15_0 (11,215 recordings, about 20 hours of audio).

Testing used the test set of THUYG20, the standard benchmark for Uyghur speech models.

Training procedure

Fine-tuning code is available at https://github.com/ixxan/ug-speech

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 300
training_steps: 4000
mixed_precision_training: Native AMP
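
The scheduler settings above (learning_rate 0.0001, linear schedule, 300 warmup steps, 4000 training steps) imply a learning rate that ramps up from zero and then decays linearly back to zero. A plain-Python sketch of that shape, assuming the usual linear-with-warmup form; the function name is illustrative and this is not the training code itself:

```python
def linear_schedule_lr(step: int, peak_lr: float = 1e-4,
                       warmup_steps: int = 300, total_steps: int = 4000) -> float:
    """Learning rate at a given optimizer step for linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the rate peaks at 1e-4 at step 300, is back to half of that by step 2150, and reaches zero at step 4000.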

Training results

Training Loss	Epoch	Step	Validation Loss	Wer Ortho	Wer	Cer
0.574400	0.7133	500	1.413890	59.765522	48.561550	17.639905
0.299600	1.4265	1000	1.283326	52.819004	41.377838	14.717958
0.130600	2.1398	1500	1.379338	52.265742	38.953389	16.260934
0.122500	2.8531	2000	1.313730	50.245894	36.494793	14.762585
0.060500	3.5663	2500	1.434626	47.589356	32.998976	12.185938
0.019500	4.2796	3000	1.526625	45.345570	30.975756	11.307346
0.015300	4.9929	3500	1.531676	44.120488	29.285470	11.690021
0.003300	5.7061	4000	1.592020	42.970054	28.299471	10.896778
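
The Epoch column can be cross-checked against figures stated elsewhere in this card: 11215 training recordings and a total train batch size of 16. The sketch below assumes the final, partially filled batch of each epoch is kept; the names are illustrative.

```python
import math

NUM_RECORDINGS = 11215        # train + dev recordings, as stated in the card
TOTAL_TRAIN_BATCH_SIZE = 16   # train_batch_size 8 x gradient_accumulation_steps 2

# Assumption: the last, partially filled batch of each epoch is kept.
STEPS_PER_EPOCH = math.ceil(NUM_RECORDINGS / TOTAL_TRAIN_BATCH_SIZE)


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps, to 4 decimals."""
    return round(step / STEPS_PER_EPOCH, 4)


for step in (500, 4000):
    print(step, epoch_at(step))
```

This gives 701 steps per epoch, so step 500 falls at epoch 0.7133 and step 4000 at epoch 5.7061, which means the run made roughly 5.7 passes over the training data.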

Framework versions

Transformers 4.46.2
Pytorch 2.5.1+cu121
Datasets 3.1.0
Tokenizers 0.20.3