metadata
license: apache-2.0
base_model: openai/whisper-small
tags:
  - generated_from_trainer
metrics:
  - wer
  - cer
model-index:
  - name: whisper-small
    results:
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 15
          type: artyomboyko/common_voice_15_0_RU
          args: ru
        metrics:
          - name: Test WER
            type: wer
            value: 12.675
          - name: Test CER
            type: cer
            value: 3.7305
language:
  - ru
datasets:
  - artyomboyko/common_voice_15_0_RU

Whisper-small-ru-v2

This model is a fine-tuned version of openai/whisper-small on the Russian portion of the Common Voice 15 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1329
  • WER: 12.6750
  • CER: 3.7305
  • Learning Rate: 0.0000 (rounded to four decimal places; the peak learning rate was 5e-08)
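
The WER and CER figures above appear to be percentages. As a rough illustration of what these metrics measure, here is a minimal pure-Python sketch based on Levenshtein distance. It is not the evaluation script used for this model; the helper names and the example sentences are hypothetical, and the reference text is assumed to be non-empty.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent: word-level edits / number of reference words."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent: character-level edits / reference length."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Illustrative sentences only; they are not taken from the evaluation set.
reference = "привет как у тебя дела"
hypothesis = "привет как у тебя дело"
print(f"WER: {wer(reference, hypothesis):.2f}")
print(f"CER: {cer(reference, hypothesis):.2f}")
```

A single wrong character makes one whole word wrong, which is why WER is typically several times larger than CER for the same transcript.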

Model description

Same as openai/whisper-small.

Intended uses & limitations

Same as openai/whisper-small.
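
Since the intended use matches openai/whisper-small, the model can presumably be loaded like any other Whisper checkpoint. The sketch below uses the 🤗 Transformers speech recognition pipeline. The repository ID is an assumption inferred from the model name and the dataset namespace, and the `transcribe` helper is hypothetical.

```python
# Assumed repository ID, inferred from the model name and the dataset namespace.
MODEL_ID = "artyomboyko/whisper-small-ru-v2"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe a Russian audio file and return the recognized text."""
    # Imported lazily so this module loads even without the heavy dependency.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(
        audio_path,
        # Force Russian transcription instead of relying on language detection.
        generate_kwargs={"language": "russian", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a placeholder path, downloads the checkpoint on first use. Decoding audio from a file path generally requires ffmpeg to be installed.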

Training and evaluation data

Fine-tuned on the Russian portion of the Common Voice 15 dataset.

Training procedure

Training follows the procedure described in the article "Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers".

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-08
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 250
  • training_steps: 15000
  • mixed_precision_training: Native AMP
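
To make the schedule concrete, the sketch below recomputes the effective batch size and the learning rate at a few steps from the hyperparameters listed above. It assumes a single device and the usual linear warmup followed by linear decay to zero; the function name is hypothetical and this is not the trainer's own code.

```python
PEAK_LR = 5e-08        # learning_rate
WARMUP_STEPS = 250     # lr_scheduler_warmup_steps
TOTAL_STEPS = 15000    # training_steps
PER_DEVICE_BATCH = 16  # train_batch_size
GRAD_ACCUM = 2         # gradient_accumulation_steps


def linear_schedule_lr(step):
    """Learning rate after `step` updates: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step / WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * (remaining / (TOTAL_STEPS - WARMUP_STEPS))


# Effective batch size, assuming a single device (matches total_train_batch_size).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
print("effective batch size:", effective_batch)

for step in (0, 125, 250, 7625, 15000):
    lr = linear_schedule_lr(step)
    # The second column shows the value at four decimal places, as in the results table.
    print(step, f"{lr:.4f}", lr)
```

Because the peak rate is 5e-08, every value in the schedule prints as 0.0000 at four decimal places, which would explain the constant learning rate column in the training results.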

Training results

| Training Loss | Epoch | Step | Validation Loss | WER | CER | Learning Rate |
|---|---|---|---|---|---|---|
| 0.0661 | 0.09 | 500 | 0.1358 | 12.9097 | 3.8217 | 0.0000 |
| 0.0616 | 0.17 | 1000 | 0.1357 | 12.9620 | 3.8949 | 0.0000 |
| 0.0601 | 0.26 | 1500 | 0.1357 | 12.8795 | 3.8225 | 0.0000 |
| 0.0666 | 0.35 | 2000 | 0.1353 | 12.9481 | 3.8871 | 0.0000 |
| 0.0669 | 0.43 | 2500 | 0.1352 | 12.8284 | 3.8283 | 0.0000 |
| 0.0665 | 0.52 | 3000 | 0.1351 | 12.8203 | 3.7833 | 0.0000 |
| 0.0649 | 0.61 | 3500 | 0.1349 | 12.8098 | 3.7824 | 0.0000 |
| 0.0607 | 0.69 | 4000 | 0.1347 | 12.8110 | 3.8105 | 0.0000 |
| 0.0636 | 0.78 | 4500 | 0.1345 | 12.7994 | 3.7893 | 0.0000 |
| 0.063 | 0.87 | 5000 | 0.1342 | 12.8319 | 3.8084 | 0.0000 |
| 0.0589 | 0.95 | 5500 | 0.1341 | 12.8807 | 3.8551 | 0.0000 |
| 0.0734 | 1.04 | 6000 | 0.1341 | 12.7691 | 3.7604 | 0.0000 |
| 0.0577 | 1.13 | 6500 | 0.1340 | 12.7645 | 3.7602 | 0.0000 |
| 0.052 | 1.21 | 7000 | 0.1340 | 12.7610 | 3.7655 | 0.0000 |
| 0.0626 | 1.3 | 7500 | 0.1339 | 12.7657 | 3.7593 | 0.0000 |
| 0.0617 | 1.39 | 8000 | 0.1338 | 12.7912 | 3.8268 | 0.0000 |
| 0.063 | 1.47 | 8500 | 0.1337 | 12.7343 | 3.7573 | 0.0000 |
| 0.0668 | 1.56 | 9000 | 0.1336 | 12.7308 | 3.7198 | 0.0000 |
| 0.0634 | 1.65 | 9500 | 0.1335 | 12.7215 | 3.7400 | 0.0000 |
| 0.0604 | 1.73 | 10000 | 0.1333 | 12.7192 | 3.7515 | 0.0000 |
| 0.0707 | 1.82 | 10500 | 0.1333 | 12.7052 | 3.7568 | 0.0000 |
| 0.0639 | 1.91 | 11000 | 0.1332 | 12.6983 | 3.7617 | 0.0000 |
| 0.0617 | 1.99 | 11500 | 0.1331 | 12.6936 | 3.7402 | 0.0000 |
| 0.0601 | 2.08 | 12000 | 0.1330 | 12.6901 | 3.7586 | 0.0000 |
| 0.0632 | 2.17 | 12500 | 0.1330 | 12.6785 | 3.7279 | 0.0000 |
| 0.0626 | 2.25 | 13000 | 0.1330 | 12.6808 | 3.7333 | 0.0000 |
| 0.066 | 2.34 | 13500 | 0.1329 | 12.6704 | 3.7512 | 0.0000 |
| 0.0674 | 2.42 | 14000 | 0.1329 | 12.6599 | 3.7384 | 0.0000 |
| 0.0637 | 2.51 | 14500 | 0.1329 | 12.6797 | 3.7428 | 0.0000 |
| 0.0641 | 2.6 | 15000 | 0.1329 | 12.6750 | 3.7305 | 0.0000 |
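
The logged metrics can be scanned for the lowest-error evaluation points. The sketch below copies the step, WER, and CER columns from the training results and reports the minima along with the relative WER change between the first and last evaluations. The variable names are hypothetical, and which checkpoint was actually published is not stated in this card.

```python
# (step, WER, CER) rows copied from the training results.
RESULTS = [
    (500, 12.9097, 3.8217), (1000, 12.9620, 3.8949), (1500, 12.8795, 3.8225),
    (2000, 12.9481, 3.8871), (2500, 12.8284, 3.8283), (3000, 12.8203, 3.7833),
    (3500, 12.8098, 3.7824), (4000, 12.8110, 3.8105), (4500, 12.7994, 3.7893),
    (5000, 12.8319, 3.8084), (5500, 12.8807, 3.8551), (6000, 12.7691, 3.7604),
    (6500, 12.7645, 3.7602), (7000, 12.7610, 3.7655), (7500, 12.7657, 3.7593),
    (8000, 12.7912, 3.8268), (8500, 12.7343, 3.7573), (9000, 12.7308, 3.7198),
    (9500, 12.7215, 3.7400), (10000, 12.7192, 3.7515), (10500, 12.7052, 3.7568),
    (11000, 12.6983, 3.7617), (11500, 12.6936, 3.7402), (12000, 12.6901, 3.7586),
    (12500, 12.6785, 3.7279), (13000, 12.6808, 3.7333), (13500, 12.6704, 3.7512),
    (14000, 12.6599, 3.7384), (14500, 12.6797, 3.7428), (15000, 12.6750, 3.7305),
]

# Evaluation points with the lowest word and character error rates.
best_wer = min(RESULTS, key=lambda row: row[1])
best_cer = min(RESULTS, key=lambda row: row[2])

# Relative WER reduction between the first and last evaluations, in percent.
first, last = RESULTS[0], RESULTS[-1]
relative_wer_gain = 100.0 * (first[1] - last[1]) / first[1]

print("lowest WER:", best_wer)
print("lowest CER:", best_cer)
print(f"relative WER reduction, step {first[0]} to {last[0]}: {relative_wer_gain:.2f}%")
```

The lowest WER (12.6599) occurs at step 14000 and the lowest CER (3.7198) at step 9000, so the final step 15000 is close to, but not exactly, the best point on either metric.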

Framework versions

  • Transformers 4.36.0.dev0
  • PyTorch 2.1.1+cu121
  • Datasets 2.15.0
  • Tokenizers 0.15.0