---
language:
  - ru
license: apache-2.0
tags:
  - generated_from_trainer
base_model: openai/whisper-tiny
datasets:
  - bond005/podlodka_speech
metrics:
  - wer
model-index:
  - name: whisper-tiny-ru
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: Podlodka Speech
          type: bond005/podlodka_speech
          args: 'config: ru, split: test'
        metrics:
          - type: wer
            value: 99.38757655293088
            name: Wer
---

# whisper-tiny-ru

This model is a fine-tuned version of openai/whisper-tiny on the Podlodka Speech dataset. It achieves the following results on the evaluation set:

- Loss: 1.4991
- Wer: 99.3876
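
As a quick check of the checkpoint, the snippet below runs inference through the `transformers` ASR pipeline. It is a minimal sketch: the repository id `MachineTrofimov/whisper-tiny-ru` is inferred from the card title, and `sample_ru.wav` is a placeholder path to a Russian audio file.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint; repository id assumed from the card title.
asr = pipeline(
    "automatic-speech-recognition",
    model="MachineTrofimov/whisper-tiny-ru",
)

# Transcribe a local Russian audio file (placeholder path).
result = asr(
    "sample_ru.wav",
    generate_kwargs={"language": "russian", "task": "transcribe"},
)
print(result["text"])
```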

## Model description

whisper-tiny-ru is the openai/whisper-tiny checkpoint fine-tuned for Russian automatic speech recognition on the Podlodka Speech dataset.

## Intended uses & limitations

The model is intended for Russian speech-to-text transcription. Given the high evaluation WER reported above (around 99%), it should be treated as an experimental fine-tune, and transcriptions should be reviewed before any downstream use.

## Training and evaluation data

The model was fine-tuned and evaluated on the Podlodka Speech dataset (bond005/podlodka_speech); the metrics reported above were computed on its test split (config: ru, split: test), as listed in the model-index metadata.
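
For reference, one way to load the dataset named above is with the `datasets` library. This is only a sketch; the exact configuration name, if the dataset requires one, is not specified in this card.

```python
from datasets import load_dataset

# Load the Podlodka Speech dataset referenced in the metadata above.
podlodka = load_dataset("bond005/podlodka_speech")
print(podlodka)
```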

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- training_steps: 500
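
The listing below is a hedged sketch of how these values map onto `Seq2SeqTrainingArguments`; `output_dir` is a placeholder, and the model, data collator, and metric wiring that a full training script needs are omitted.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-tiny-ru",   # placeholder output directory
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_steps=500,
)
```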

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Wer      |
|:-------------:|:-------:|:----:|:---------------:|:--------:|
| 0.295         | 5.5556  | 50   | 1.1982          | 83.2896  |
| 0.1297        | 11.1111 | 100  | 1.2768          | 76.0280  |
| 0.0517        | 16.6667 | 150  | 1.3594          | 72.5284  |
| 0.0203        | 22.2222 | 200  | 1.3969          | 85.4768  |
| 0.0094        | 27.7778 | 250  | 1.4394          | 104.2870 |
| 0.0061        | 33.3333 | 300  | 1.4646          | 87.8390  |
| 0.0049        | 38.8889 | 350  | 1.4813          | 90.4637  |
| 0.0043        | 44.4444 | 400  | 1.4909          | 86.7017  |
| 0.004         | 50.0    | 450  | 1.4973          | 99.6500  |
| 0.0038        | 55.5556 | 500  | 1.4991          | 99.3876  |
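
The WER column is on a 0–100 (percentage) scale. A minimal sketch of how such values can be computed with the `evaluate` library is shown below; the predictions and references are placeholders.

```python
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["пример расшифровки"]  # model transcriptions (placeholder)
references = ["пример расшифровки"]   # ground-truth transcripts (placeholder)

# evaluate returns WER as a fraction; multiply by 100 to match the table.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```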

### Framework versions

- Transformers 4.40.2
- Pytorch 2.2.1+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1