---
language:
  - ga
  - en
license: apache-2.0
base_model: openai/whisper-small
tags:
  - generated_from_trainer
datasets:
  - ymoslem/IWSLT2023-GA-EN
  - ymoslem/FLEURS-GA-EN
  - ymoslem/BitesizeIrish-GA-EN
  - ymoslem/SpokenWords-GA-EN-MTed
  - ymoslem/Tatoeba-Speech-Irish
  - ymoslem/Wikimedia-Speech-Irish
metrics:
  - bleu
  - wer
  - chrf
model-index:
  - name: Whisper Small GA-EN Speech Translation
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: >-
            IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia +
            augmented
          type: ymoslem/IWSLT2023-GA-EN
        metrics:
          - name: Bleu
            type: bleu
            value: 30.11
          - name: Wer
            type: wer
            value: 71.49932462854571
library_name: transformers
---

# Whisper Small GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-small for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. The datasets are augmented in two ways: noise augmentation and truncation of low-amplitude samples (a sketch of both follows the results below). The best checkpoint (this version), selected by ChrF, is at step 2800 (epoch 1.2259) and achieves the following results on the evaluation set:

- Loss: 1.3547
- Bleu: 32.57
- Chrf: 47.04
- Wer: 62.0891
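
The card does not specify how the augmentations were implemented. The following is a minimal NumPy sketch of the two transformations described above, reading "truncating low-amplitude samples" as trimming leading/trailing near-silence; the noise level and amplitude threshold are illustrative assumptions, not values from the card:

```python
import numpy as np

def augment_waveform(
    waveform: np.ndarray,          # mono audio, float32 in [-1, 1]
    noise_std: float = 0.005,      # assumed noise level, not from the card
    amp_threshold: float = 0.01,   # assumed amplitude cutoff, not from the card
) -> np.ndarray:
    """Sketch of the two augmentations described in this card."""
    # 1) Noise augmentation: add low-level Gaussian noise.
    noisy = waveform + np.random.normal(0.0, noise_std, size=waveform.shape)
    # 2) Truncate low-amplitude samples: drop leading/trailing spans
    #    whose absolute amplitude never exceeds the threshold.
    above = np.flatnonzero(np.abs(waveform) > amp_threshold)
    if above.size:
        noisy = noisy[above[0] : above[-1] + 1]
    return noisy.astype(np.float32)
```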
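For inference, the checkpoint should work with the standard transformers Whisper API. A minimal sketch follows; the repo id is a placeholder for wherever this checkpoint is hosted:

```python
from transformers import pipeline

# Placeholder repo id; substitute the actual Hub id of this checkpoint.
pipe = pipeline(
    "automatic-speech-recognition",
    model="ymoslem/whisper-small-ga2en",
)

# Translate Irish speech to English text. Whisper uses the "translate"
# task token to emit English regardless of the source language.
result = pipe("irish_audio.wav", generate_kwargs={"task": "translate"})
print(result["text"])
```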

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Hardware

1 NVIDIA A100-SXM4-80GB

### Training hyperparameters

The following hyperparameters were used during training (a `Seq2SeqTrainingArguments` sketch follows the list):

- learning_rate: 0.0001
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 0
- training_steps: 3000
- mixed_precision_training: Native AMP
- generation_max_length: 225
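
As a rough illustration, these settings map onto `Seq2SeqTrainingArguments` along the following lines; `output_dir`, `predict_with_generate`, and the evaluation cadence are assumptions (the results table below suggests evaluation every 100 steps):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: output_dir and the eval cadence are assumptions.
# Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ga2en",   # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=0,
    max_steps=3000,
    fp16=True,                            # "Native AMP" mixed precision
    generation_max_length=225,
    predict_with_generate=True,           # assumed; needed for BLEU/ChrF/WER
    evaluation_strategy="steps",          # assumed from the 100-step results
    eval_steps=100,
)
```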

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Bleu  | Chrf  | Wer      |
|:-------------:|:------:|:----:|:---------------:|:-----:|:-----:|:--------:|
| 2.3533        | 0.0438 | 100  | 1.7789          | 6.29  | 25.08 | 148.7618 |
| 1.9035        | 0.0876 | 200  | 1.5122          | 18.21 | 34.02 | 85.6821  |
| 1.5357        | 0.1313 | 300  | 1.3983          | 14.01 | 33.7  | 93.3363  |
| 1.3056        | 0.1751 | 400  | 1.3447          | 18.12 | 37.35 | 95.0023  |
| 1.1177        | 0.2189 | 500  | 1.3168          | 18.47 | 38.44 | 95.3624  |
| 0.984         | 0.2627 | 600  | 1.3202          | 26.82 | 41.23 | 67.3120  |
| 0.8945        | 0.3065 | 700  | 1.2947          | 26.73 | 42.53 | 67.1319  |
| 0.7508        | 0.3503 | 800  | 1.2476          | 25.67 | 42.06 | 74.2008  |
| 0.7127        | 0.3940 | 900  | 1.2630          | 22.59 | 41.05 | 75.7767  |
| 0.5944        | 0.4378 | 1000 | 1.2726          | 22.37 | 40.31 | 82.4854  |
| 0.4972        | 0.4816 | 1100 | 1.2898          | 22.88 | 42.52 | 82.5304  |
| 0.4517        | 0.5254 | 1200 | 1.2509          | 27.99 | 44.42 | 64.1603  |
| 0.3885        | 0.5692 | 1300 | 1.2887          | 29.58 | 44.8  | 63.1247  |
| 0.3337        | 0.6130 | 1400 | 1.2645          | 30.05 | 45.5  | 62.6294  |
| 0.2852        | 0.6567 | 1500 | 1.2972          | 28.2  | 43.57 | 68.6628  |
| 0.2583        | 0.7005 | 1600 | 1.2716          | 28.21 | 45.06 | 73.6155  |
| 0.2016        | 0.7443 | 1700 | 1.3346          | 27.55 | 43.21 | 74.3809  |
| 0.1883        | 0.7881 | 1800 | 1.3124          | 21.45 | 41.83 | 94.1018  |
| 0.1514        | 0.8319 | 1900 | 1.3178          | 28.2  | 44.09 | 63.7551  |
| 0.1311        | 0.8757 | 2000 | 1.3246          | 27.33 | 43.25 | 74.3359  |
| 0.1128        | 0.9194 | 2100 | 1.3464          | 25.21 | 42.93 | 83.2508  |
| 0.0994        | 0.9632 | 2200 | 1.3315          | 30.51 | 45.74 | 64.7456  |
| 0.0512        | 1.0070 | 2300 | 1.3377          | 30.89 | 46.44 | 63.3498  |
| 0.0447        | 1.0508 | 2400 | 1.3587          | 28.72 | 44.36 | 64.3404  |
| 0.0368        | 1.0946 | 2500 | 1.3619          | 31.53 | 46.56 | 61.9541  |
| 0.0281        | 1.1384 | 2600 | 1.3596          | 30.98 | 46.45 | 70.4638  |
| 0.0273        | 1.1821 | 2700 | 1.3656          | 32.09 | 46.85 | 62.1792  |
| 0.0287        | 1.2259 | 2800 | 1.3547          | 32.57 | 47.04 | 62.0891  |
| 0.025         | 1.2697 | 2900 | 1.3539          | 26.94 | 45.43 | 81.1796  |
| 0.0263        | 1.3135 | 3000 | 1.3512          | 30.11 | 46.73 | 71.4993  |
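
The Bleu, Chrf, and Wer columns above can be computed with the `evaluate` library. A minimal sketch with toy strings follows; the metric choices are assumptions (e.g. `sacrebleu` for BLEU), and WER is scaled to a percentage to match the table:

```python
import evaluate

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
wer = evaluate.load("wer")

# Toy example, not the actual evaluation set.
predictions = ["the weather is nice today"]
references = ["the weather is very nice today"]

# sacrebleu and chrf expect one list of references per prediction.
print("BLEU:", bleu.compute(predictions=predictions,
                            references=[[r] for r in references])["score"])
print("ChrF:", chrf.compute(predictions=predictions,
                            references=[[r] for r in references])["score"])
# wer returns a fraction; multiply by 100 for a percentage.
print("WER: ", 100 * wer.compute(predictions=predictions,
                                 references=references))
```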

### Framework versions

- Transformers 4.40.2
- Pytorch 2.2.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1