
whisper-small-khmer

This model is a fine-tuned version of openai/whisper-small on the Google FLEURS and OpenSLR (SLR42) Khmer datasets. It achieves the following results on the evaluation set:

  • Loss: 0.4657
  • Wer: 0.6464

Model description

This model was fine-tuned on the Google FLEURS and OpenSLR (SLR42) Khmer datasets. It can be used with the transformers pipeline:

from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline.
pipe = pipeline(
    task="automatic-speech-recognition",
    model="seanghay/whisper-small-khmer",
)

# Force Khmer output via the language token and transcribe.
result = pipe(
    "audio.wav",
    generate_kwargs={"language": "<|km|>", "task": "transcribe"},
    batch_size=16,
)

print(result["text"])
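For recordings longer than 30 seconds, the pipeline can transcribe in fixed-size windows. A minimal sketch, assuming the standard chunk_length_s argument of the transformers ASR pipeline; long_audio.wav is a placeholder path:

from transformers import pipeline

pipe = pipeline(
    task="automatic-speech-recognition",
    model="seanghay/whisper-small-khmer",
)

# Split the input into 30-second chunks and stitch the transcripts back together.
result = pipe(
    "long_audio.wav",  # placeholder path
    chunk_length_s=30,
    batch_size=16,
    generate_kwargs={"language": "<|km|>", "task": "transcribe"},
)

print(result["text"])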

whisper.cpp

1. Transcode the input audio to 16 kHz mono PCM

ffmpeg -i audio.ogg -ar 16000 -ac 1 -c:a pcm_s16le output.wav

2. Transcribe with whisper.cpp

./main -m ggml-model.bin -f output.wav --print-colors --language km
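Both steps can be scripted. A minimal sketch in Python, assuming ffmpeg is on PATH, the whisper.cpp main binary sits in the working directory, and ggml-model.bin is this checkpoint converted to ggml format:

import subprocess

def transcribe_with_whisper_cpp(audio_path, model_path="ggml-model.bin"):
    wav_path = "output.wav"
    # Step 1: transcode the input audio to 16 kHz mono PCM.
    subprocess.run(
        ["ffmpeg", "-y", "-i", audio_path,
         "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav_path],
        check=True,
    )
    # Step 2: transcribe with whisper.cpp, forcing Khmer.
    # --print-colors is omitted so the captured output stays plain text.
    out = subprocess.run(
        ["./main", "-m", model_path, "-f", wav_path, "--language", "km"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout

print(transcribe_with_whisper_cpp("audio.ogg"))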

Training and evaluation data

  • training = google/fleurs['train+validation'] + openslr['train']
  • eval = google/fleurs['test']
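A minimal sketch of how these splits could be assembled with the datasets library; the km_kh and SLR42 configuration names and the openslr column name sentence are assumptions based on the Hugging Face hub:

from datasets import Audio, concatenate_datasets, load_dataset

# FLEURS Khmer train + validation (configuration name assumed to be "km_kh").
fleurs = load_dataset("google/fleurs", "km_kh", split="train+validation")

# OpenSLR Khmer corpus (configuration name assumed to be "SLR42").
openslr = load_dataset("openslr", "SLR42", split="train")

# Align the two corpora before concatenating: Whisper expects 16 kHz audio,
# and both datasets must share the same columns.
openslr = openslr.cast_column("audio", Audio(sampling_rate=16000))
openslr = openslr.rename_column("sentence", "transcription")  # assumed column name
keep = ["audio", "transcription"]
fleurs = fleurs.remove_columns([c for c in fleurs.column_names if c not in keep])
openslr = openslr.remove_columns([c for c in openslr.column_names if c not in keep])

train_data = concatenate_datasets([fleurs, openslr])
eval_data = load_dataset("google/fleurs", "km_kh", split="test")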

Training procedure

This model was trained using the project on GitHub, on a single NVIDIA A10 (24 GB) GPU.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 6.25e-06
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 800
  • training_steps: 8000
  • mixed_precision_training: Native AMP
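A minimal sketch of how these values map onto transformers Seq2SeqTrainingArguments; the output directory is a placeholder, and Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer's default optimizer:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-khmer",  # placeholder
    learning_rate=6.25e-6,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=800,
    max_steps=8000,
    fp16=True,                    # Native AMP mixed precision
    evaluation_strategy="steps",
    eval_steps=1000,              # matches the 1000-step cadence in the results table
)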

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.2065        | 3.37  | 1000 | 0.3403          | 0.7929 |
| 0.0446        | 6.73  | 2000 | 0.2911          | 0.6961 |
| 0.008         | 10.1  | 3000 | 0.3578          | 0.6627 |
| 0.003         | 13.47 | 4000 | 0.3982          | 0.6564 |
| 0.0012        | 16.84 | 5000 | 0.4287          | 0.6512 |
| 0.0004        | 20.2  | 6000 | 0.4499          | 0.6419 |
| 0.0001        | 23.57 | 7000 | 0.4614          | 0.6469 |
| 0.0001        | 26.94 | 8000 | 0.4657          | 0.6464 |
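The Wer column is the word error rate on the evaluation split. A minimal sketch of how it can be computed with the evaluate library, using placeholder strings in place of real transcripts:

import evaluate

wer_metric = evaluate.load("wer")

# Placeholders; in practice these come from the FLEURS test split
# and the model's generated transcriptions.
references = ["a ground truth transcript"]
predictions = ["a model transcript"]

# WER = (substitutions + insertions + deletions) / reference word count
wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")

Note that Khmer is written without spaces between words, so the WER value depends on how the transcripts are segmented into words.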

Framework versions

  • Transformers 4.28.0.dev0
  • Pytorch 2.0.0+cu117
  • Datasets 2.11.1.dev0
  • Tokenizers 0.13.3