
Whisper Medium (Thai): Combined V3

This model is a fine-tuned version of openai/whisper-medium on augmented versions of the Thai (th) split of mozilla-foundation/common_voice_13_0, google/fleurs, and curated datasets. It achieves the following result on the Common Voice 13 test set:

  • WER: 7.42% (transcripts segmented into words with the Deepcut tokenizer)
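Thai is written without spaces between words, so reference and predicted transcripts must be segmented into words before word error rate (WER) can be computed. The sketch below computes WER as word-level edit distance divided by the number of reference words. It assumes both transcripts have already been segmented. In practice, the word lists might come from a Thai segmenter such as deepcut's `deepcut.tokenize`. The exact text normalization behind the 7.42% figure is not stated in this card, and `word_error_rate` is a hypothetical helper, not part of any library.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """Hypothetical helper: (substitutions + deletions + insertions) / len(reference)."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[m] / n


# Pre-segmented example: one substituted word out of four reference words
ref = ["สวัสดี", "ครับ", "ทุก", "คน"]
hyp = ["สวัสดี", "ค่ะ", "ทุก", "คน"]
print(f"WER: {100 * word_error_rate(ref, hyp):.2f}%")
```

Because WER is reported as a percentage, a score of 7.42 corresponds to roughly 7 word errors per 100 reference words.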

Model description

Use the model with Hugging Face's transformers library as follows:

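import torch
from transformers import pipeline

MODEL_NAME = "biodatlab/whisper-th-medium-combined"  # model ID on the Hugging Face Hub
lang = "th"  # transcribe in Thai

# use the first GPU if one is available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,  # split long recordings into 30-second chunks
    device=device,
)
# force the decoder to produce Thai transcription (rather than translation)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(
    language=lang,
    task="transcribe",
)
# transcribe an audio file (reading audio files from disk requires ffmpeg)
text = pipe("audio.mp3")["text"]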
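Whisper's encoder consumes at most 30 seconds of audio at a time, which is why the pipeline above sets `chunk_length_s=30`. Longer recordings are cut into overlapping windows, and the overlaps are used to stitch the partial transcripts together. The sketch below reproduces the window boundaries. It is modeled on the transformers ASR pipeline as we understand it: the stride defaults to one sixth of the chunk length on each side, and audio is sampled at 16 kHz. Treat these details as assumptions that may differ between library versions. `chunk_windows` is a hypothetical helper for illustration only.

```python
from typing import List, Optional, Tuple


def chunk_windows(
    duration_s: float,
    chunk_length_s: float = 30.0,
    stride_length_s: Optional[float] = None,
    sampling_rate: int = 16_000,
) -> List[Tuple[float, float]]:
    """Hypothetical helper: (start, end) times in seconds of each audio chunk."""
    if stride_length_s is None:
        # assumed default: overlap of chunk_length_s / 6 on each side
        stride_length_s = chunk_length_s / 6
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride  # consecutive chunks advance by this many samples
    total = int(round(duration_s * sampling_rate))

    windows = []
    for start in range(0, total, step):
        end = min(start + chunk_len, total)
        windows.append((start / sampling_rate, end / sampling_rate))
        if start + chunk_len >= total:  # this chunk reaches the end of the audio
            break
    return windows


# A 60-second recording yields three overlapping 30-second windows
print(chunk_windows(60.0))
```

With these defaults, consecutive windows overlap by 10 seconds (5 seconds of stride on each side). This overlap gives the pipeline context on both sides of each boundary when it merges the chunk transcripts.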

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 10000
  • mixed_precision_training: Native AMP
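The `linear` scheduler with 500 warmup steps and 10,000 training steps implies a specific learning-rate curve. The sketch below assumes training used the transformers `Trainer`, as the format of this hyperparameter list suggests. Under that assumption, `linear` refers to the behavior of `get_linear_schedule_with_warmup`: the rate rises linearly from 0 to 1e-05 over the warmup steps, then decays linearly to 0 at the final step. The function name `linear_warmup_lr` is hypothetical.

```python
LEARNING_RATE = 1e-5    # learning_rate
WARMUP_STEPS = 500      # lr_scheduler_warmup_steps
TRAINING_STEPS = 10_000  # training_steps


def linear_warmup_lr(
    step: int,
    base_lr: float = LEARNING_RATE,
    warmup: int = WARMUP_STEPS,
    total: int = TRAINING_STEPS,
) -> float:
    """Hypothetical helper: learning rate at a given optimizer step."""
    if step < warmup:
        # linear warmup from 0 to base_lr
        return base_lr * step / max(1, warmup)
    # linear decay from base_lr at the end of warmup to 0 at the final step
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


for s in (0, 250, 500, 5_250, 10_000):
    print(s, linear_warmup_lr(s))
```

The peak rate of 1e-05 is reached only at step 500. The halfway point of the decay phase is step 5,250, where the rate has dropped back to 5e-06.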

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.1.0
  • Datasets 2.16.1
  • Tokenizers 0.15.1

Citation

Cite using BibTeX:

@misc{thonburian_whisper_med,
    author       = { Atirut Boribalburephan and Zaw Htet Aung and Knot Pipatsrisawat and Titipat Achakulvisut },
    title        = { Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition },
    year         = 2022,
    url          = { https://huggingface.co/biodatlab/whisper-th-medium-combined },
    doi          = { 10.57967/hf/0226 },
    publisher    = { Hugging Face }
}

Model size

  • 817M params (Safetensors, FP16 tensors)