
whisper-tiny fine-tuned on a very large collection of Vietnamese speech datasets

TODO:

21k steps, warm-up 5%, batch size 16×2 (kaggle free T4×2)
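To make the numbers above concrete, here is a small arithmetic sketch of the schedule they imply. It assumes that "16×2" means 16 samples per GPU on each of the two T4s and that "steps" are optimizer steps; the variable names are illustrative and are not taken from the training scripts.

```python
# Schedule implied by "21k steps, warm-up 5%, batch size 16×2".
# Assumption: 16 samples per GPU on each of 2 GPUs, one optimizer step per batch.
TOTAL_STEPS = 21_000
WARMUP_PERCENT = 5
PER_DEVICE_BATCH_SIZE = 16
NUM_DEVICES = 2

# Integer arithmetic avoids floating-point rounding surprises.
warmup_steps = TOTAL_STEPS * WARMUP_PERCENT // 100
effective_batch_size = PER_DEVICE_BATCH_SIZE * NUM_DEVICES
samples_seen = TOTAL_STEPS * effective_batch_size

print(f"warm-up steps:        {warmup_steps}")          # 1050
print(f"effective batch size: {effective_batch_size}")  # 32
print(f"samples seen:         {samples_seen}")          # 672000
```

Under these assumptions the learning rate ramps up over the first 1,050 steps, and the model sees about 672k training samples in total (counting repeats).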

manually evaluated WER on the Vietnamese part of each test set:

| @ float16 | CommonVoice v16.1 | FLEURS | VIVOS |
|---|---|---|---|
| original whisper-tiny | >100% | 88.6% | 62.5% |
| this model | 26.6% | 37.1% | 18.7% |
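For readers unfamiliar with the metric, the sketch below shows how a word error rate is commonly computed: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. This is a minimal illustration, not the evaluation script from the repo, which may apply text normalization or use a dedicated library; the function name `word_error_rate` and the sample sentences are made up for the example. It also shows why a WER above 100% is possible: insertions are counted, so a model that repeats or hallucinates words can make more errors than there are reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of four reference words.
print(word_error_rate("xin chào các bạn", "xin chào bạn"))          # 0.25
# Four inserted words against two reference words: WER of 200%.
print(word_error_rate("xin chào", "xin chào xin chào xin chào"))    # 2.0
```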

all training + evaluation scripts are on my repo: https://github.com/phineas-pta/fine-tune-whisper-vi

usage example:

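import torch
from transformers import pipeline

# load the fine-tuned checkpoint in half precision on the first GPU
PIPE = pipeline(task="automatic-speech-recognition", model="doof-ferb/whisper-tiny-vi", device="cuda:0", torch_dtype=torch.float16)
# force Vietnamese transcription instead of language auto-detection or translation
PIPE_KWARGS = {"language": "vi", "task": "transcribe"}

# transcribe a local audio file and print the recognized text
print(PIPE("audio.mp3", generate_kwargs=PIPE_KWARGS)["text"])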

model size: 37.8M params, tensor type F32 (safetensors)
