whisper-klein-nl

A tiny Dutch (nl) ASR model — a fine-tune of openai/whisper-tiny (39M params) on LokaalHub/nl-asr-cv. Built for on-device use: small footprint, low real-time factor on CPU.

TL;DR

A single fine-tune on 74.4 h of Dutch Common Voice reduces WER from roughly 44.03% (base Whisper-tiny) to **22.41%**, a 49.1% relative drop, on the internal held-out test split, which is speaker- and sentence-disjoint from the training data.
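The relative drop quoted in the TL;DR follows directly from the two WER figures. A minimal sketch of that arithmetic (the helper name `relative_wer_reduction` is illustrative, not part of any library):

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - new) / baseline * 100."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# WER figures from the TL;DR, in percent.
drop = relative_wer_reduction(44.03, 22.41)
print(f"{drop:.1f}% relative drop")  # prints "49.1% relative drop"
```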

3-axis evaluation (accuracy / footprint / speed)

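All systems are scored on the same held-out panel (cv17-test and fleurs-test) through one shared text normalizer (BasicTextNormalizer). WER values are percentages, and mean WER is the unweighted average of the two test sets. RTF is CPU compute seconds per second of audio (lower is faster).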
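To illustrate why a shared normalizer matters, here is a small self-contained sketch of corpus-level WER computed after normalizing both references and hypotheses. The `normalize` function is a simplified stand-in (lowercasing, punctuation removal, whitespace collapsing), not the BasicTextNormalizer itself, and the sentences are made up for the example.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Stand-in normalizer: lowercase, drop punctuation, collapse whitespace.

    An approximation for illustration only, not BasicTextNormalizer.
    """
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> space
    return re.sub(r"\s+", " ", text).strip()


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = normalize(ref).split()
        hyp_words = normalize(hyp).split()
        errors += word_edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


# Made-up example: casing and punctuation differ, one word is substituted.
refs = ["Het regent vandaag in Utrecht.", "Ik fiets naar mijn werk."]
hyps = ["het regent vandaag in utrecht", "ik fiets naar het werk"]
print(f"WER = {corpus_wer(refs, hyps):.2f}%")  # prints "WER = 10.00%"
```

Without the normalization step, the casing and punctuation differences in the first pair would count as word errors, which is why every system in the panel has to pass through the same normalizer before scores are compared.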

| Model | Params | Size (fp32) | RTF (CPU) | cv17-test WER (%) | fleurs-test WER (%) | Mean WER (%) |
|---|---|---|---|---|---|---|
| LokaalHub/whisper-klein-nl (ours) | 58M | 230.7 MB | 0.161 | 28.63 | 40.13 | 34.38 |
| openai/whisper-tiny | 38M | 151.0 MB | 0.091 | 46.15 | 49.14 | 47.64 |
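RTF as defined above is compute time divided by audio duration, so an RTF of 0.161 means a 10-second clip takes about 1.6 seconds to transcribe. The sketch below shows one way such a number could be measured. It assumes wall-clock timing via `time.perf_counter`; the card does not state the exact timing method, thread count, hardware, or warm-up used, and `fake_transcribe` is a placeholder for a real model call.

```python
import time
from typing import Callable


def real_time_factor(
    transcribe: Callable[[], object],
    audio_seconds: float,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return compute seconds divided by audio seconds for one call."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = clock()
    transcribe()  # run the workload once on the clip
    elapsed = clock() - start
    return elapsed / audio_seconds


# Placeholder workload (assumption): swap in a real transcription call.
def fake_transcribe() -> str:
    time.sleep(0.05)
    return "dummy"


rtf = real_time_factor(fake_transcribe, audio_seconds=1.0)
print(f"RTF = {rtf:.3f}")
```

The `clock` parameter is injectable so that `time.process_time` can be substituted when CPU time rather than wall-clock time is wanted. In practice it is worth averaging over many clips and discarding the first call, which includes model warm-up.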

Usage

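from transformers import pipeline
# Load the fine-tuned checkpoint as a speech-recognition pipeline.
asr = pipeline("automatic-speech-recognition", model="LokaalHub/whisper-klein-nl")
# Force Dutch transcription rather than relying on language auto-detection.
result = asr("audio.wav", generate_kwargs={"language": "nl", "task": "transcribe"})
print(result["text"])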
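For recordings longer than Whisper's 30-second window, the Transformers pipeline can process audio in chunks. The following is a sketch under the assumption that chunked inference suits this checkpoint; the helper name `transcribe_long` is hypothetical, and the chunk length and CPU device are illustrative choices rather than settings documented by this card.

```python
def transcribe_long(path: str, model_id: str = "LokaalHub/whisper-klein-nl") -> dict:
    """Transcribe a long recording on CPU using chunked inference."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        device="cpu",        # CPU inference, matching the on-device use case
        chunk_length_s=30,   # split long audio into 30-second chunks
    )
    return asr(
        path,
        return_timestamps=True,  # adds a "chunks" list with segment timestamps
        generate_kwargs={"language": "nl", "task": "transcribe"},
    )


# Example call (downloads the checkpoint on first use):
# print(transcribe_long("audio.wav")["text"])
```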

Training

Standard Hugging Face Seq2SeqTrainer fine-tune (bf16), built and verified by the tiny-asr-loop pipeline.

Limitations

This is a tiny-model fine-tune trained on read speech (Common Voice). The internal test split is small and speaker-disjoint, so treat the headline WER with some caution; see the panel table for FLEURS / out-of-domain numbers.
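A speaker- and sentence-disjoint split can be sanity-checked by looking for values shared between the two splits. The sketch below uses made-up rows. The field names `client_id` and `sentence` follow Common Voice conventions and are an assumption about how LokaalHub/nl-asr-cv is laid out.

```python
from typing import Iterable, Mapping


def overlap(
    train: Iterable[Mapping[str, str]],
    test: Iterable[Mapping[str, str]],
    key: str,
) -> set[str]:
    """Return the values of `key` that occur in both splits."""
    train_values = {row[key] for row in train}
    test_values = {row[key] for row in test}
    return train_values & test_values


# Tiny made-up rows; field names are assumed, not taken from the dataset card.
train_rows = [
    {"client_id": "spk_a", "sentence": "Het regent vandaag."},
    {"client_id": "spk_b", "sentence": "Ik fiets naar mijn werk."},
]
test_rows = [
    {"client_id": "spk_c", "sentence": "De trein heeft vertraging."},
]

for field in ("client_id", "sentence"):
    shared = overlap(train_rows, test_rows, field)
    print(field, "disjoint" if not shared else f"overlap: {sorted(shared)}")
```

An empty intersection on both fields is what "speaker- and sentence-disjoint" means in practice: no test speaker and no test sentence was seen during training.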
