Whisper-Tiny-MLA (11 languages) — MLA-converted, 62.5% smaller decode KV-cache
The on-device tier of the WhisperMLA family: openai/whisper-tiny (39M parameters) with its
decoder self-attention converted from multi-head attention (MHA) to multi-head latent attention (MLA),
following Whisper-MLA (arXiv:2603.00563), then fine-tuned to recover accuracy on 11 languages of the CC0
Whispered corpus (32k clips per language).
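A minimal NumPy sketch of why MLA shrinks the decode cache. It assumes a single shared latent per token that is up-projected to per-head keys and values (the DeepSeek-style MLA layout). The latent width of 288 is a hypothetical value, chosen only because at whisper-tiny's width (384, 6 heads) it makes the cache 37.5% of what MHA stores. The card does not state the actual decomposition, and the weights below are random stand-ins, not the model's.

```python
import numpy as np

D_MODEL, N_HEADS = 384, 6       # whisper-tiny decoder width and head count
HEAD_DIM = D_MODEL // N_HEADS   # 64
LATENT = 288                    # hypothetical latent width (assumption, not from the card)

rng = np.random.default_rng(0)
# Random stand-ins for learned weights: a down-projection into the latent,
# and up-projections from the latent back to full-width keys and values.
W_q = rng.standard_normal((D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
W_dkv = rng.standard_normal((D_MODEL, LATENT)) / np.sqrt(D_MODEL)
W_uk = rng.standard_normal((LATENT, D_MODEL)) / np.sqrt(LATENT)
W_uv = rng.standard_normal((LATENT, D_MODEL)) / np.sqrt(LATENT)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mla_decode_step(h_t, latent_cache):
    """One decoding step: cache only the latent, rebuild K/V from it, then attend.

    The output projection is omitted for brevity.
    """
    latent_cache.append(h_t @ W_dkv)             # LATENT floats per token, not 2 * D_MODEL
    c = np.stack(latent_cache)                    # (T, LATENT)
    k = (c @ W_uk).reshape(len(c), N_HEADS, HEAD_DIM)
    v = (c @ W_uv).reshape(len(c), N_HEADS, HEAD_DIM)
    q = (h_t @ W_q).reshape(N_HEADS, HEAD_DIM)
    scores = np.einsum("hd,thd->ht", q, k) / np.sqrt(HEAD_DIM)  # (heads, T)
    out = np.einsum("ht,thd->hd", softmax(scores), v)           # (heads, HEAD_DIM)
    return out.reshape(D_MODEL)


cache = []
for _ in range(5):
    y = mla_decode_step(rng.standard_normal(D_MODEL), cache)

mha_floats = 2 * D_MODEL * len(cache)   # what MHA would cache: full K and V
mla_floats = LATENT * len(cache)        # what this sketch caches
print(y.shape, mla_floats / mha_floats)
```

Attention is causal by construction here, since each step only sees tokens already appended to the cache. Without positional rotations on the keys, the up-projections can also be folded into the query and output projections at inference time; that optimization is left out to keep the sketch readable.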
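```python
from transformers import AutoModelForSpeechSeq2Seq  # requires transformers==4.46.x

# trust_remote_code=True loads the custom MLA modeling code shipped with the repo.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "burakaydinofficial/whisper-tiny-mla-cv11", trust_remote_code=True)
```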
Honest sizing note (read this first)
Conversion cost grows as the student shrinks. Measured across the family, the median WER increase is
≈ +0.4 for small, ≈ +1.0 for base, and ≈ +1.9 for tiny, so at this tier you pay about +1.9 WER points
(median) for the 62.5% cache cut. If quality is the priority, prefer the small variant; this tier is for
memory-constrained deployments where the cache cut matters most.
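To put the 62.5% cut in concrete terms, here is a back-of-the-envelope calculation for the decoder self-attention cache. It assumes whisper-tiny's decoder shape (4 layers, width 384, at most 448 target tokens), fp16 storage, and that the cut applies uniformly per layer. Cross-attention keys and values over the encoder output are not counted, and the card does not say whether they change.

```python
DECODER_LAYERS = 4      # whisper-tiny
D_MODEL = 384           # whisper-tiny hidden width
MAX_TOKENS = 448        # Whisper's maximum decoder length
BYTES_PER_VALUE = 2     # fp16


def mha_self_attn_cache_bytes(tokens, batch=1):
    """Bytes for full per-layer K and V caches in the decoder self-attention."""
    return batch * DECODER_LAYERS * 2 * D_MODEL * tokens * BYTES_PER_VALUE


def mla_self_attn_cache_bytes(tokens, batch=1):
    """Same cache after a 62.5% cut, i.e. 3/8 of the MHA size (assumed uniform per layer)."""
    return mha_self_attn_cache_bytes(tokens, batch) * 3 // 8


for batch in (1, 8):
    mha = mha_self_attn_cache_bytes(MAX_TOKENS, batch)
    mla = mla_self_attn_cache_bytes(MAX_TOKENS, batch)
    print(f"batch={batch}: MHA {mha / 2**20:.2f} MiB -> MLA {mla / 2**20:.2f} MiB")
```

The saving scales linearly with both batch size and sequence length, which is why it matters most when many streams are decoded at once on a memory-limited device.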
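Results (CommonVoice-17 test set, n = 1,500 clips per language; WER / CER in %; conversion cost = paired difference vs. an identically trained, unconverted control, in WER points unless marked CER; ✱ = significant, ns = not significant)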
| Lang | this model (WER / CER) | conversion cost |
|---|---|---|
| en | 29.1 / 15.8 | +2.41 ✱ |
| de | 42.8 / 16.6 | +1.95 ✱ |
| es | 29.4 / 10.7 | +1.07 ✱ |
| fr | 45.4 / 20.1 | +2.05 ✱ |
| ru | 42.9 / 14.2 | +3.32 ✱ |
| tr | 53.7 / 17.6 | +2.63 ✱ |
| cy | 86.5 / 38.0 | −0.19 (ns) |
| ar | 67.5 / 28.7 | +1.58 ✱ |
| th | 65.8 / 28.1 | +0.75 CER ✱ |
| zh | 99.3 / 34.1 | −1.49 CER (ns) |
| ka | 122.1 / 81.0 | −0.50 (ns) — floor |
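The table reports conversion cost as a paired comparison against the unconverted control and marks each delta ✱ or ns, but the card does not say which test was used. One common choice is a paired bootstrap over utterances. The sketch below assumes per-utterance error counts and reference word counts for both arms on the same test clips; it illustrates the idea and is not the author's procedure.

```python
import numpy as np


def corpus_wer(errors, ref_words):
    """Corpus-level WER in percent: total errors over total reference words."""
    return 100.0 * errors.sum() / ref_words.sum()


def paired_bootstrap(err_model, err_control, ref_words, n_boot=2000, seed=0):
    """Return (delta, ci_low, ci_high) for WER(model) - WER(control).

    Both arms are resampled with the same utterance indices, which is what makes
    the comparison paired.
    """
    err_model, err_control, ref_words = map(np.asarray, (err_model, err_control, ref_words))
    delta = corpus_wer(err_model, ref_words) - corpus_wer(err_control, ref_words)
    rng = np.random.default_rng(seed)
    n = len(ref_words)
    boots = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boots[b] = (corpus_wer(err_model[idx], ref_words[idx])
                    - corpus_wer(err_control[idx], ref_words[idx]))
    ci_low, ci_high = np.percentile(boots, [2.5, 97.5])
    return delta, ci_low, ci_high


# Synthetic example: the model makes one extra error on every 10-word utterance.
refs = np.full(1500, 10)
ctrl = np.random.default_rng(1).integers(0, 5, size=1500)
delta, lo, hi = paired_bootstrap(ctrl + 1, ctrl, refs)
print(f"delta={delta:+.2f} WER, 95% CI [{lo:+.2f}, {hi:+.2f}]")
```

Under this scheme a delta whose 95% interval excludes zero would be flagged as significant, and one whose interval straddles zero as ns.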
Absolute quality is typical of the tiny tier and much lower than the small variant; that gap comes from
the base model, not from MLA. Training: encoder frozen in both arms (MLA and control); 15,000 steps; warmup followed by cosine decay; fp16.
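The recipe names a warmup-then-cosine learning-rate schedule over 15,000 steps. A minimal sketch of that shape follows. The card gives neither the warmup length nor the peak learning rate, so `WARMUP_STEPS = 500` and `PEAK_LR = 1e-4` are hypothetical placeholders, and linear warmup with a half-cosine decay to zero is assumed (the same shape as `get_cosine_schedule_with_warmup` in transformers with its default settings).

```python
import math

TOTAL_STEPS = 15_000   # from the card
WARMUP_STEPS = 500     # hypothetical; not stated in the card
PEAK_LR = 1e-4         # hypothetical; not stated in the card


def lr_at(step, total=TOTAL_STEPS, warmup=WARMUP_STEPS, peak=PEAK_LR):
    """Linear warmup to `peak`, then half-cosine decay to zero at `total` steps."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


for s in (0, 250, 500, 7_750, 15_000):
    print(s, f"{lr_at(s):.2e}")
```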
Limitations
Same as the flagship:
- transformers==4.46.x and trust_remote_code=True are required.
- Not loadable in whisper.cpp, faster-whisper, or CTranslate2 (CT2).
- Coverage is limited to these 11 languages; unseen scripts degrade.
- ka (Georgian) results represent the floor for this model class and are labeled as such in the table.
- Domain: read speech.
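Because loading depends on remote code pinned to transformers 4.46.x, it can help to fail fast on other versions before calling `from_pretrained`. A small sketch, assuming a plain `major.minor[.patch]` version string; `is_supported_transformers` and `check_transformers_version` are hypothetical helpers, not part of transformers.

```python
def is_supported_transformers(version: str, required=(4, 46)) -> bool:
    """True if `version` (e.g. '4.46.3') is in the required major.minor series."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return False
    return (major, minor) == required


def check_transformers_version():
    """Raise early if the installed transformers is outside the 4.46.x series."""
    import transformers  # imported here so the pure helper above works without it

    if not is_supported_transformers(transformers.__version__):
        raise RuntimeError(
            "whisper-tiny-mla-cv11 expects transformers==4.46.x, "
            f"found {transformers.__version__}"
        )
```

Call `check_transformers_version()` once at startup, before `AutoModelForSpeechSeq2Seq.from_pretrained(..., trust_remote_code=True)`.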
Acoustic conditions of the evaluation
Evaluation used crowdsourced recordings from consumer microphones with real environmental noise (traffic, room reverb, varied devices), i.e. CommonVoice's native conditions rather than studio audio. The numbers above already reflect that heterogeneity. Not yet benchmarked: far-field audio, telephony (8 kHz), and overlapping speech; an SNR-ladder robustness section will be added once measured.
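An SNR ladder is usually built by mixing a noise clip into clean speech at a series of target signal-to-noise ratios and re-running the evaluation at each step. The sketch below shows only that mixing step, using power-based SNR in dB; the sine "speech", Gaussian "noise", and ladder values are hypothetical stand-ins, and it makes no claim about how the author will run the measurement.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.resize(np.asarray(noise, dtype=np.float64), speech.shape)  # loop/trim to length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise


def snr_db(clean, noisy):
    """Measured SNR of `noisy` relative to `clean`, in dB."""
    residual = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


rng = np.random.default_rng(0)
sr = 16_000                                  # Whisper's input sample rate
t = np.arange(sr) / sr
speech = 0.3 * np.sin(2 * np.pi * 220 * t)   # stand-in for a real utterance
noise = rng.standard_normal(sr // 2)         # stand-in for a recorded noise clip

for target in (20, 10, 5, 0):                # hypothetical ladder
    noisy = mix_at_snr(speech, noise, target)
    print(target, round(snr_db(speech, noisy), 2))
```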