whisper-large-v3-tulpar
A fine-tuned Whisper Large V3 model optimized for Kazakh (қазақ тілі) speech recognition.
"Жігіт - ісімен, ат - тұлпарымен." (A man is known by his deeds, a horse — by its spirit.)
We got tired of Whisper confusing Kazakh with Turkish and hallucinating random text, so we fixed it. 900+ hours of Kazakh speech, 4x H200 GPUs, and a lot of tea later — here we are.
What's this?
Full fine-tune of all 1.55B parameters of openai/whisper-large-v3 on ~900 hours of Kazakh speech data. Not LoRA, not adapters — the whole thing, because we had the GPU memory and the ambition.
Current WER: 10.62% on the ISSAI KSC test set (down from 20%+ with vanilla Whisper).
This is v2. We're not done yet.
Quick Start
```python
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load
MODEL_ID = "olzhasAl/whisper-large-v3-tulpar"
processor = WhisperProcessor.from_pretrained(MODEL_ID)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)
model.eval()

# Transcribe (audio is resampled to 16 kHz mono, the rate Whisper expects)
audio, sr = librosa.load("kazakh_audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Force Kazakh transcription instead of relying on language auto-detection
forced_ids = processor.get_decoder_prompt_ids(language="kazakh", task="transcribe")
generated = model.generate(
    inputs.input_features,
    forced_decoder_ids=forced_ids,
    max_new_tokens=256,
    num_beams=5,
    no_repeat_ngram_size=4,
)
text = processor.batch_decode(generated, skip_special_tokens=True)[0]
print(text)
```
Training Details
| Parameter | Value |
|---|---|
| Base model | openai/whisper-large-v3 |
| Parameters | 1.55B (full fine-tune) |
| Training data | ~900 hours Kazakh speech |
| GPUs | 4x NVIDIA H200 (141GB HBM3e each) |
| Precision | BF16 mixed precision |
| Optimizer | AdamW (lr=1e-5, cosine schedule) |
| Batch size | 256 effective (32/GPU × 4 GPUs × 2 accum) |
| Epochs | 1 |
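The batch-size and schedule rows above can be sanity-checked with a few lines of Python. This is only a sketch: the `effective_batch_size` and `cosine_lr` helpers are hypothetical names, and the schedule assumes no warmup and a decay to zero, neither of which is stated in the table.

```python
import math


def effective_batch_size(per_gpu: int, num_gpus: int, grad_accum_steps: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_gpu * num_gpus * grad_accum_steps


def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-5, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps.

    Assumption: no warmup phase and min_lr=0; the real run may differ.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# 32 per GPU x 4 GPUs x 2 accumulation steps
print(effective_batch_size(per_gpu=32, num_gpus=4, grad_accum_steps=2))

# Learning rate at the start, middle, and end of a hypothetical 1000-step run
for step in (0, 500, 1000):
    print(step, cosine_lr(step, total_steps=1000))
```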
Data Sources
| Dataset | Hours | Type |
|---|---|---|
| ISSAI KSC | ~335h | Studio recordings |
| farabi-lab | ~554h | Diverse speech |
| FLEURS kk_kz | ~12h | Standard phrases |
| YouTube (curated) | ~57h | News, interviews, lectures |
Total: ~957 hours of verified Kazakh speech.
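To see how the mix is weighted, the rounded hour counts from the table can be turned into shares. The per-source figures are approximate, so their sum lands within an hour of the stated total.

```python
# Rounded hours per source, copied from the table above
hours_by_source = {
    "ISSAI KSC": 335,
    "farabi-lab": 554,
    "FLEURS kk_kz": 12,
    "YouTube (curated)": 57,
}

total_hours = sum(hours_by_source.values())
shares = {name: hours / total_hours for name, hours in hours_by_source.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"sum of rounded hours: {total_hours}")
```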
Preprocessing
- Audio: 16kHz mono WAV
- VAD: Silero VAD (float32, threshold=0.5)
- Segments: 5-30 seconds
- Language filtering: Kazakh-specific character detection (ә, і, ң, ғ, ү, ұ, қ, ө, һ)
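The segment-length and language-filtering steps above could look roughly like the sketch below. It is not the actual pipeline: it assumes each segment already has a transcript to inspect, and the helper names and the `min_ratio` threshold are hypothetical.

```python
# Letters that exist in the Kazakh Cyrillic alphabet but not in Russian
KAZAKH_SPECIFIC = set("әіңғүұқөһ")


def kazakh_char_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are Kazakh-specific letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in KAZAKH_SPECIFIC for ch in letters) / len(letters)


def looks_kazakh(text: str, min_ratio: float = 0.02) -> bool:
    """Heuristic language check; min_ratio is an assumed threshold."""
    return kazakh_char_ratio(text) >= min_ratio


def keep_segment(duration_s: float, text: str) -> bool:
    """Keep segments that are 5-30 seconds long and look Kazakh."""
    return 5.0 <= duration_s <= 30.0 and looks_kazakh(text)


print(keep_segment(12.0, "Жігіт ісімен, ат тұлпарымен"))  # Kazakh text, valid length
print(keep_segment(12.0, "Привет, как дела"))              # Russian text
print(keep_segment(3.0, "Жігіт ісімен, ат тұлпарымен"))    # too short
```

A ratio threshold is used rather than a single-character match so that one stray Kazakh letter in otherwise Russian text does not pass the filter.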
Evaluation
| Benchmark | WER |
|---|---|
| ISSAI KSC test | 10.62% |
| FLEURS kk_kz test | TBD |
Target: < 8% (v3), < 5% (v4+)
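For reference, WER is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. Below is a minimal pure-Python sketch of that definition; the text normalization used for the reported numbers is not specified here, so this version simply splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four
print(word_error_rate("жігіт ісімен ат тұлпарымен", "жігіт ісімен ат тұлпармен"))
```

In practice a library such as `jiwer` does the same computation with configurable normalization.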
Tips & Tricks
For best results:
- Use `num_beams=5` and `no_repeat_ngram_size=4` to prevent repetition loops
- Set `language="kazakh"` explicitly, because auto-detect may confuse KK with Turkish
- For mixed KK/RU audio: use this model for Kazakh segments and the original Whisper for Russian segments
Known quirks:
- Language detection is biased toward Kazakh (the model was trained only on KK data)
- May hallucinate on pure Russian input, so don't force `language="kazakh"` on Russian speech
- Code-switching (KK↔RU within one sentence) is partially supported but not perfect yet
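One way to apply the mixed KK/RU advice is to route each segment to a different recognizer based on a language tag. The sketch below assumes the tags come from some upstream language-ID step, which is not part of this model card; `Segment`, `transcribe_mixed`, and the two transcriber callables are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    audio: object   # e.g. a 16 kHz mono waveform array
    language: str   # "kk" or "ru", assumed to come from an upstream language-ID step


Transcriber = Callable[[object], str]


def transcribe_mixed(
    segments: List[Segment],
    kazakh_asr: Transcriber,
    russian_asr: Transcriber,
) -> List[str]:
    """Send Kazakh segments to this model and Russian segments to the original Whisper."""
    routes = {"kk": kazakh_asr, "ru": russian_asr}
    texts = []
    for segment in segments:
        if segment.language not in routes:
            raise ValueError(f"unsupported language tag: {segment.language!r}")
        texts.append(routes[segment.language](segment.audio))
    return texts


# Stub transcribers stand in for the two real models
demo = transcribe_mixed(
    [Segment("clip-1", "kk"), Segment("clip-2", "ru")],
    kazakh_asr=lambda audio: f"[tulpar] {audio}",
    russian_asr=lambda audio: f"[whisper] {audio}",
)
print(demo)
```

Each callable would wrap its own `generate` call, with `language="kazakh"` forced only on the Kazakh side.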
What's Next
- v3: Training on 3000+ hours (more YouTube data, podcasts)
- Sub-8% WER target
- Better code-switching support
- KenLM integration for post-processing
Citation
@misc{whisper-large-v3-turbo-kk,
author = {Olzhas Alseitov},
title = {whisper-large-v3-turbo-kk: Fine-tuned Whisper for Kazakh Speech Recognition},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/olzhasAl/whisper-large-v3-turbo-kk}
}
License
Apache 2.0