Indic Whisper — ggml builds for whisper.cpp

ggml-format, q5_1-quantized builds of Indian-language Whisper fine-tunes, for on-device speech-to-text with whisper.cpp. The Ukta in-store feedback kiosk uses them for accurate regional-language transcription.

Files

File naming: ggml-<langCode>-small.bin — q5_1 quantized, ~181 MiB each.

Language	Code	File
Hindi	hi	`ggml-hi-small.bin`
Kannada	kn	`ggml-kn-small.bin`
Tamil	ta	`ggml-ta-small.bin`
Telugu	te	`ggml-te-small.bin`
Gujarati	gu	`ggml-gu-small.bin`

Each model is monolingual and transcribes only its own language. Malayalam, Marathi, Odia, Punjabi, and Bengali are not yet covered because no vasista22 small fine-tune has been published for them; audio in those languages falls back to a general model.
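
The naming scheme and the fallback rule can be sketched as a small lookup. This is a minimal illustration, not code from the Ukta kiosk: the function names, the `models` directory, and the fallback file name `ggml-small.bin` are assumptions, since this README does not say which general model is used.

```python
from pathlib import Path

# Language codes with a dedicated build in this repository (see the table above).
SUPPORTED_CODES = ("hi", "kn", "ta", "te", "gu")

# Hypothetical placeholder: the README does not name the general fallback model.
FALLBACK_MODEL = "ggml-small.bin"


def model_filename(lang_code: str) -> str:
    """Return the model file name for a language code.

    Supported codes map to ggml-<langCode>-small.bin; anything else
    maps to the (assumed) general fallback model.
    """
    code = lang_code.strip().lower()
    if code in SUPPORTED_CODES:
        return f"ggml-{code}-small.bin"
    return FALLBACK_MODEL


def resolve_model(lang_code: str, models_dir: str = "models") -> Path:
    """Join the chosen file name onto a models directory (assumed layout)."""
    return Path(models_dir) / model_filename(lang_code)
```

For example, `model_filename("ta")` yields `ggml-ta-small.bin`, while an uncovered code such as `ml` yields the placeholder fallback name.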

Provenance & attribution

Fine-tuned source models: vasista22 (whisper-<language>-{base,small}), © Speech Lab, IIT Madras — Apache 2.0.
Base architecture/weights: OpenAI Whisper — MIT.
Training corpora include AI4Bharat datasets (Shrutilipi, Vistaar) and Fleurs (CC-BY).
Conversion: whisper.cpp/models/convert-h5-to-ggml.py → f16 ggml, then quantize ... q5_1.
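
As a rough sketch of the two conversion steps, the helper below only builds the command lines and does not run them. It is not the exact invocation used for these files. Assumptions: the convert script takes the fine-tuned model directory, a checkout of the OpenAI Whisper repository, and an output directory, and writes `ggml-model.bin` there; the quantize tool takes input file, output file, and type. All paths, the binary location, and the function name are hypothetical, and the quantize binary is named differently across whisper.cpp versions.

```python
from pathlib import Path


def conversion_commands(hf_model_dir, openai_whisper_dir, out_dir, lang_code,
                        whisper_cpp_dir="whisper.cpp",
                        quantize_bin="whisper.cpp/build/bin/quantize"):
    """Build (but do not execute) the convert and quantize commands.

    Every path here is a hypothetical placeholder; adjust to your checkout.
    """
    out = Path(out_dir)
    # Assumed name of the f16 file written by convert-h5-to-ggml.py.
    f16_model = out / "ggml-model.bin"
    # Final name follows this repository's ggml-<langCode>-small.bin scheme.
    final_model = out / f"ggml-{lang_code}-small.bin"

    script = Path(whisper_cpp_dir) / "models" / "convert-h5-to-ggml.py"
    convert = ["python3", script.as_posix(),
               str(hf_model_dir), str(openai_whisper_dir), out.as_posix()]
    quantize = [str(quantize_bin), f16_model.as_posix(),
                final_model.as_posix(), "q5_1"]
    return [convert, quantize]


if __name__ == "__main__":
    # Print the commands for inspection instead of running them.
    for cmd in conversion_commands("whisper-hindi-small", "whisper", "out", "hi"):
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch easy to inspect, and the lists can be passed to `subprocess.run` once the paths have been checked against a real checkout.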

This repository redistributes derivatives of the above under the Apache License 2.0; see LICENSE. No change was made to the model weights other than format conversion and quantization.
