Orbination Whisper AI

Quantization-aware compression of whisper-large-v3-turbo to a compact 368 MB, multilingual, CPU/GPU speech-to-text model (GGUF / whisper.cpp).

These are quantized GGUF checkpoints of a fine-tuned whisper-large-v3-turbo, produced with Q3_K-matched quantization-aware training (QAT) so that accuracy survives 3-bit quantization. A companion Go runtime (CPU/GPU hybrid, no PyTorch at runtime) is on GitHub.

โžก๏ธ Code, Go runtime & prebuilt binaries: https://github.com/amichail-1/Orbination-Whisper-AI

Files

| File | Size | Role |
| --- | --- | --- |
| ggml-large-v3-turbo-q3_k.bin | 368 MB | smallest |
| ggml-large-v3-turbo-q4_k.bin | 474 MB | balanced |
| ggml-large-v3-turbo-q5_k.bin | 574 MB | best accuracy |

Results: WER on held-out FLEURS (real speech), beam search

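| Model | Size | English | Spanish | French | Greek |
| --- | --- | --- | --- | --- | --- |
| Q3_K | 368 MB | 0.065 | 0.050 | 0.065 | 0.148 |
| Q4_K | 474 MB | 0.062 | 0.048 | 0.063 | 0.124 |
| Q5_K | 574 MB | 0.061 | 0.047 | 0.061 | 0.110 |
| FP16 (upper bound) | 1.6 GB | 0.061 | 0.046 | 0.060 | 0.108 |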

High-resource languages stay essentially flat across precisions; the custom kernel's largest gains appear on quantization-sensitive content (Greek: 0.285 → 0.148 at equal size).
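To make "essentially flat" concrete, the sketch below computes each quantized model's absolute WER gap to FP16 in percentage points. The numbers are copied from the results table; the function name `gap_vs_fp16` is illustrative and not part of the repository.

```python
# WER values copied from the results table (FLEURS, beam search).
WER = {
    "Q3_K": {"English": 0.065, "Spanish": 0.050, "French": 0.065, "Greek": 0.148},
    "Q4_K": {"English": 0.062, "Spanish": 0.048, "French": 0.063, "Greek": 0.124},
    "Q5_K": {"English": 0.061, "Spanish": 0.047, "French": 0.061, "Greek": 0.110},
    "FP16": {"English": 0.061, "Spanish": 0.046, "French": 0.060, "Greek": 0.108},
}


def gap_vs_fp16(model):
    """Absolute WER gap to FP16, in percentage points, per language."""
    return {
        lang: round((WER[model][lang] - WER["FP16"][lang]) * 100, 1)
        for lang in WER["FP16"]
    }


for model in ("Q3_K", "Q4_K", "Q5_K"):
    print(model, gap_vs_fp16(model))
```

For Q3_K this gives a gap of about half a point or less on English, Spanish, and French, and 4.0 points on Greek, which is where the extra bits of Q4_K/Q5_K help most.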

Method (short)

whisper-large-v3-turbo has a shallow 4-layer decoder, so naive ≤3-bit quantization collapses it. We train with the exact ggml Q3_K quantizer in the forward pass (straight-through estimator on the backward pass) plus teacher distillation from the FP16 model. Because the quantizer used in training is the one used at deployment, the exported standard Q3_K GGUF deploys at the trained error rate with no train/inference gap. Decoding uses beam search (size 5), which removes the repetition loops that inflate greedy WER.
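The forward-quantize, straight-through-backward idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the training code: `fake_quant_3bit` is a simplified symmetric quantizer with seven levels per block (it is not the ggml Q3_K layout), the model is a single linear unit, and teacher distillation is omitted. All names are hypothetical.

```python
def fake_quant_3bit(block):
    """Quantize-dequantize one block with a shared scale and levels -3..3.

    Simplified stand-in for illustration only; NOT the ggml Q3_K format.
    """
    amax = max(abs(w) for w in block)
    if amax == 0.0:
        return list(block)
    scale = amax / 3.0
    return [max(-3, min(3, round(w / scale))) * scale for w in block]


def ste_step(latent, x, target, lr=0.05):
    """One SGD step on a linear model whose forward pass uses quantized weights."""
    w_q = fake_quant_3bit(latent)                  # forward: quantized weights
    pred = sum(wq * xi for wq, xi in zip(w_q, x))
    err = pred - target
    grad_wq = [2.0 * err * xi for xi in x]         # d(loss)/d(w_q), loss = err**2
    # Straight-through estimator: treat d(w_q)/d(latent) as 1, so the gradient
    # computed for the quantized weights is applied to the latent weights.
    new_latent = [w - lr * g for w, g in zip(latent, grad_wq)]
    return new_latent, err ** 2


latent = [0.9, -0.3, 0.1, 0.5]                     # full-precision "shadow" weights
for _ in range(20):
    latent, loss = ste_step(latent, x=[1.0, 1.0, 1.0, 1.0], target=0.0)
print("final loss:", loss)
```

Note that the weight 0.1 rounds to zero in the forward pass yet still receives a gradient; without the straight-through step, rounding would give it a zero gradient and it could never move to a different level.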

The 368 MB floor is set by the token-embedding quantization (whisper.cpp compresses the 253 MB embedding to 3-bit); use Q4_K/Q5_K to give it more bits and lower WER further.
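As a rough back-of-the-envelope check, the sketch below estimates how large the token-embedding tensor would be under each k-quant type. The block sizes are those of ggml's 256-weight k-quant super-blocks; the vocabulary size and model width are assumptions about whisper-large-v3-turbo (chosen because they reproduce the 253 MB figure at 32-bit floats), and whether a given file actually stores the embedding in that type is not verified here.

```python
QK_K = 256                                             # weights per k-quant super-block
BLOCK_BYTES = {"Q3_K": 110, "Q4_K": 144, "Q5_K": 176}  # bytes per super-block

# Assumed dimensions of the token embedding (not stated in this card).
N_VOCAB, D_MODEL = 51866, 1280
N_WEIGHTS = N_VOCAB * D_MODEL


def bits_per_weight(fmt):
    """Effective bits per weight, including per-block scale overhead."""
    return BLOCK_BYTES[fmt] * 8 / QK_K


def tensor_mib(n_weights, fmt):
    """Size in MiB of a tensor with n_weights stored in the given format."""
    return n_weights * bits_per_weight(fmt) / 8 / 2**20


print("FP32 MiB:", round(N_WEIGHTS * 4 / 2**20, 1))
for fmt in BLOCK_BYTES:
    print(fmt, bits_per_weight(fmt), "bpw,", round(tensor_mib(N_WEIGHTS, fmt), 1), "MiB")
```

Under these assumptions "3-bit" is really about 3.44 bits per weight once scales are counted, and moving the embedding from Q3_K to Q5_K adds roughly 16 MiB.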

Usage (whisper.cpp)

# download a model
huggingface-cli download antoniosmich/Orbination-Whisper-AI ggml-large-v3-turbo-q3_k.bin --local-dir .

# run with whisper.cpp (16 kHz mono WAV)
./whisper-cli -m ggml-large-v3-turbo-q3_k.bin -bs 5 -l en audio.wav

Or use the Orbination Go runtime (CPU/GPU hybrid, CLI + HTTP server) from the GitHub repo.
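Because the whisper.cpp command above expects a 16 kHz mono WAV, it can help to check a file's header before transcribing. The following is a minimal sketch using only Python's standard `wave` module; the helper names are hypothetical and not part of whisper.cpp or the Orbination runtime.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels) read from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def is_16k_mono(path):
    """True if the file is 16 kHz, single-channel audio."""
    return wav_format(path) == (16000, 1)
```

A file that fails the check can be converted first, for example with `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav`, as suggested in the whisper.cpp README.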

License & attribution

MIT © 2026 Leia Enterprise Solutions (www.leia.gr), an Orbination application (www.orbination.com). Built on openai/whisper and ggerganov/whisper.cpp; evaluated on FLEURS.
