Model support in CrispASR: pure C++ inference with GGUF quantisation (no Python needed)
#12
by cstr
We've built a complete C++ runtime for Qwen3-ASR in CrispASR, a multi-backend ASR tool based on ggml. One binary, one GGUF file — no Python, no PyTorch, no pip install.
What works:
- Full pipeline: mel → Whisper-style audio encoder → Qwen3 0.6B LLM decode
- Q4_K / Q5_0 / Q8_0 / F16 quantisation (513 MB Q4_K vs 1.8 GB F16)
- 30 languages + 22 Chinese dialects
- Faster-than-realtime on CPU (10.5s for 11s audio on a 4-core Xeon)
- Word-level timestamps via Qwen3-ForcedAligner (-am qwen3-forced-aligner.gguf)
- Temperature sampling + best-of-N decoding (--best-of 5 -tp 0.3)
- Streaming from mic/stdin (--stream, --mic, --live)
- Speaker diarisation, language ID, SRT/VTT/JSON output
- GPU acceleration via CUDA / Metal / Vulkan (ggml backends)
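For anyone who wants to sanity-check the figures in the list above, here is a small shell sketch that derives the real-time factor and the size reduction from them. The helper names `rtf` and `size_ratio` are made up for illustration, and 1.8 GB is assumed to mean 1800 MB (decimal units).

```shell
#!/bin/sh
# Real-time factor: processing time / audio duration.
# A value below 1 means faster than real time.
rtf() {
    awk -v proc="$1" -v dur="$2" 'BEGIN { printf "%.2f\n", proc / dur }'
}

# How many times smaller one model file is than another (same unit for both).
size_ratio() {
    awk -v big="$1" -v small="$2" 'BEGIN { printf "%.1f\n", big / small }'
}

rtf 10.5 11          # 10.5 s of compute for 11 s of audio -> prints 0.95
size_ratio 1800 513  # F16 (assumed 1800 MB) vs Q4_K (513 MB) -> prints 3.5
```

So on that 4-core Xeon the run comes in just under real time, and the Q4_K file is roughly 3.5 times smaller than F16.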
Quick start:
git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR
cmake -S . -B build && cmake --build build -j8
# Auto-download and transcribe
./build/bin/crispasr --backend qwen3 -m auto -f audio.wav
# Or use pre-quantised GGUF
./build/bin/crispasr -m qwen3-asr-0.6b-q4_k.gguf -f audio.wav -osrt
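If you have a whole folder of recordings, a loop around the second command is probably the simplest approach. The sketch below is only an illustration: `transcribe_dir` is a hypothetical helper, the `CRISPASR` and `MODEL` variables are names chosen here, and it uses only the `-m`, `-f` and `-osrt` flags shown in the quick start.

```shell
#!/bin/sh
# Binary and model paths taken from the quick start; override via environment.
CRISPASR="${CRISPASR:-./build/bin/crispasr}"
MODEL="${MODEL:-qwen3-asr-0.6b-q4_k.gguf}"

# Hypothetical helper: run the transcriber once per .wav file in a directory.
transcribe_dir() {
    dir="$1"
    for wav in "$dir"/*.wav; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$wav" ] || continue
        # Report failures on stderr but keep going with the remaining files.
        "$CRISPASR" -m "$MODEL" -f "$wav" -osrt || echo "failed: $wav" >&2
    done
}

# Usage: transcribe_dir ./recordings
```

Each file is processed in a separate invocation, so the model is reloaded every time; that is fine for a handful of files, less so for thousands.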
Pre-quantised GGUFs: cstr/qwen3-asr-0.6b-GGUF
CrispASR supports 11 ASR backends in the same binary, including Whisper, Parakeet, Canary, Cohere, Granite, Voxtral 3B/4B, wav2vec2, and Qwen3-ASR.