Model support via CrispASR: pure C++ inference with GGUF quantisation (no Python needed)

#12, opened by cstr

We've built a complete C++ runtime for Qwen3-ASR in CrispASR, a multi-backend ASR tool based on ggml. One binary, one GGUF file — no Python, no PyTorch, no pip install.

What works:

  • Full pipeline: mel spectrogram → Whisper-style audio encoder → Qwen3 0.6B LLM decoding
  • Q4_K / Q5_0 / Q8_0 / F16 quantisation (513 MB Q4_K vs 1.8 GB F16)
  • 30 languages + 22 Chinese dialects
  • Slightly faster than real time on CPU: 10.5 s to transcribe 11 s of audio on a 4-core Xeon (real-time factor ≈ 0.95)
  • Word-level timestamps via Qwen3-ForcedAligner (-am qwen3-forced-aligner.gguf)
  • Temperature sampling + best-of-N decoding (--best-of 5 -tp 0.3)
  • Streaming from mic/stdin (--stream, --mic, --live)
  • Speaker diarisation, language ID, SRT/VTT/JSON output
  • GPU acceleration via CUDA / Metal / Vulkan (ggml backends)
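The two decoding options above work as follows. With temperature T, each next token is drawn from softmax(logits / T). Low values such as 0.3 therefore stay close to greedy decoding. Best-of-N runs N such sampled decodes and keeps the highest-scoring transcript.

The C++ sketch below illustrates the general technique only. It is not CrispASR's code: `softmax_with_temperature`, `sample_token`, `Candidate` and `pick_best` are hypothetical names. Ranking candidates by mean token log-probability is also an assumed criterion, not one stated in this post.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Temperature-scaled softmax: p_i = exp((z_i - z_max) / T) / sum_j exp((z_j - z_max) / T).
// Subtracting the max logit keeps exp() from overflowing.
std::vector<double> softmax_with_temperature(const std::vector<float>& logits, double temperature) {
    assert(!logits.empty() && temperature > 0.0);
    const float z_max = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - z_max) / temperature);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    return probs;
}

// Choose the next token id: argmax when T <= 0 (greedy), otherwise sample from the softmax.
int sample_token(const std::vector<float>& logits, double temperature, std::mt19937& rng) {
    if (temperature <= 0.0) {
        return static_cast<int>(std::max_element(logits.begin(), logits.end()) - logits.begin());
    }
    const std::vector<double> probs = softmax_with_temperature(logits, temperature);
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng);
}

// One finished hypothesis from a sampled decoding run (hypothetical structure).
struct Candidate {
    std::vector<int> tokens;
    double sum_logprob;  // sum of log p(token) over the sequence
};

// Best-of-N: keep the candidate with the highest mean token log-probability (assumed criterion).
std::size_t pick_best(const std::vector<Candidate>& candidates) {
    assert(!candidates.empty());
    std::size_t best = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const std::size_t n = candidates[i].tokens.size();
        const double score = n > 0 ? candidates[i].sum_logprob / static_cast<double>(n)
                                   : -std::numeric_limits<double>::infinity();
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

In a full decoder, `sample_token` would be called once per step on the LLM's output logits. The whole decode would then be repeated N times (e.g. `--best-of 5`) before a selection step such as `pick_best` chooses the transcript. Dividing by length keeps the score from favouring short outputs, which accumulate fewer negative log-probabilities.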

Quick start:

git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR
cmake -S . -B build && cmake --build build -j8

# Auto-download and transcribe
./build/bin/crispasr --backend qwen3 -m auto -f audio.wav

# Or use pre-quantised GGUF
./build/bin/crispasr -m qwen3-asr-0.6b-q4_k.gguf -f audio.wav -osrt
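With `-osrt`, each transcribed segment is written as a SubRip cue. The cue consists of:

  • a running index
  • a `start --> end` line with timestamps in `HH:MM:SS,mmm` form (SRT uses a comma before the milliseconds; WebVTT uses a period)
  • the text, followed by a blank line

The C++ sketch below shows only that formatting step. `srt_timestamp` and `srt_cue` are hypothetical helpers, not CrispASR functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a non-negative time in milliseconds as an SRT timestamp: HH:MM:SS,mmm.
std::string srt_timestamp(std::int64_t ms) {
    assert(ms >= 0);
    const long long h = ms / 3600000;
    const long long m = (ms / 60000) % 60;
    const long long s = (ms / 1000) % 60;
    const long long frac = ms % 1000;
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02lld:%02lld:%02lld,%03lld", h, m, s, frac);
    return buf;
}

// One numbered SRT cue: index line, "start --> end" line, text, then a blank line.
std::string srt_cue(int index, std::int64_t start_ms, std::int64_t end_ms, const std::string& text) {
    return std::to_string(index) + "\n" + srt_timestamp(start_ms) + " --> " +
           srt_timestamp(end_ms) + "\n" + text + "\n\n";
}
```

For example, `srt_timestamp(3723004)` gives `01:02:03,004`.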

Pre-quantised GGUFs: cstr/qwen3-asr-0.6b-GGUF
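The quantisation types listed earlier (Q4_K, Q5_0, Q8_0) are ggml block formats: small groups of weights share a scale. Q8_0, for instance, stores 32 weights as 8-bit integers plus one fp16 scale d, and reconstructs each weight as d × q. That costs 34 bytes per 32 weights, or 8.5 bits per weight. Q4_K works out to about 4.5 bits per weight versus 16 for F16. That ratio of roughly 3.6× is in line with the quoted 1.8 GB vs 513 MB, although not every tensor in a GGUF is necessarily quantised.

The C++ sketch below shows the Q8_0 round trip. It is a simplified illustration: `BlockQ8_0`, `quantize_q8_0` and `dequantize_q8_0` are hypothetical names, and the scale is kept as a float rather than ggml's fp16.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kBlockSize = 32;  // weights per Q8_0 block in ggml

// Simplified Q8_0 block: ggml stores the scale as fp16; a float is used here for clarity.
struct BlockQ8_0 {
    float d;                                 // per-block scale
    std::array<std::int8_t, kBlockSize> qs;  // quantised weights
};

// Quantise 32 floats: d = max|x| / 127, q_i = round(x_i / d).
BlockQ8_0 quantize_q8_0(const float* x) {
    assert(x != nullptr);
    float amax = 0.0f;
    for (int i = 0; i < kBlockSize; ++i) amax = std::max(amax, std::fabs(x[i]));
    BlockQ8_0 b{};
    b.d = amax / 127.0f;
    const float inv = b.d != 0.0f ? 1.0f / b.d : 0.0f;  // all-zero block stays zero
    for (int i = 0; i < kBlockSize; ++i) {
        b.qs[i] = static_cast<std::int8_t>(std::lround(x[i] * inv));
    }
    return b;
}

// Dequantise: x_i is approximated by d * q_i.
void dequantize_q8_0(const BlockQ8_0& b, float* y) {
    for (int i = 0; i < kBlockSize; ++i) y[i] = b.d * b.qs[i];
}

// Storage cost in ggml's layout: 2-byte fp16 scale + 32 one-byte weights per 32 weights.
constexpr double kQ8_0BitsPerWeight = (2 + kBlockSize) * 8.0 / kBlockSize;  // 8.5
```

The round-trip error per weight is at most half a quantisation step (d / 2). Keeping one scale per small block, rather than per tensor, limits how far a single outlier weight can coarsen its neighbours.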

CrispASR supports 11 ASR backends in the same binary, including Whisper, Parakeet, Canary, Cohere, Granite, Voxtral 3B/4B, wav2vec2 and Qwen3-ASR.
