Fun-ASR-Nano · GGUF (FunASR llama.cpp runtime)

GGUF build of Fun-ASR-Nano (SenseVoice SAN-M encoder + adaptor + Qwen3-0.6B LLM decoder) for the FunASR llama.cpp runtime, a single C++ binary that runs on CPU/edge devices with no Python. It is the accuracy leader, thanks to the LLM decoder.

Files

| file | size | notes |
|---|---|---|
| funasr-encoder-f16.gguf | 470 MB | audio encoder + adaptor (f16) |
| qwen3-0.6b-q8_0.gguf | 805 MB | LLM decoder, recommended (Q8_0) |
| qwen3-0.6b-q4km.gguf | 484 MB | LLM decoder, smaller (Q4_K_M) |

Usage (needs both the encoder and the LLM gguf)

llama-funasr-cli --enc funasr-encoder-f16.gguf -m qwen3-0.6b-q8_0.gguf -a audio.wav --vad fsmn-vad.gguf
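
The command above can be wrapped so that a missing model file is caught before the binary starts. The following is a minimal sketch, not part of the runtime: the function name `funasr_transcribe` and the `MODEL_DIR` and `FUNASR_CLI` variables are hypothetical, and it assumes `fsmn-vad.gguf` (used in the command above but not listed under Files) sits in the same directory as the other models. Only the flags shown in the usage line are passed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the usage line above.
# funasr_transcribe, MODEL_DIR and FUNASR_CLI are illustrative names,
# not something the runtime itself defines.

funasr_transcribe() {
  local audio="$1"
  local quant="${2:-q8_0}"                     # q8_0 (recommended) or q4km (smaller)
  local dir="${MODEL_DIR:-.}"                  # directory holding the .gguf files
  local cli="${FUNASR_CLI:-llama-funasr-cli}"  # binary to invoke
  local decoder f

  # Map the requested quantization to one of the two decoder files.
  case "$quant" in
    q8_0) decoder="$dir/qwen3-0.6b-q8_0.gguf" ;;
    q4km) decoder="$dir/qwen3-0.6b-q4km.gguf" ;;
    *) echo "unknown decoder quantization: $quant" >&2; return 2 ;;
  esac

  # The encoder and the LLM decoder are both required; fail early if
  # any input file is missing.
  for f in "$dir/funasr-encoder-f16.gguf" "$decoder" "$dir/fsmn-vad.gguf" "$audio"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done

  "$cli" --enc "$dir/funasr-encoder-f16.gguf" -m "$decoder" \
         -a "$audio" --vad "$dir/fsmn-vad.gguf"
}
```

After sourcing the file, `funasr_transcribe audio.wav` runs the recommended Q8_0 decoder and `funasr_transcribe audio.wav q4km` runs the smaller one.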

On CPU: 8.30 % character error rate (CER) on the 184-clip Mandarin benchmark, versus 22–31 % for whisper.cpp.
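
For readers unfamiliar with the metric, CER is conventionally defined from a character-level edit-distance alignment between the reference transcript and the model output. The formula below is the standard definition; the card does not state how this benchmark normalizes text (punctuation, digits, spacing), so treat the exact scoring setup as unspecified.

```latex
\mathrm{CER} = \frac{S + D + I}{N}
% S = substituted characters, D = deleted characters,
% I = inserted characters, N = characters in the reference transcript
```

Under this definition, a CER of 8.30 % corresponds to about 8.3 character errors per 100 reference characters.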

GGUF metadata: model size 0.2B params, architecture funasr-sensevoice-encoder