Qwen3-ASR 0.6B · OpenASR

Multilingual speech recognition across 52 languages & dialects — the fast, lightweight Qwen3-ASR

Native speech-to-text in the OpenASR runtime — engineered for peak performance on CPU & GPU, no Python at inference time.

✨ Highlights

🌍 52 languages & dialects — 30 languages plus 22 Chinese dialects, with built-in spoken-language identification
🎧 Robust on hard audio — clean speech, singing voice, and songs over background music
⚡ Fast & light — the efficiency-tuned member of the Qwen3-ASR family; one model for both offline and streaming
🦀 Native in OpenASR — .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU

🚀 Quickstart

# 1. Install the OpenASR CLI  ·  https://openasr.org
# 2. Pull a build (pick a quant — see the table below)
openasr pull qwen3-asr-0.6b:q8

# 3. Transcribe
openasr transcribe audio.wav --model qwen3-asr-0.6b

All builds for this model:

openasr pull qwen3-asr-0.6b:fp16
openasr pull qwen3-asr-0.6b:q8
openasr pull qwen3-asr-0.6b:q4
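
For a whole folder of recordings, the Quickstart transcribe command can be wrapped in a small script. The sketch below is a hypothetical helper, not part of OpenASR. It only builds and runs the `openasr transcribe <file> --model <name>` invocation shown above; the function names and the `*.wav` pattern are assumptions made for illustration, and the output format of the CLI is not parsed.

```python
import shlex
import subprocess
from pathlib import Path

MODEL = "qwen3-asr-0.6b"  # model name as written in the Quickstart


def transcribe_command(audio: Path, model: str = MODEL) -> list[str]:
    """Build the argv for `openasr transcribe <audio> --model <model>`."""
    return ["openasr", "transcribe", str(audio), "--model", model]


def batch_commands(folder: Path, pattern: str = "*.wav") -> list[list[str]]:
    """Return one command per matching file, sorted for a stable order."""
    return [transcribe_command(p) for p in sorted(folder.glob(pattern))]


def run_batch(folder: Path) -> None:
    """Run every command in turn; requires the openasr CLI on PATH."""
    for cmd in batch_commands(folder):
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises if the CLI exits non-zero
```

Calling `run_batch(Path("recordings"))` would transcribe each `.wav` file in that (hypothetical) folder one after another.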

📦 Available builds

Quant	File (`.oasr`)	Size	RAM peak	RTF · M1 CPU	RTF · M1 GPU	JFK ΔWER vs fp16
fp16	`qwen3-asr-0.6b-fp16.oasr`	1.88 GB	4.51 GB	0.58×	0.41×	0.0%
q8_0	`qwen3-asr-0.6b-q8_0.oasr`	1.01 GB	2.86 GB	0.55×	0.27×	0.0%
q4_k	`qwen3-asr-0.6b-q4_k.oasr`	599 MB	3.50 GB	0.52×	0.20×	0.0%

RTF = real-time factor (processing time ÷ audio duration) on the fixed 11 s JFK clip; lower is faster. RAM peak is measured per pack
in an isolated subprocess. JFK ΔWER compares each quantized build's JFK transcript to this model's
fp16 JFK transcript, so it measures quantization drift rather than absolute recognition accuracy.
q8_0 is the recommended default: near-reference quality at a fraction of the
footprint.
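
The two metrics in the table can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the benchmark code used for these packs: the text normalization (lowercasing and whitespace splitting) is an assumption, the example sentences are made up rather than real model output, and the 6.38 s timing is simply back-derived from the 0.58× fp16 CPU figure on an 11 s clip.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()  # assumed normalization
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


# Drift of a quantized transcript relative to the fp16 transcript (made-up strings).
fp16_text = "ask not what your country can do for you"
quant_text = "ask not what the country can do for you"
drift = word_error_rate(fp16_text, quant_text)  # one substitution in nine words

print(round(real_time_factor(6.38, 11.0), 2))
print(round(drift, 3))
```

Using the fp16 transcript as the reference is what makes the number a drift measure: a quantized build that reproduces the fp16 output word for word scores 0.0% even if both transcripts contain the same recognition errors.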

🧠 About Qwen3-ASR 0.6B

Qwen3-ASR-0.6B is the compact, efficiency-optimized member of Alibaba's Qwen3-ASR family, built on the Qwen3-Omni audio-understanding foundation. It performs language identification and speech recognition across 30 languages and 22 Chinese dialects (52 in total) and handles clean speech as well as harder material such as singing voice and songs with background music. A single unified checkpoint handles both offline and real-time streaming transcription and can process long audio. The 0.6B size targets a strong accuracy-vs-efficiency trade-off (the Qwen team reports up to ~2000× throughput at high concurrency), making it the family's go-to for lightweight, high-throughput deployments.

This OpenASR repo repackages the original weights as .oasr packs that run natively in the OpenASR runtime, with no Python at inference time. The q8_0 build is the recommended default: near-reference accuracy at roughly half the fp16 file size (1.01 GB vs 1.88 GB) and the lowest measured RAM peak (2.86 GB). q4_k is the smallest download (599 MB); its measured RAM peak (3.50 GB) is higher than q8_0's, so it suits storage-constrained rather than RAM-constrained devices. fp16 is for verification or maximum fidelity. For word-level timestamps, pair it upstream with Qwen3-ForcedAligner-0.6B.
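
The trade-off between builds can be made concrete with a small selection helper. The numbers below are copied from the builds table (599 MB written as 0.599 GB); the selection rule itself (take the preferred build if it fits, otherwise the smallest file that fits) is a hypothetical policy for illustration, not something the OpenASR CLI does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Build:
    quant: str
    size_gb: float      # pack size on disk
    ram_peak_gb: float  # measured peak RAM
    rtf_cpu: float      # RTF on M1 CPU
    rtf_gpu: float      # RTF on M1 GPU


# Values taken from the "Available builds" table.
BUILDS = [
    Build("fp16", 1.88, 4.51, 0.58, 0.41),
    Build("q8_0", 1.01, 2.86, 0.55, 0.27),
    Build("q4_k", 0.599, 3.50, 0.52, 0.20),
]


def pick_build(
    ram_budget_gb: float,
    disk_budget_gb: float = float("inf"),
    prefer: str = "q8_0",
) -> Optional[Build]:
    """Return the preferred build if it fits, else the smallest one that does."""
    fits = [
        b for b in BUILDS
        if b.ram_peak_gb <= ram_budget_gb and b.size_gb <= disk_budget_gb
    ]
    if not fits:
        return None  # nothing fits the stated budgets
    for build in fits:
        if build.quant == prefer:
            return build
    return min(fits, key=lambda b: b.size_gb)
```

For example, `pick_build(3.0)` selects q8_0 because it is the only build whose measured RAM peak is under 3 GB, while `pick_build(4.0, disk_budget_gb=0.8)` falls back to q4_k.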

⚙️ How these packs were made

Converted from Qwen/Qwen3-ASR-0.6B with the OpenASR importer:

# Run once per build, passing exactly one of fp16, q8-0 or q4-k to --quantization
openasr model-pack import-qwen-local <src> <out>.oasr \
  --package-id qwen3-asr-0.6b --quantization {fp16,q8-0,q4-k}

The .oasr container is GGUF-backed; packs use zero-copy mmap weight binding and graph buffer reuse to keep peak memory low.
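
To show what zero-copy mmap binding means in general terms, here is a minimal sketch using only the Python standard library. It is not OpenASR's implementation (the runtime is native and its internals are not described here), and the raw float32 file is a stand-in for a tensor blob, not the .oasr layout. The point is that the values are read through a view onto the mapped file, so the OS pages data in on demand instead of the program copying the whole file into its own buffer.

```python
import mmap
import os
import struct
import tempfile


def write_demo_weights(path: str, values: list[float]) -> None:
    """Write float32 values in native byte order (stand-in for a tensor blob)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"={len(values)}f", *values))


def sum_mapped_weights(path: str) -> tuple[int, float]:
    """Map the file read-only and read floats through a view, with no bulk copy."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # memoryview exposes the mapping's bytes; cast("f") reinterprets them
        # as native float32 values without copying the buffer.
        with memoryview(mm) as raw, raw.cast("f") as weights:
            return len(weights), sum(weights)


def demo() -> tuple[int, float]:
    """Round-trip three exactly representable values through a temp file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_demo_weights(path, [0.5, 1.5, 2.0])
        return sum_mapped_weights(path)
```

The views are released before the mapping is closed (the nested `with` blocks exit in reverse order), which is the bookkeeping any zero-copy scheme needs: a tensor bound to mapped memory is only valid while the mapping stays open.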

⚖️ License

These packs inherit the upstream model's license: Apache-2.0. OpenASR packaging retains the upstream copyright and NOTICE; the only modifications are format conversion and quantization.

🙏 Acknowledgements

This pack is a redistribution of Qwen3-ASR-0.6B, created and open-sourced by the Qwen team at Alibaba (Qwen/Qwen3-ASR-0.6B). All credit for the original architecture, training, and weights belongs to them; the license is inherited from and identical to the upstream model (Apache-2.0). The GGUF quantization recipe and bit-identity verification methodology were informed by the community GGUF work at cstr/qwen3-asr-1.7b-GGUF. Thank you to both teams for releasing their work openly.

🔗 Links

🦀 OpenASR — https://github.com/QuintinShaw/OpenASR
🌐 Website — https://openasr.org
🤗 Upstream model — Qwen/Qwen3-ASR-0.6B
