Qwen3-ASR 0.6B · OpenASR
Multilingual speech recognition across 52 languages & dialects: the fast, lightweight Qwen3-ASR
Native speech-to-text in the OpenASR runtime, engineered for peak performance on CPU & GPU, with no Python at inference time.
Highlights
- 52 languages & dialects: 30 languages plus 22 Chinese dialects, with built-in spoken-language identification
- Robust on hard audio: clean speech, singing voice, and songs over background music
- Fast & light: the efficiency-tuned member of the Qwen3-ASR family; one model for both offline and streaming
- Native in OpenASR: .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU
Quickstart
# 1. Install the OpenASR CLI · https://openasr.org
# 2. Pull a build (pick a quant; see the table below)
openasr pull qwen3-asr-0.6b:q8
# 3. Transcribe
openasr transcribe audio.wav --model qwen3-asr-0.6b
All builds for this model:
openasr pull qwen3-asr-0.6b:fp16
openasr pull qwen3-asr-0.6b:q8
openasr pull qwen3-asr-0.6b:q4
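For scripted use, the pull and transcribe steps can be wrapped from Python. The sketch below is a thin wrapper only, assuming the `openasr` binary is on `PATH`. It uses just the `pull` and `transcribe` invocations shown in this section; the helper names `quickstart_commands` and `run_quickstart` are hypothetical. How the CLI reports the transcript is not specified here, so the sketch leaves its output attached to the terminal.

```python
import subprocess

MODEL = "qwen3-asr-0.6b"


def quickstart_commands(audio_path, tag="q8"):
    """Return the argv lists for the pull and transcribe steps."""
    return [
        ["openasr", "pull", f"{MODEL}:{tag}"],
        ["openasr", "transcribe", audio_path, "--model", MODEL],
    ]


def run_quickstart(audio_path, tag="q8"):
    """Run both steps in order, stopping at the first non-zero exit code."""
    for argv in quickstart_commands(audio_path, tag):
        # Passing a list (not a shell string) avoids quoting problems in paths.
        subprocess.run(argv, check=True)
```

Calling `run_quickstart("audio.wav")` would then pull the q8 build and transcribe the file, provided the CLI is installed.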
Available builds
| Quant | File (.oasr) | Size | RAM peak | RTF · M1 CPU | RTF · M1 GPU | JFK ΔWER vs fp16 |
|---|---|---|---|---|---|---|
| fp16 | qwen3-asr-0.6b-fp16.oasr | 1.88 GB | 4.51 GB | 0.58× | 0.41× | 0.0% |
| q8_0 | qwen3-asr-0.6b-q8_0.oasr | 1.01 GB | 2.86 GB | 0.55× | 0.27× | 0.0% |
| q4_k | qwen3-asr-0.6b-q4_k.oasr | 599 MB | 3.50 GB | 0.52× | 0.20× | 0.0% |
RTF = real-time factor on the fixed 11 s JFK clip (lower is faster); RAM peak measured per pack in an isolated subprocess. JFK ΔWER compares each quantized build's JFK transcript to this model's fp16 JFK transcript, so it measures quantization drift rather than absolute recognition accuracy. q8_0 is the recommended default: near-reference quality at a fraction of the footprint.
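To make the two metrics concrete, here is a minimal Python sketch. It assumes the usual definition of RTF (processing time divided by audio duration) and a plain word-level edit-distance WER with lowercasing as the only normalization. The text normalization actually used for the table is not stated here, and the function names are hypothetical.

```python
def processing_seconds(rtf, audio_seconds):
    """Estimated wall-clock time, assuming RTF = processing time / audio duration."""
    return rtf * audio_seconds


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under these assumptions, an RTF of 0.27 on an 11 s clip corresponds to roughly 2.97 s of processing, and a quantized transcript identical to the fp16 transcript gives a drift of 0.0, which is what the ΔWER column reports for all three builds.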
About Qwen3-ASR 0.6B
Qwen3-ASR-0.6B is the compact, efficiency-optimized member of Alibaba's Qwen3-ASR family,
built on the Qwen3-Omni audio-understanding foundation. It performs language identification and
speech recognition across 30 languages and 22 Chinese dialects (52 in total), and stays robust
on challenging audio: clean speech, singing voice, and songs with background music. A single
unified checkpoint handles both offline and real-time streaming transcription and can process long
audio; the 0.6B size targets a strong accuracy-vs-efficiency trade-off (the Qwen team reports up to
~2000× throughput at high concurrency), making it the family's go-to for lightweight, high-throughput
deployments. This OpenASR repo repackages the original weights as .oasr packs that run natively in
the OpenASR runtime, with no Python at inference time. The q8_0 build is the recommended default
(near-reference accuracy at roughly half the footprint); q4_k is the smallest download (599 MB),
although the table above lists a higher RAM peak for it (3.50 GB) than for q8_0 (2.86 GB); fp16 is
for verification or maximum fidelity. For word-level timestamps, pair it upstream with
Qwen3-ForcedAligner-0.6B.
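As a rough aid for choosing between the builds, the sketch below filters them by a RAM and disk budget. The numbers are copied from the Available builds table (M1 measurements, with 599 MB treated as 0.599 GB); the `Build` type and `builds_that_fit` helper are hypothetical, and peak memory on other hardware may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    quant: str
    size_gb: float       # on-disk pack size
    ram_peak_gb: float   # peak RAM reported in the table
    rtf_m1_gpu: float    # real-time factor on M1 GPU


# Values copied from the Available builds table.
BUILDS = [
    Build("fp16", 1.88, 4.51, 0.41),
    Build("q8_0", 1.01, 2.86, 0.27),
    Build("q4_k", 0.599, 3.50, 0.20),
]


def builds_that_fit(max_ram_gb, max_disk_gb):
    """Return the quant names whose reported RAM peak and size fit both budgets."""
    return [
        b.quant
        for b in BUILDS
        if b.ram_peak_gb <= max_ram_gb and b.size_gb <= max_disk_gb
    ]
```

With a 3 GB RAM budget only q8_0 fits, while a 0.7 GB disk budget with 4 GB of RAM leaves only q4_k. The two constraints pull in different directions for these packs.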
How these packs were made
Converted from Qwen/Qwen3-ASR-0.6B with the OpenASR importer:
openasr model-pack import-qwen-local <src> <out>.oasr \
--package-id qwen3-asr-0.6b --quantization {fp16,q8-0,q4-k}
The .oasr container is GGUF-backed; packs use zero-copy mmap weight binding and graph
buffer reuse to keep peak memory low.
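The general idea behind zero-copy mmap binding can be illustrated in a few lines of Python. This is not the OpenASR runtime's implementation and does not parse the .oasr or GGUF layout. It is a sketch that maps a small stand-in file of float32 values and reads them in place, so the bytes are paged in by the operating system rather than copied into a separate buffer. All function names are hypothetical.

```python
import mmap
import os
import tempfile
from array import array


def write_demo_weights(path, values):
    """Write float32 values in native byte order (a stand-in for a weight blob)."""
    with open(path, "wb") as f:
        array("f", values).tofile(f)


def sum_mapped_weights(path):
    """Map the file read-only and read it as float32 without copying the bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # cast() reinterprets the mapped bytes in place; no buffer is duplicated.
            with memoryview(mapped) as raw, raw.cast("f") as weights:
                return len(weights), sum(weights)


def demo():
    """Create a temporary weight file, map it, and return (count, sum)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_demo_weights(path, [0.5, -1.25, 2.0, 4.0])
        return sum_mapped_weights(path)
```

Because the view is backed by the mapping itself, resident memory grows only as pages are touched, which is the property the packs rely on to keep peak memory low.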
License
These packs inherit the upstream model's license: Apache-2.0 (source). OpenASR packaging retains the upstream copyright and NOTICE; the only modifications are format conversion and quantization.
Acknowledgements
This pack is a redistribution of Qwen3-ASR-0.6B, created and open-sourced by the Qwen team at Alibaba (Qwen/Qwen3-ASR-0.6B). All credit for the original architecture, training, and weights belongs to them; the license is inherited from and identical to the upstream model (Apache-2.0). The GGUF quantization recipe and bit-identity verification methodology were informed by the community GGUF work at cstr/qwen3-asr-1.7b-GGUF. Thank you to both teams for releasing their work openly.
Links
- OpenASR: https://github.com/QuintinShaw/OpenASR
- Website: https://openasr.org
- Upstream model: Qwen/Qwen3-ASR-0.6B