qwen-asr-0.6b-he — Real-Time Hebrew Speech Recognition

A compact (0.6B) streaming speech-to-text model fine-tuned for Hebrew, based on Qwen/Qwen3-ASR-0.6B.

It is fast enough to transcribe Hebrew in real time on a CPU, runs natively on Apple Silicon via MLX, and is small enough to run fully on-device on an iPhone. It also handles English.

Highlights

🎯 Hebrew-first: fine-tuned on a purpose-built Hebrew speech corpus that was collected, cleaned, and aligned specifically for this task.
⚡ Real-time, low-latency streaming: transcribes as you speak.
💻 Runs anywhere: CPU, NVIDIA GPU (vLLM), Apple Silicon (MLX), on-device.
🌐 Hebrew + English (plus auto-detect).

Usage

The easiest path is the companion repository, which ships a one-line file transcriber and an OpenAI-compatible server (batch + realtime streaming) for both backends:

👉 GitHub: https://github.com/flowtyone/QwenASR-he

# transcribe a file (auto-selects CUDA/vLLM or Apple-Silicon/mlx-audio)
uv run python examples/simple.py recording.m4a --language he
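
Because the server is described as OpenAI-compatible, a batch request can probably be built the same way as for the OpenAI transcription endpoint. The sketch below assembles such a request with the standard library only. The base URL and port, the model name passed in the form, and whether the server honors the `language` field are all assumptions; check the companion repo for the actual values.

```python
import urllib.request
import uuid


def build_transcription_request(
    audio_bytes: bytes,
    filename: str,
    base_url: str = "http://localhost:8000",   # assumed host/port
    model: str = "flowty1/qwen-asr-0.6b-he",   # assumed model name on the server
    language: str = "he",                      # assumed to be accepted by the server
) -> urllib.request.Request:
    """Build a multipart/form-data POST for an OpenAI-style transcription endpoint."""
    boundary = uuid.uuid4().hex
    parts = []

    # Plain text form fields.
    for name, value in (("model", model), ("language", language)):
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )

    # The audio file itself.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + audio_bytes + b"\r\n")

    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` while the server is running; OpenAI-style servers normally answer with a JSON object that has a `text` field.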

Apple Silicon (mlx-audio)

from mlx_audio.stt import load

# audio_16k_mono_float32: 1-D float32 waveform, mono, sampled at 16 kHz
model = load("flowty1/qwen-asr-0.6b-he")
out = model.generate(audio_16k_mono_float32, language="Hebrew")
print(out.text)
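
The snippet above expects a 16 kHz mono float32 waveform, as the variable name suggests. A minimal sketch of getting there with NumPy follows. The helper `prepare_audio` is hypothetical and not part of mlx-audio, and the linear interpolation is a crude stand-in for a proper resampler because it applies no anti-aliasing filter.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate implied by the variable name in the snippet above


def prepare_audio(samples, sample_rate: int) -> np.ndarray:
    """Convert PCM samples to a 1-D float32 mono waveform at 16 kHz.

    `samples` is either 1-D (mono) or 2-D with shape (frames, channels).
    """
    x = np.asarray(samples)

    # Scale 16-bit PCM to [-1, 1); other dtypes are assumed to be in range already.
    if x.dtype == np.int16:
        x = x.astype(np.float32) / 32768.0
    else:
        x = x.astype(np.float32)

    # Downmix to mono by averaging channels.
    if x.ndim == 2:
        x = x.mean(axis=1)

    # Naive linear-interpolation resampling (no anti-aliasing).
    if sample_rate != TARGET_SR:
        n_out = int(round(len(x) * TARGET_SR / sample_rate))
        src_pos = np.arange(n_out) * (sample_rate / TARGET_SR)
        x = np.interp(src_pos, np.arange(len(x)), x)

    return x.astype(np.float32)
```

For production use, a dedicated resampler (for example from an audio library) would be a better choice than `np.interp`.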

NVIDIA GPU (vLLM) / transformers

This is a Qwen3ASRForConditionalGeneration model and uses the Qwen3-ASR runtime. See the companion repo above or the upstream Qwen3-ASR project for the vLLM/transformers inference toolkit.

Output stability & tuning

The model is accurate on real-world Hebrew but, like most compact ASR models, it can occasionally repeat itself or hallucinate on noisy or long audio. Decoding defaults to greedy (temperature = 0), which is the most reliable baseline. If you see looping, the most effective levers are a repetition penalty (roughly 1.1–1.3) and a cap on max new tokens. The companion repo exposes both as environment variables and applies an additional deterministic repetition-cleanup pass.
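
To illustrate what a deterministic repetition-cleanup pass can look like, here is a minimal sketch that collapses immediately repeated word n-grams in a transcript. It is not the companion repo's implementation; the function name `collapse_repeats` and its parameters are hypothetical.

```python
def collapse_repeats(text: str, max_ngram: int = 6, keep: int = 1) -> str:
    """Collapse back-to-back repetitions of word n-grams.

    Any n-gram (n <= max_ngram) that occurs more than `keep` times in a row
    is reduced to `keep` occurrences. Longer n-grams are tried first.
    """
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        collapsed = False
        longest = min(max_ngram, (len(tokens) - i) // 2)
        for n in range(longest, 0, -1):
            gram = tokens[i:i + n]
            # Count how many times `gram` repeats consecutively from position i.
            reps = 1
            while tokens[i + reps * n:i + (reps + 1) * n] == gram:
                reps += 1
            if reps > keep:
                out.extend(gram * keep)
                i += reps * n
                collapsed = True
                break
        if not collapsed:
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

With the defaults, `collapse_repeats("a b a b a b c")` returns `"a b c"`. Note that `keep=1` also collapses legitimate doubled words, so `keep=2` may be a safer setting for natural speech.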

Languages

Validated for Hebrew and English (plus auto-detect). Other languages from the base model are not guaranteed on this checkpoint.

License & attribution

Fine-tune of Qwen/Qwen3-ASR-0.6B; released under Apache-2.0, following the base model's terms. Please also refer to the base model card for its conditions.
