qwen-asr-0.6b-he: Real-Time Hebrew Speech Recognition

A compact (0.6B) streaming speech-to-text model fine-tuned for Hebrew, based on Qwen/Qwen3-ASR-0.6B.

It's small and fast enough to transcribe Hebrew in real time on a CPU, runs natively on Apple Silicon via MLX, and is compact enough to run fully on-device on an iPhone. It also handles English.

Highlights

  • Hebrew-first: fine-tuned on a purpose-built, carefully curated Hebrew speech corpus collected, cleaned, and aligned specifically for this task.
  • Real-time, low-latency streaming: transcribes as you speak.
  • Runs anywhere: CPU, NVIDIA GPU (vLLM), Apple Silicon (MLX), on-device.
  • Hebrew + English (plus auto-detect).

Usage

The easiest path is the companion repository, which ships a one-line file transcriber and an OpenAI-compatible server (batch + realtime streaming) for both backends:

GitHub: https://github.com/flowtyone/QwenASR-he

# transcribe a file (auto-selects CUDA/vLLM or Apple-Silicon/mlx-audio)
uv run python examples/simple.py recording.m4a --language he
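
If the companion server is running, a batch request can be sketched with the standard library alone. This is a minimal sketch under assumptions: the base URL `http://localhost:8000` is hypothetical, the `/v1/audio/transcriptions` path and the `file`, `model`, and `language` form fields follow the OpenAI transcription API convention rather than anything confirmed for this server, and `build_transcription_request` is an illustrative helper name. Check the companion repo for the actual host, port, and accepted fields.

```python
import urllib.request
import uuid
from pathlib import Path


def build_transcription_request(base_url, audio_path, model, language="he"):
    """Build a multipart POST in the OpenAI transcription style (assumed endpoint)."""
    boundary = uuid.uuid4().hex
    audio = Path(audio_path)
    parts = []
    # Plain text form fields.
    for name, value in (("model", model), ("language", language)):
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The audio file itself, sent as raw bytes.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{audio.name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + audio.read_bytes()
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (requires a running server; URL and model name are assumptions):
# req = build_transcription_request(
#     "http://localhost:8000", "recording.m4a", "flowty1/qwen-asr-0.6b-he")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the payload before pointing it at a real server.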

Apple Silicon (mlx-audio)

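from mlx_audio.stt import load

model = load("flowty1/qwen-asr-0.6b-he")
# audio_16k_mono_float32: your audio as a 16 kHz, mono, float32 waveform (define it before this call)
out = model.generate(audio_16k_mono_float32, language="Hebrew")
print(out.text)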
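
The variable name `audio_16k_mono_float32` suggests the model expects a 16 kHz, single-channel, float32 waveform. The helper below is one way to get raw samples into that shape with NumPy. It is a sketch under assumptions: `to_16k_mono_float32` is a hypothetical name, the input is assumed to be shaped `(num_samples,)` or `(num_samples, channels)`, and the linear-interpolation resampling has no anti-aliasing filter, so a dedicated resampler is preferable for production audio. Whether `generate` accepts a NumPy array directly is not confirmed here.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate implied by the variable name in the example above


def to_16k_mono_float32(samples, sample_rate):
    """Convert PCM samples to a 16 kHz mono float32 waveform (crude resampling)."""
    x = np.asarray(samples)
    if np.issubdtype(x.dtype, np.integer):
        # Scale integer PCM (e.g. int16) into roughly [-1.0, 1.0].
        scale = float(np.iinfo(x.dtype).max)
        x = x.astype(np.float32) / scale
    else:
        x = x.astype(np.float32)
    if x.ndim == 2:
        # Average channels: (num_samples, channels) -> (num_samples,)
        x = x.mean(axis=1)
    if sample_rate != TARGET_SR:
        n_out = int(round(len(x) * TARGET_SR / sample_rate))
        old_t = np.arange(len(x)) / sample_rate
        new_t = np.arange(n_out) / TARGET_SR
        # Linear interpolation; no low-pass filtering is applied.
        x = np.interp(new_t, old_t, x)
    return np.ascontiguousarray(x, dtype=np.float32)
```

For example, one second of 48 kHz stereo int16 audio comes out as a one-dimensional float32 array of 16,000 samples.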

NVIDIA GPU (vLLM) / transformers

This is a Qwen3ASRForConditionalGeneration model and uses the Qwen3-ASR runtime. See the companion repo above or the upstream Qwen3-ASR project for the vLLM/transformers inference toolkit.

Output stability & tuning

The model is accurate on real-world Hebrew but, like most compact ASR models, it can occasionally repeat or hallucinate on noisy or long audio. Decoding defaults to greedy (temperature = 0), which is the most reliable baseline. If you see looping, the most effective levers are a repetition penalty (~`1.1–1.3`) and capping max new tokens. The companion repo exposes these as environment variables and applies an additional deterministic repetition-cleanup pass.
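
To illustrate what a deterministic repetition-cleanup pass can look like, here is a minimal sketch that collapses word sequences repeated back to back. It is not the companion repo's implementation: `collapse_loops`, `max_ngram`, and `min_repeats` are hypothetical names and defaults chosen for illustration. A run is only collapsed when it repeats at least `min_repeats` times, so natural doubling such as "very very" is left alone.

```python
def collapse_loops(text, max_ngram=6, min_repeats=3):
    """Collapse word n-grams that repeat consecutively min_repeats times or more."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        skip = 0
        chunk = []
        for n in range(1, max_ngram + 1):
            chunk = words[i:i + n]
            if len(chunk) < n:
                break  # not enough words left for an n-gram of this size
            copies = 1
            # Count how many times the chunk repeats immediately after itself.
            while words[i + copies * n:i + (copies + 1) * n] == chunk:
                copies += 1
            if copies >= min_repeats:
                skip = copies * n
                break
        if skip:
            out.extend(chunk)  # keep a single copy of the looping chunk
            i += skip
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)
```

A post-processing pass like this complements, rather than replaces, the repetition penalty and token cap, which act during decoding.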

Languages

Validated for Hebrew and English (plus auto-detect). Other languages from the base model are not guaranteed on this checkpoint.

License & attribution

Fine-tune of Qwen/Qwen3-ASR-0.6B; released under Apache-2.0, following the base model's terms. Please also refer to the base model card for its conditions.

Model details: Safetensors, 0.8B params, tensor type BF16, MLX.