qwen-asr-0.6b-he: Real-Time Hebrew Speech Recognition
A compact (0.6B) streaming speech-to-text model fine-tuned for Hebrew, based on Qwen/Qwen3-ASR-0.6B.
It's small and fast enough to transcribe Hebrew in real time on a CPU, runs natively on Apple Silicon via MLX, and is compact enough to run fully on-device on an iPhone. It also handles English.
Highlights
- Hebrew-first: fine-tuned on a purpose-built, carefully curated Hebrew speech corpus collected, cleaned, and aligned specifically for this task.
- Real-time, low-latency streaming: transcribes as you speak.
- Runs anywhere: CPU, NVIDIA GPU (vLLM), Apple Silicon (MLX), on-device.
- Hebrew + English (plus auto-detect).
Usage
The easiest path is the companion repository, which ships a one-line file transcriber and an OpenAI-compatible server (batch + realtime streaming) for both backends, CUDA/vLLM and Apple Silicon/mlx-audio:
GitHub: https://github.com/flowtyone/QwenASR-he
# transcribe a file (auto-selects CUDA/vLLM or Apple-Silicon/mlx-audio)
uv run python examples/simple.py recording.m4a --language he
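For the OpenAI-compatible server mentioned above, a batch request can be sketched with the standard library alone. This is a minimal sketch, not the companion repo's client: it assumes the server follows the usual OpenAI convention of a multipart POST to `/v1/audio/transcriptions` with `file`, `model`, and optional `language` fields. The helper name `build_transcription_request`, the base URL, and the port are hypothetical; check the repository for the actual address and accepted model name.

```python
import io
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes, filename, model, language=None):
    """Build (but do not send) a multipart POST for an OpenAI-style transcription endpoint.

    The endpoint path and field names follow the OpenAI audio API convention;
    whether the companion server accepts exactly these is an assumption.
    """
    boundary = uuid.uuid4().hex
    fields = {"model": model}
    if language is not None:
        fields["language"] = language

    body = io.BytesIO()
    # Plain text form fields come first.
    for name, value in fields.items():
        body.write(f"--{boundary}\r\n".encode())
        body.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        body.write(f"{value}\r\n".encode())
    # The audio file is sent as an opaque binary part.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, read the audio file into bytes, build the request against the server's base URL (for example a hypothetical `http://localhost:8000`), and pass it to `urllib.request.urlopen`. OpenAI-style servers normally answer with JSON containing a `text` field.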
Apple Silicon (mlx-audio)
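from mlx_audio.stt import load

# audio_16k_mono_float32 is supplied by the caller: 16 kHz, mono, float32 samples (as the name indicates)
model = load("flowty1/qwen-asr-0.6b-he")
out = model.generate(audio_16k_mono_float32, language="Hebrew")
print(out.text)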
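The snippet above expects audio that is already 16 kHz, mono, and float32. As a minimal sketch of that preparation step, assuming the samples are already decoded into a NumPy array, the hypothetical helper below (`to_16k_mono_float32` is not part of mlx-audio) scales integer PCM, averages channels, and resamples by linear interpolation. Linear interpolation has no anti-aliasing filter, so a proper resampler is preferable for production use.

```python
import numpy as np


def to_16k_mono_float32(samples, sample_rate, target_rate=16_000):
    """Convert decoded PCM samples to a 1-D float32 array at `target_rate` Hz.

    `samples` is shaped (frames,) or (frames, channels).
    """
    x = np.asarray(samples)
    # Integer PCM (e.g. int16) is scaled into [-1.0, 1.0).
    if np.issubdtype(x.dtype, np.integer):
        x = x.astype(np.float32) / float(np.iinfo(x.dtype).max + 1)
    else:
        x = x.astype(np.float32)
    # Multi-channel audio is averaged down to mono.
    if x.ndim == 2:
        x = x.mean(axis=1)
    # Crude resampling: linear interpolation onto the target time grid.
    if sample_rate != target_rate:
        n_out = int(round(len(x) * target_rate / sample_rate))
        old_t = np.arange(len(x)) / sample_rate
        new_t = np.arange(n_out) / target_rate
        x = np.interp(new_t, old_t, x)
    return x.astype(np.float32)
```

The result can then be passed as the first argument to `model.generate(...)` in place of `audio_16k_mono_float32`.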
NVIDIA GPU (vLLM) / transformers
This is a Qwen3ASRForConditionalGeneration model and uses the Qwen3-ASR runtime. See the companion repo above or the upstream Qwen3-ASR project for the vLLM/transformers inference toolkit.
Output stability & tuning
The model is accurate on real-world Hebrew but, like most compact ASR models, it can occasionally repeat or hallucinate on noisy or long audio. Decoding defaults to greedy (temperature = 0), which is the most reliable baseline. If you see looping, the most effective levers are a repetition penalty (~`1.1` to `1.3`) and capping max new tokens. The companion repo exposes these as environment variables and applies an additional deterministic repetition-cleanup pass.
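To make the two ideas above concrete, here is a minimal sketch of both. It is illustrative only and is not the companion repo's implementation. `apply_repetition_penalty` follows the common CTRL-style rule (logits of already-generated tokens are divided by the penalty when positive and multiplied when negative), and `collapse_repeats` is one possible deterministic cleanup that collapses immediately repeated words or phrases to a single copy. Both function names and the `max_ngram` parameter are hypothetical.

```python
import numpy as np


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` with already-generated tokens made less likely."""
    out = np.array(logits, dtype=np.float32, copy=True)
    seen = np.unique(np.asarray(generated_ids, dtype=np.int64))
    # Positive logits are divided and negative ones multiplied, so both move down.
    out[seen] = np.where(out[seen] > 0, out[seen] / penalty, out[seen] * penalty)
    return out


def _collapse_once(words, max_ngram):
    """One left-to-right pass, trying the shortest repeated phrase first."""
    out, i = [], 0
    while i < len(words):
        step = 1
        for n in range(1, max_ngram + 1):
            gram = words[i:i + n]
            reps = 1
            while words[i + reps * n:i + (reps + 1) * n] == gram:
                reps += 1
            if reps > 1:
                # Keep one copy of the phrase and skip its repeats.
                out.extend(gram)
                step = reps * n
                break
        else:
            out.append(words[i])
        i += step
    return out


def collapse_repeats(text, max_ngram=8):
    """Collapse back-to-back repeated words/phrases until the text stops changing."""
    words = text.split()
    while True:
        collapsed = _collapse_once(words, max_ngram)
        if collapsed == words:
            return " ".join(collapsed)
        words = collapsed
```

Note that this cleanup also removes legitimate doubled words (for example emphasis by repetition), so a real pipeline may want to apply it only to runs above some length.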
Languages
Validated for Hebrew and English (plus auto-detect). Other languages from the base model are not guaranteed on this checkpoint.
License & attribution
Fine-tune of Qwen/Qwen3-ASR-0.6B; released under Apache-2.0, following the base model's terms. Please also refer to the base model card for its conditions.