Qwen3-1.7B-EasyLanguage (4-bit, MLX)

LoRA fine-tune of Qwen/Qwen3-1.7B that rewrites live speech transcripts into easy-to-read registers for the Live Linguist on-device captioner. It covers 20 languages (21 locales in the evaluation table, which scores Brazilian and European Portuguese separately). Examples include German (Leichte Sprache), French (FALC), English (Easy/Plain English), Spanish (Lectura Fácil), Arabic (Easy-to-Read, following Inclusion Europe's Information for All) and Danish (Letlæst, Inclusion Europe ETR). Each locale follows its own national or European Easy-to-Read or plain-language standard. The weights are quantized to 4-bit for Apple-silicon inference via MLX.

The model splits run-on sentences into short ones, drops disfluencies such as fillers and false starts, keeps names and numbers intact, and answers in the input language. It was trained on a lean prompt with no few-shot examples, so the register is internalized: prompts stay short, which keeps live-caption latency low.
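One way to spot-check the "keeps numbers" behavior on your own transcripts is a digit-preservation test. This is a hypothetical sketch, not the author's validator (`numbers_in` and `keeps_numbers` are illustrative names). It only sees numbers written with digits, so spelled-out numbers such as "three" are ignored, and it compares numbers as literal strings, so locale-specific separators (2.000 vs 2,000) count as a mismatch.

```python
import re
from collections import Counter

# Digit runs with optional internal separators, e.g. "3", "7", "1.5", "2,000".
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def numbers_in(text: str) -> Counter:
    """Counts the numbers written with digits in `text`."""
    return Counter(NUMBER_RE.findall(text))


def keeps_numbers(original: str, rewritten: str) -> bool:
    """True if every digit-written number in the original also appears in the rewrite."""
    # Counter subtraction keeps only positive counts, i.e. numbers that went missing.
    missing = numbers_in(original) - numbers_in(rewritten)
    return not missing


if __name__ == "__main__":
    src = "So, uh, the train, it leaves at 7:45 from, um, platform 3."
    out = "The train leaves at 7:45. It leaves from platform 3."
    print(keeps_numbers(src, out))                       # True
    print(keeps_numbers(src, "The train leaves soon."))  # False
```

Comparing multisets rather than sets also flags a rewrite that mentions a repeated number only once.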

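Evaluation (held-out test set)

ft = this fine-tuned model; stock = the unmodified Qwen3-1.7B base model. SARI measures simplification quality (it compares the output with both the source and the reference rewrite); chrF is the character n-gram F-score against the reference; LID is the share of outputs identified as the input language; compliance is the rule-compliance rate. Only SARI is reported for both models.

Locale   SARI (ft)  SARI (stock)  chrF (ft)  LID (ft)  Compliance (ft)
de           50.61         41.92      46.31       1.0            0.995
fr           61.55         54.15      55.31       1.0            0.995
es           62.37          56.9       58.0       1.0             0.97
en           62.84          52.0      57.32       1.0             0.99
ar           52.85         51.05      51.56       1.0             0.97
da           55.97         49.17      54.21     0.995            0.975
et           49.92         45.59      52.72     0.995             0.99
fi           51.72         47.25      58.09       1.0            0.995
hi           54.85         49.22      48.12     0.975            0.845
it           55.64         49.53      52.24       1.0           0.8196
ja            8.35          8.36      46.21       1.0            0.965
ko           51.43         46.48      44.41       1.0            0.975
nl           54.15         51.81      52.22       1.0            0.935
pt-BR        57.89         56.87      54.45       1.0             0.87
pt-PT        60.65          54.3      61.36       1.0             0.98
ru           55.61         51.04      53.23     0.995             0.93
sk           51.32         46.89      48.72     0.995            0.975
sv            56.1         49.53      58.24      0.99             0.97
tr           51.31         46.35      51.85       1.0            0.995
vi           63.22         57.28      62.09       1.0            0.805
zh-CN         8.77           8.8      54.19      0.96            0.905

For ja and zh-CN, SARI is about 8 for both models, so those two rows say little about simplification quality. This is most likely a tokenization effect: word-level SARI degrades on Japanese and Chinese, which are written without spaces between words. The character-based chrF column is the more informative one for these locales.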
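The chrF column scores overlap of character n-grams rather than words, so it does not depend on word segmentation. A simplified sentence-level chrF is sketched below, assuming the common defaults (character n-grams of order 1 to 6, whitespace removed, recall weighted with β = 2). The card does not say which implementation produced its scores, and libraries such as sacrebleu differ in details like corpus-level aggregation, so this sketch will not reproduce the table exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Counts character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        matches = sum((hyp & ref).values())  # clipped matches: min count per n-gram
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over n-gram orders, then combine with F-beta.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(chrf("Der Zug fährt um 7 Uhr.", "Der Zug fährt um 7 Uhr."))  # 100.0
    # Works on unsegmented text because it never splits into words.
    print(round(chrf("今日は晴れです。", "今日は雨です。"), 1))
```

With β = 2, recall counts more than precision, which penalizes rewrites that drop content from the reference more heavily than ones that add a little.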

Usage (MLX)

```python
from mlx_lm import load, generate

model, tok = load("ndgold/Qwen3-1.7B-EasyLanguage-4bit")
msgs = [
    {"role": "system", "content": "<framework system prompt>"},  # placeholder: prompt for the target easy-language framework
    {"role": "user", "content": "Original: <utterance>\nRewritten:"},  # raw transcript segment
]
# enable_thinking=False switches off Qwen3's <think> reasoning block
p = tok.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False, enable_thinking=False)
print(generate(model, tok, prompt=p, max_tokens=96))
```
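When captioning a stream, each transcript segment can be sent as its own two-message conversation in the prompt format shown in the usage snippet, with no few-shot turns. The helper below is a hypothetical sketch: `build_messages`, `clean_segment` and `SYSTEM_PROMPT` are illustrative names, not part of this model or of mlx_lm, and the system prompt stays a placeholder because the card does not publish it. Collapsing newlines inside the utterance is an assumption, made because the prompt itself uses a newline to separate the utterance from `Rewritten:`.

```python
import re

SYSTEM_PROMPT = "<framework system prompt>"  # placeholder; the card does not publish the prompt


def clean_segment(utterance: str) -> str:
    """Collapses runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", utterance).strip()


def build_messages(utterance: str, system_prompt: str = SYSTEM_PROMPT) -> list[dict]:
    """Builds the lean prompt: one system turn and one user turn, no few-shot examples."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Original: {clean_segment(utterance)}\nRewritten:"},
    ]


if __name__ == "__main__":
    msgs = build_messages("so um the\nmeeting is   moved to ten")
    print(msgs[1]["content"])
```

The returned list is what `tok.apply_chat_template` expects as its first argument in the usage snippet.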

Sources & licenses

  • Base model: Qwen/Qwen3-1.7B (Apache-2.0).
  • German seed data: tum-nlp/German4All-Corpus (German Wikipedia, CC BY-SA).
  • Synthetic pairs (all other languages, plus augmentation data for de): spoken-to-easy-language pairs generated by Claude (Anthropic) for Leichte Sprache (de), FALC (fr), Easy/Plain English (en), Lectura Fácil (es), and 25 further locales, each following its language's Easy-to-Read or plain-language standard (Inclusion Europe "Information for All", Selkokieli, Lättläst, やさしい日本語, ISO 24495-1, …). Every pair is filtered by a deterministic validator suite (per-language sentence-length caps, language ID, fidelity anchoring, number preservation, anti-parroting).
  • Intended for the Live Linguist on-device live-caption simplifier. Not a general chatbot.
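The validator suite itself is not published. As a hypothetical illustration of two of the listed checks, a sentence-length cap and an anti-parroting filter could look like the sketch below. The word cap of 12, the similarity threshold of 0.9, the naive sentence splitter and all function names are assumptions; whitespace word counts also would not work for Japanese or Chinese, which would need a character-based cap.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text: str) -> list[str]:
    """Naive splitter: breaks after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def within_length_cap(text: str, max_words: int) -> bool:
    """True if every sentence has at most `max_words` whitespace-separated words."""
    return all(len(s.split()) <= max_words for s in split_sentences(text))


def is_parroting(original: str, rewritten: str, threshold: float = 0.9) -> bool:
    """True if the rewrite is a near-copy of the input (character-level similarity)."""
    ratio = SequenceMatcher(None, original.lower(), rewritten.lower()).ratio()
    return ratio >= threshold


def passes(original: str, rewritten: str, max_words: int = 12) -> bool:
    """Hypothetical combined check: short sentences and not a copy of the input."""
    return within_length_cap(rewritten, max_words) and not is_parroting(original, rewritten)


if __name__ == "__main__":
    src = ("So the meeting, which was supposed to start at nine, got moved "
           "because the room was double-booked, so now it's at ten.")
    good = "The meeting is now at ten. The room was double-booked."
    print(passes(src, good))  # True
    print(passes(src, src))   # False
```

A per-language suite would swap in a different `max_words` (or a character cap) per locale, which is what the card's "per-language sentence-length caps" suggests.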

Model tree for ndgold/Qwen3-1.7B-EasyLanguage-4bit

Finetuned
Qwen/Qwen3-1.7B
Quantized
(295)
this model