Claro 4B

Claro is a fine-tuned Gemma 3 4B Instruct model that rewrites complex English at CEFR A2 (elementary) level while preserving the source's facts. It was trained on Apple Silicon with MLX (LoRA), using SFT followed by RL (GSPO) against a decomposed, mostly deterministic reward.

Format note: this is an MLX model (converted base: mlx-community/gemma-3-4b-it-bf16). It loads with mlx_lm on Apple Silicon. It is not a transformers/PyTorch checkpoint. The repo also ships the LoRA adapter under adapter/ for applying on top of the base yourself.
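For readers who want to apply the shipped adapter to the base model themselves, the following is a minimal sketch rather than a documented recipe. The helper name load_with_adapter is hypothetical, and the sketch assumes that the adapter/ folder holds files in the layout mlx_lm expects for its adapter_path argument, which the card does not confirm.

```python
from pathlib import Path


def load_with_adapter(
    base_repo: str = "mlx-community/gemma-3-4b-it-bf16",
    adapter_repo: str = "miguelconner4/claro",
    adapter_subdir: str = "adapter",
):
    """Load the base MLX model and apply the LoRA adapter on top of it.

    Hypothetical helper: assumes `adapter_subdir` contains adapter files
    in the layout that mlx_lm's `adapter_path` option expects.
    """
    # Imported lazily so the module can be imported on machines without MLX.
    from huggingface_hub import snapshot_download
    from mlx_lm import load

    # Fetch only the adapter folder from the model repo.
    local_root = snapshot_download(adapter_repo, allow_patterns=[f"{adapter_subdir}/*"])
    adapter_path = Path(local_root) / adapter_subdir

    # Returns (model, tokenizer), same as a plain load() call.
    return load(base_repo, adapter_path=str(adapter_path))
```

Calling model, tok = load_with_adapter() should then be usable in place of the plain load(...) call, provided the assumption about the adapter layout holds.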

Usage (MLX)

The model expects its chat template with the training system prompt — a raw prompt string will make it ramble. Replicate the training invocation:

from mlx_lm import load, generate

model, tok = load("miguelconner4/claro")

SYSTEM = ("Rewrite the user's text in CEFR A2 (Elementary English): short simple "
          "sentences, basic vocabulary, no idioms. Keep all important facts. "
          "Output only the rewritten text.")

complex_text = "The edifice, constructed circa 1750, was subsequently designated a historic landmark."
prompt = tok.apply_chat_template(
    [{"role": "system", "content": SYSTEM},
     {"role": "user", "content": complex_text}],
    tokenize=False, add_generation_prompt=True,
)
print(generate(model, tok, prompt=prompt, max_tokens=512, verbose=False))
# -> "The building was built around 1750. People decided it was important history."

How it was trained

SFT on ~1,500 (complex → A2) paragraph pairs distilled from a frontier model over random Wikipedia paragraphs, filtered by LLM judges.
RL (200 iters, group size 8, GSPO sequence-level importance sampling, KL β=0.1) from the SFT checkpoint, against a cardinal multiplicative reward:

reward = level_band × vocab × fidelity × format_gates (each ∈ [0,1]).
- level_band — deterministic A2 difficulty: readability (Flesch), mean sentence length, passive and subordination density, with bands calibrated to the 10th–90th percentiles of real A2 reference texts.
- vocab — penalty for off-A2-list words, with gloss-aware exemption (defining a hard term in-line is not penalized).
- fidelity — LLM judge, decomposed into fact-level recall + hallucination counts (not a holistic score).
- format_gates — hard pass/fail for markdown / degenerate loops.
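The multiplicative structure above can be illustrated with a short sketch. It is not the training code: band_score, RewardParts, the linear decay outside a band, and the example band limits are all assumptions made for illustration. The card only states that the bands are calibrated to percentiles of A2 reference texts and that each factor lies in [0, 1].

```python
from dataclasses import dataclass


def band_score(value: float, lo: float, hi: float, slack: float) -> float:
    """Score 1.0 inside [lo, hi], falling linearly to 0.0 over `slack` outside.

    The linear fall-off is an assumed shape; the real reward may differ.
    """
    if lo <= value <= hi:
        return 1.0
    distance = (lo - value) if value < lo else (value - hi)
    return max(0.0, 1.0 - distance / slack)


@dataclass
class RewardParts:
    level_band: float    # deterministic A2 difficulty score
    vocab: float         # off-list vocabulary penalty
    fidelity: float      # judge-derived fact recall / hallucination score
    format_gates: float  # 1.0 if the output passes the format checks, else 0.0

    def total(self) -> float:
        parts = (self.level_band, self.vocab, self.fidelity, self.format_gates)
        if not all(0.0 <= p <= 1.0 for p in parts):
            raise ValueError("every reward factor must lie in [0, 1]")
        product = 1.0
        for p in parts:
            product *= p
        return product


# Hypothetical band: mean sentence length between 6 and 11 words.
level = band_score(12.0, lo=6.0, hi=11.0, slack=5.0)
reward = RewardParts(level_band=level, vocab=0.9, fidelity=1.0, format_gates=1.0).total()
```

Because the factors multiply, a failed format gate (0.0) zeroes the whole reward regardless of the other terms, and a weak score on any one axis cannot be offset by a strong score on another.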

Evaluation (30 held-out Wikipedia paragraphs)

On 30 held-out Wikipedia paragraphs, Claro's rewrites land at ~70% CEFR A2, per a DeepSeek mode-of-3 CEFR classifier. Most of the rest are A1, and only ~7% drift up to the harder B1. The outputs are reliably simpler than those of the model it was fine-tuned from, with fewer too-hard outputs. Faithfulness is preserved: source-fact recall stays at ~0.98, and a strict hand-audit (counting only real contradictions and fabricated facts, not paraphrase or omission) found ~3–4 genuine errors across the 30 paragraphs. That count is indistinguishable from the baseline and consistent across three independent judge families (Haiku, GPT-4o, Gemini). So the GSPO step delivered a real gain in simplicity at no measurable cost to accuracy. The few remaining errors are subtle: dropped qualifiers or reversed relations (e.g. "younger"→"older sister").
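The mode-of-3 labelling mentioned above can be sketched as follows. This is a minimal illustration, not the evaluation code: the function names are hypothetical, and returning None when all three runs disagree is an assumption, since the card does not state a tie rule.

```python
from collections import Counter
from typing import Optional, Sequence


def mode_of_three(labels: Sequence[str]) -> Optional[str]:
    """Return the label that at least two of three classifier runs agree on.

    Returns None when all three runs disagree (assumed tie rule).
    """
    if len(labels) != 3:
        raise ValueError("expected exactly three labels")
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None


def level_shares(final_labels: Sequence[str]) -> dict:
    """Fraction of outputs assigned to each CEFR level."""
    counts = Counter(final_labels)
    n = len(final_labels)
    return {level: counts[level] / n for level in counts}
```

With one mode-of-3 label per rewrite, level_shares gives the per-level proportions of the kind reported in this section.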

Limitations

- MLX-only (Apple Silicon). No PyTorch/transformers weights provided.
- Evaluated at n=30; CEFR classification is genuinely noisy at the A2/B1 boundary (judges agree with a strict reference only ~50% of the time there). Treat the numbers as ±10pp.
- ~1 in 10 outputs carries a genuine fidelity slip (≈3–4 per 30 in our audit). The dominant mode is subtle attribute/relation errors (e.g. "younger"↔"older sister", a dropped qualifier), not wholesale fabrication.
- English-only; tuned on encyclopedic prose. Out-of-domain text (dialogue, code, poetry) is untested.
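To get a feel for how much sampling noise a 30-paragraph evaluation carries, a Wilson score interval is one standard check. The sketch below is an illustration only: the count of 21 out of 30 is a hypothetical figure chosen to match the reported ~70%, and the interval covers sampling error alone, not classifier noise.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical count: 21 of 30 outputs classified as A2 (~70%).
low, high = wilson_interval(21, 30)
```

At this sample size the interval spans a few tens of percentage points, which is why small differences between runs should not be over-read.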

License

Derivative of Google's Gemma 3; use is governed by the Gemma Terms of Use and the Gemma Prohibited Use Policy, which carry over to this model.
