rosettia-chanka-4b-alpha160
A 4B Spanish → Quechua Chanka (quy_Latn) translation model. Full-weight merge of the team's Chanka-specialized 4B base (Qwen3.5-4B → broad-Quechua LoRA SFT → merge → full FT on clean Chanka) plus the v13 compact-mixed LoRA loaded at lora_alpha=160 (the inference-time α-scaling tweak that won our study). Built for #HACKATHONSomosNLP 2026 as part of the Rosettia project.
This is a single self-contained model — no PEFT required at inference. Load with AutoModelForCausalLM.
Headline result
| Metric (158-row clean Chanka held-out) | This model | 4B baseline (no compact-mixed) | Δ |
|---|---|---|---|
| chrF++ | 56.94 | 43.49 | +13.45 |
| BLEU | 30.76 | 16.14 | +14.62 |
| token F1 | 46.43 | 28.94 | +17.49 |
| TER (↓) | 62.21 | 82.49 | −20.28 |
Beats every off-the-shelf zero-shot MT system we tested on Chanka (NLLB-200 600M/1.3B/3.3B, TranslateGemma 4B, Gemma 4 E4B, Hy-MT2 7B, T5Gemma) by 28 to 50 chrF++ points.
How to use
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
MODEL_ID = "Thermostatic/rosettia-chanka-4b-alpha160"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")
source = "Yo vivo en Quinua"
messages = [
    {"role": "system", "content": "Eres un traductor profesional español-quechua chanka."},
    {"role": "user", "content": (
        "Traduce del español al quechua chanka. Usa una traducción directa, "
        "natural y fiel. Conserva nombres, números y entidades; evita copiar "
        "el español salvo cuando sea necesario.\n\n"
        f"Español: {source}"
    )},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=96, do_sample=False)
print(tok.decode(out[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
# Expected: "Quinuapim tiyani"
For best held-out numbers, also append the project's terminology glossary to the user prompt as a short bullet list, with entries matched by source word and top-k=1 (see the Rosettia dataset card for the parquet file). The eval recipe used --terminology-file clean_chanka/manual_quechua_chanka_glossary_simple_terms.parquet --terminology-top-k 1 --max-completion-length 96.
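The glossary step above can be sketched roughly as follows. This is a minimal illustration, not the project's eval code: the function names, the "Glosario:" header, the bullet layout, and the reading of top-k as a cap on the number of entries per prompt are all assumptions, and the glossary is passed as a plain dict rather than read from the parquet file, whose column names are not given here. The single entry reuses the calle/ñan pair mentioned in the negative results.

```python
import re

def match_glossary(source, glossary, top_k=1):
    """Return up to top_k (spanish, chanka) entries whose Spanish term is a word in source."""
    words = set(re.findall(r"\w+", source.lower()))
    hits = [(es, quy) for es, quy in glossary.items() if es.lower() in words]
    return hits[:top_k]

def build_user_prompt(source, glossary, top_k=1):
    """Build the user prompt and append matched glossary entries as a bullet list."""
    instruction = (
        "Traduce del español al quechua chanka. Usa una traducción directa, "
        "natural y fiel. Conserva nombres, números y entidades; evita copiar "
        "el español salvo cuando sea necesario.\n\n"
        f"Español: {source}"
    )
    hits = match_glossary(source, glossary, top_k)
    if not hits:
        return instruction
    # Hypothetical layout: the real header and bullet format may differ.
    bullets = "\n".join(f"- {es}: {quy}" for es, quy in hits)
    return f"{instruction}\n\nGlosario:\n{bullets}"

if __name__ == "__main__":
    glossary = {"calle": "ñan"}  # illustrative single entry
    print(build_user_prompt("La calle es larga", glossary))
```

The returned string would replace the user message content in the snippet above; sentences with no glossary match fall back to the plain prompt.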
Training recipe (3-stage SFT + free α-scaling)
The LoRA component is the product of a 3-stage continuation chain on top of the team's Chanka-specialized base; the α-scaling tweak is applied at merge time (no separate inference step):
| Stage | Data | Steps | LR | Resulting chrF++ |
|---|---|---|---|---|
| v11 | self_verifiable_compact_mixed_sft.jsonl (1,055 Chanka pairs × {direct, compact-thinking}) | 512 | 5e-6 | 54.06 |
| v12 | same | 128 | 1e-6 (continuation) | 55.47 |
| v13 | same | 32 | 5e-7 (continuation) | 55.76 |
| Inference α-scaling (now baked in) | — | — | — | 56.94 |
LoRA config: r=64, α=128 (trained) → α=160 (merged in this model), dropout 0, target modules [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]. Trained with Unsloth on an L40S. The 4B Chanka-specialized base was: raw unsloth/Qwen3.5-4B → broad Quechua LoRA SFT (~768 steps on 169k AmericasNLP + SomosNLP pairs) → merge LoRA into full model → 48 steps full FT on clean Chanka.
The "compact-mixed" dataset construction: for each reviewed Chanka pair, we created two SFT rows — one with prompt_mode=direct (target = the Chanka translation only) and one with prompt_mode=compact (target = Analisis: ... Final: ... Puntaje: \boxed{...} — a DeepSeekMath-V2-inspired self-verification format). Multi-task training on this mix contributes ~+2 chrF++ over pure direct training at matched step count.
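As a rough sketch of the two-rows-per-pair construction described above: the function name, the dict keys other than prompt_mode, and the single-space separators between the Analisis / Final / Puntaje fields are assumptions, and the analysis text and score are left as caller-supplied placeholders because their actual content is not specified here.

```python
def make_compact_mixed_rows(source, target, analysis, score):
    """Return one direct row and one compact self-verification row for a reviewed pair."""
    direct = {"prompt_mode": "direct", "source": source, "target": target}
    # Compact target follows the "Analisis: ... Final: ... Puntaje: \boxed{...}" shape.
    compact_target = f"Analisis: {analysis} Final: {target} Puntaje: \\boxed{{{score}}}"
    compact = {"prompt_mode": "compact", "source": source, "target": compact_target}
    return [direct, compact]

if __name__ == "__main__":
    rows = make_compact_mixed_rows(
        "Yo vivo en Quinua", "Quinuapim tiyani",
        analysis="<analysis text>", score="<score>",  # placeholders, not real data
    )
    for row in rows:
        print(row["prompt_mode"], "->", row["target"])
```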
The α-scaling discovery: loading the LoRA at inference with lora_alpha set to 1.25× the trained value (160 instead of 128) recovers gains that gradient descent left on the table. We swept α ∈ {64, 96, 128, 160, 192, 224, 256} on the held-out set and found a clean unimodal peak at α=160. Above 1.5× the trained value (α > 192) the model degrades fast.
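To make the arithmetic concrete, here is a small dependency-free sketch of what raising lora_alpha does at merge time. It assumes the standard LoRA scaling factor α / r (not the rsLoRA variant, which divides by √r) and uses toy 2×2 matrices; the function names are illustrative and this is not the project's merge script.

```python
def lora_scale(alpha, r):
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / r

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, scale):
    """Return W + scale * (B @ A), i.e. the merged weight matrix."""
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

if __name__ == "__main__":
    trained = lora_scale(128, 64)  # scale used during training
    merged = lora_scale(160, 64)   # scale baked into this model
    print(trained, merged, merged / trained)  # 2.0 2.5 1.25

    # Toy rank-1 example: W is 2x2, B is 2x1, A is 1x2.
    w = [[1.0, 0.0], [0.0, 1.0]]
    lora_b = [[1.0], [2.0]]
    lora_a = [[3.0, 4.0]]
    print(merge_lora(w, lora_a, lora_b, merged))  # [[8.5, 10.0], [15.0, 21.0]]
```

Because the update enters linearly, merging at α=160 is the same as adding 1.25× the trained low-rank delta to the base weights, which is why no separate inference step is needed.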
What didn't help (the negative results)
A study-quality summary of what we tried that did not outperform this model:
- GSPO with the team's learned verifier as RL reward — regresses chrF++ ~1.5 from any strong SFT base.
- Self-scored Best-of-N — self-scores saturate at 1.0 with 100% false-confidence rate; equivalent to random.
- Linear listwise text reranker on K=16/32 sampled candidates — captures 0% of the 12-chrF oracle gap.
- Mergekit-style task-vector amplification — peak chrF++ 55.89, slightly below LoRA α-scaling, and loses 3 chrF in the merge round-trip.
- Activation-diff-guided targeted LoRA (top-8 layers of 32) — loses ~9 chrF vs full-layer training.
- Engineered reasoning traces from DS Flash (verbose / compact / natural CoT, 668-793 accepted of 838): all regressed or matched without lift. DS Flash confabulated morphology (e.g. claiming kallpa = calle; real Chanka is ñan).
- Grounded reasoning traces from parallel Claude agents using the QHESWA Cuzco-Collao grammar manual + AMLQ Cuzco dictionary as RAG context (841 high-quality traces with 4.5-7.3 morpheme citations each): even rigorously grounded supervision did not outperform plain direct SFT. The model internalizes Chanka morphology implicitly from (source, gold) pairs faster than from explicit symbolic reasoning.
- Native Qwen3.5 <think> block training: both with enable_thinking=True (double-wrap bug, chrF++ < 3) and with literal <think> text + enable_thinking=False (chrF++ 44.55, barely beats baseline).
- Scaling to raw Qwen3.5-9B + compact-mixed (skipping the Chanka pretraining stage): plateaus at chrF++ 32.08, well below the 4B chain.
Data and leakage protections
Training data is the Thermostatic/rosettia-chanka-data clean Chanka subset (1,055 reviewed judicial-domain Spanish-Chanka pairs from the public manual Quechua Chanka Administración Justicia 2014). After a deterministic eval split with validation_fraction=0.15, seed=3407, the 158 eval-set Chanka sources are excluded from training. We also filter out 56 train rows whose Spanish source happens to also appear in eval (from slash-alternative splits in the dataset), giving 841 strictly leak-free training pairs (1,055 − 158 − 56 = 841). All metrics reported are on the 158-row held-out set.
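The split-then-filter procedure can be sketched as below. This is only an illustration under stated assumptions: the real pipeline's shuffling method, its rounding of the eval size, and its source normalization are not documented here, so the use of random.Random, int() truncation, and strip/lowercase matching are all hypothetical choices, as is the function name.

```python
import random

def split_leak_free(pairs, validation_fraction=0.15, seed=3407):
    """Deterministically split (source, target) pairs, then drop train rows
    whose Spanish source also appears in the eval split."""
    rows = list(pairs)
    random.Random(seed).shuffle(rows)  # seeded shuffle, same order every run
    n_eval = int(len(rows) * validation_fraction)
    eval_rows, train_rows = rows[:n_eval], rows[n_eval:]
    # Assumed normalization: trim whitespace and lowercase before comparing.
    eval_sources = {src.strip().lower() for src, _ in eval_rows}
    train_clean = [
        (src, tgt) for src, tgt in train_rows
        if src.strip().lower() not in eval_sources
    ]
    return train_clean, eval_rows

if __name__ == "__main__":
    # Synthetic stand-in data with the same row count as the clean Chanka subset.
    pairs = [(f"es {i}", f"quy {i}") for i in range(1055)]
    train, held_out = split_leak_free(pairs)
    print(len(held_out), len(train))  # 158 897 (no duplicate sources in this toy data)
```

With the real data, the second filter is what removes the additional 56 rows that share a Spanish source with an eval row.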
Limitations and intended use
- Domain: training data is judicial / administrative Spanish-Chanka. Out-of-domain performance is not characterized.
- Variant: Chanka (Ayacucho/Apurímac/Huancavelica, quy_Latn). Not appropriate for Cuzco-Collao (quz), Bolivian (quh), Northern (qup), or other Quechua varieties without further adaptation.
- Capacity ceiling: this model is at the empirical limit of what we extracted from 1,055 reviewed Chanka pairs. More data is the most likely path to better numbers.
Citation / attribution
Built for the #HACKATHONSomosNLP 2026 project Rosettia – Quechua by the Thermostatic team. Trained on top of unsloth/Qwen3.5-4B with Unsloth and PEFT. Compact-mixed training data construction inspired by DeepSeekMath-V2 (we adapted its self-verification format as an auxiliary SFT objective, not as an RL reward — see the full negative-results study for why the RL variant failed for this task).
The source dataset PDF (the 2014 judicial manual) is in the public domain and reproduction is permitted with citation.

