chamgei-kal2sw-nllb600m: r001 (epoch 32)
LoRA adapter that fine-tunes facebook/nllb-200-distilled-600M for Kalenjin → Swahili translation. It is the companion to the forward-direction adapter used in chamgei.com's translate demo: it unlocks the reverse direction for the bidirectional translate UX and enables Kalenjin → English transitively via Helsinki SW → EN.
Trained by Tony Kipkemboi at chamgei.labs, a research effort focused on machine translation for underserved Kenyan languages.
Highlights
- chrF++ 68.55 / BLEU 57.58 on the inverted 250-row thinkKenya `kln_swa/test` holdout (seed=42)
- +9.76 chrF++ over the equivalent SW → KAL recipe (Phase 1d replicated mutaician's 58.79 on the same data)
- Direction-asymmetry win: generating high-resource Swahili from low-resource Kalenjin is easier than the reverse, even with identical paired data, and this run quantifies the gap
Tag scheme
This repository hosts the chamgei-kal2sw-nllb600m family. Specific runs are pinned via revision tags:
| Tag | What | chrF++ |
|---|---|---|
| r001-ep32 ★ | Epoch-32 checkpoint of run r001 (peak quality) | 68.55 |
The main branch always points at the latest recommended adapter.
Use
from transformers import AutoModelForSeq2SeqLM, NllbTokenizerFast
from peft import PeftModel
import torch
base = "facebook/nllb-200-distilled-600M"
adapter = "Tonykip/chamgei-kal2sw-nllb600m"  # main = latest; pass revision="r001-ep32" to PeftModel.from_pretrained to pin the run
tokenizer = NllbTokenizerFast.from_pretrained(base)
tokenizer.src_lang = "luo_Latn"  # Kalenjin reuses NLLB's Luo language tag (the trained hijack)
model = AutoModelForSeq2SeqLM.from_pretrained(base, torch_dtype=torch.float32)
model = PeftModel.from_pretrained(model, adapter)
model.eval()
text = "kere inee oleloo" # "anatazama mbali" (he is looking far)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("swh_Latn"),
    max_length=256,
    num_beams=5,
    length_penalty=1.1,
    early_stopping=True,
)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
# → "Yeye ni kuangalia mbali."
Training recipe
| Setting | Value |
|---|---|
| Base model | facebook/nllb-200-distilled-600M |
| Adapter | LoRA r=64, alpha=128, dropout=0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, out_proj, fc1, fc2 (attention + FFN) |
| Trainable params | 34,603,008 (5.33% of base) |
| Training direction | KAL → SW (luo_Latn → swh_Latn) |
| Warm-start | Phase 1d adapter (SW → KAL on same data, chrF++ 58.79) |
| Training data | thinkKenya/kenyan-low-resource-language-data, kln_swa split, ~28k pairs |
| Eval data | Same source, test split, 250-row sample with random_state=42 |
| Learning rate / scheduler | 1e-4 / cosine, warmup_ratio=0.05 |
| Per-device batch size / grad accum | 8 / 1 |
| Max seq length | 256 |
| Epochs trained | 42 (peak at epoch 32) |
| Optimizer | AdamW (default), fp16 |
| Seed | 42 |
| Compute | A10G, 6h 9min wall-clock, ~$6.80 |
| Inference | num_beams=5, length_penalty=1.1 |
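The trainable-parameter figure in the table can be reproduced by hand. The sketch below assumes the base model's architecture dimensions (hidden size 1024, feed-forward size 4096, 12 encoder and 12 decoder layers); these come from the published NLLB-200 distilled 600M configuration rather than from this card, so treat them as assumptions. A LoRA pair on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` parameters.

```python
# Assumed architecture of facebook/nllb-200-distilled-600M (not stated in this card).
D_MODEL = 1024
FFN_DIM = 4096
ENCODER_LAYERS = 12
DECODER_LAYERS = 12
RANK = 64  # LoRA r from the recipe table


def lora_params(d_in: int, d_out: int, r: int = RANK) -> int:
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# q_proj, k_proj, v_proj, out_proj are all D_MODEL x D_MODEL.
attention_block = 4 * lora_params(D_MODEL, D_MODEL)
# fc1 expands to FFN_DIM, fc2 projects back to D_MODEL.
ffn_block = lora_params(D_MODEL, FFN_DIM) + lora_params(FFN_DIM, D_MODEL)

encoder_layer = attention_block + ffn_block      # self-attention + FFN
decoder_layer = 2 * attention_block + ffn_block  # self- and cross-attention + FFN

trainable = ENCODER_LAYERS * encoder_layer + DECODER_LAYERS * decoder_layer
print(f"{trainable:,}")  # 34,603,008
```

Under these assumptions the total matches the 34,603,008 reported in the table, which is consistent with every attention and feed-forward projection in both stacks carrying an adapter.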
Recipe inherits from Phase 1d's mutaician-equivalent setup. The novel pieces here are: direction inversion (KAL → SW instead of SW → KAL), warm-starting from Phase 1d's adapter (reuses the trained Kalenjin BPE embeddings), and tuning the inference length penalty.
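To see why the length penalty is worth tuning, recall how Hugging Face beam search ranks finished hypotheses: the summed token log-probabilities are divided by `length ** length_penalty`. The sketch below re-implements only that scoring rule in plain Python. The two hypotheses and their log-probabilities are made up for illustration and are not drawn from this model.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalised score of a finished beam hypothesis (higher is better)."""
    return sum_logprobs / (length ** length_penalty)


# Hypothetical hypotheses: (summed log-probability, length in tokens).
short_hyp = (-4.0, 5)
long_hyp = (-6.6, 8)

for penalty in (0.5, 1.0, 1.1):
    short_score = beam_score(*short_hyp, penalty)
    long_score = beam_score(*long_hyp, penalty)
    winner = "long" if long_score > short_score else "short"
    print(f"length_penalty={penalty}: short={short_score:.3f} "
          f"long={long_score:.3f} -> {winner}")
```

Because log-probabilities are negative, a larger exponent shrinks the magnitude of a long hypothesis's score more than a short one's, so values above 1.0 favour longer outputs. In this toy case the short hypothesis wins at 0.5 and 1.0, and the long one wins at 1.1.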
Training curve
Evaluations at 25 / 50 / 75 / 100% of training (same 250-row holdout, KAL → SW direction):
| Epoch | Step | chrF++ | BLEU | Δ vs Phase 1d (58.79) |
|---|---|---|---|---|
| 11 | 38,643 | 53.65 | 33.90 | -5.14 |
| 21 | 73,773 | 64.69 | 52.11 | +5.90 |
| 32 ★ | 112,416 | 68.55 | 57.58 | +9.76 (peak) |
| 41 | 144,033 | 68.51 | 57.69 | +9.72 |
| 42 | 147,546 | 68.49 | 57.66 | +9.70 |
The published checkpoint (r001-ep32) is the epoch-32 peak. Epochs 32, 41, and 42 are all within 0.06 chrF++ of each other, so the model has plateaued. Future runs in this family should train to ~32 epochs for the same quality at about 24% less compute (32 of 42 epochs).
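The figures in this section can be cross-checked with a few lines of arithmetic. The sketch below copies the rows from the table and derives the steps per epoch, the deltas against Phase 1d, the plateau spread, and the compute saved by stopping at epoch 32. The pairs-per-epoch estimate assumes a single device, which is an assumption based on the single A10G listed in the recipe.

```python
# (epoch, optimizer step, chrF++) rows copied from the training-curve table.
rows = [
    (11, 38_643, 53.65),
    (21, 73_773, 64.69),
    (32, 112_416, 68.55),
    (41, 144_033, 68.51),
    (42, 147_546, 68.49),
]
PHASE_1D_CHRF = 58.79
BATCH_SIZE = 8  # per-device batch size, gradient accumulation 1

# Every row should imply the same number of steps per epoch.
steps_per_epoch = rows[-1][1] // rows[-1][0]
consistent = all(step == epoch * steps_per_epoch for epoch, step, _ in rows)

# Assumes one device, so pairs seen per epoch = steps * batch size.
pairs_per_epoch = steps_per_epoch * BATCH_SIZE

deltas = [round(chrf - PHASE_1D_CHRF, 2) for _, _, chrf in rows]

plateau = [chrf for epoch, _, chrf in rows if epoch >= 32]
plateau_spread = round(max(plateau) - min(plateau), 2)

compute_saved = 1 - 32 / 42

print(steps_per_epoch, consistent, pairs_per_epoch)
print(deltas, plateau_spread, f"{compute_saved:.1%}")
```

The step counts are consistent with 3,513 steps per epoch, or 28,104 pairs at batch size 8, in line with the ~28k training pairs listed in the recipe.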
Limitations
- Stylistic shifts in 30% of outputs: the model occasionally embellishes (adding `Yeye`, `sana`, etc.) or shifts mood (declarative → interrogative). A length-penalty sweep confirmed this is learned behaviour, not a decoding artifact; future runs will explore RL with a chrF++ reward or filtered training data.
- Rare-vocabulary misses: words like `chigoni` (kitchen) and number/multiplier compounds like `taman ... taman` occasionally produce off-meaning translations.
- Dialect coverage: training data is mainstream Kalenjin (Nandi + Kipsigis tagged); other sub-tribes (Tugen, Marakwet, Sabaot, Keiyo, Pokot, Sengwer, Ogiek, Terik) have 0 rows in the corpus.
- Domain coverage: thinkKenya leans toward everyday + religious + procedural text; legal, technical, or scientific Kalenjin is out-of-distribution.
License
CC-BY-NC-4.0, inheriting from the facebook/nllb-200-distilled-600M base model. Non-commercial use only. For commercial inquiries, contact iamtonykipkemboi@gmail.com.
Acknowledgments
- thinkKenya for the `kenyan-low-resource-language-data` corpus
- mutaician for publishing the NLLB+LoRA Western-Nilotic-hijack recipe that this run inverts and extends
- Meta AI for the NLLB-200 base model
- The Modal team for the training infrastructure
Citation
If you use this adapter in research, please cite:
@misc{chamgei_kal2sw_2026,
author = {Kipkemboi, Tony},
title = {chamgei-kal2sw-nllb600m: NLLB+LoRA for Kalenjin to Swahili Translation},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Tonykip/chamgei-kal2sw-nllb600m}}
}