🧠 Model Card: reuben256/nllb-distilled-600-lug

🌍 Overview

reuben256/nllb-distilled-600-lug is a fine-tuned version of Meta AI’s NLLB-200 distilled 600M model for English → Luganda machine translation. It was developed by tekjuice AI 🧪 to support translation in low-resource African languages, specifically Luganda 🇺🇬, a widely spoken Bantu language in Uganda.


🚀 Use Cases

This model is designed for:

  • 📚 Translating educational and public health materials
  • 📰 Localizing government or NGO communications
  • 🔬 Supporting linguistic and NLP research
  • 🧩 Enabling cross-lingual tasks via translation (e.g., summarization, QA); see the sketch after this list
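
For the last item, a minimal sketch of one such pipeline (assuming the transformers pipeline API; the summarizer choice and the example text are illustrative, not part of this model card): run an English-language task first, then translate the result into Luganda.

from transformers import pipeline

# Illustrative summarizer choice; any English summarization model would work here.
summarize = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
translate = pipeline(
    "translation",
    model="reuben256/nllb-distilled-600-lug",
    src_lang="eng_Latn",
    tgt_lang="lug_Latn",
)

document = "Long English text about public health goes here..."  # placeholder
summary = summarize(document, max_length=60, min_length=10)[0]["summary_text"]
print(translate(summary)[0]["translation_text"])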

📦 Training Data

Fine-tuned using the dataset reuben256/tekjuice-eng-lug-target, which includes:

  • 📖 Public domain and open-source parallel corpora
  • 🌍 Crowdsourced and community-translated sentences
  • 🗞️ Aligned media and educational content

📊 Evaluation

The model was evaluated using the BLEU metric 📘 to assess n-gram precision. Testing was done on a held-out set with similar domain characteristics to the training data.
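
A minimal sketch of how such a corpus-level BLEU score can be computed (assuming the sacrebleu package; the hypothesis and reference strings below are placeholders, not the actual test set):

import sacrebleu

# Placeholder data: one model output and one reference translation per sentence.
hypotheses = ["model output for sentence one"]
references = [["reference translation for sentence one"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)  # corpus-level n-gram precision
print(f"BLEU = {bleu.score:.2f}")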

โš ๏ธ Note: Human evaluation is recommended for assessing fluency, nuance, and cultural accuracy.


๐Ÿ—๏ธ Base Model

Built on top of:

  • 🧬 facebook/nllb-200-distilled-600M, a distilled multilingual model optimized for speed and low-resource language performance.

โš ๏ธ Limitations

  • โŒ May struggle with slang, idioms, and culturally specific phrases
  • ๐Ÿ“‰ Biases in training data may be reflected in outputs
  • ๐Ÿ’ก Performance may degrade on out-of-domain or highly technical content

🔮 Future Plans

Planned improvements:

  • 📈 Larger and more diverse datasets
  • 🔁 Reverse direction (Luganda → English)
  • 🏥 Domain-specific fine-tuning (e.g., health, legal)
  • 🧠 Quality estimation and confidence scoring

🚀 How to Use

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "reuben256/nllb-distilled-600-lug"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

# NLLB uses FLORES-200 codes; set the source language before tokenizing.
tokenizer.src_lang = "eng_Latn"

text = "Farmers should plant more trees."
inputs = tokenizer(text, return_tensors="pt")

# The target language is selected by forcing the decoder's first token
# to the Luganda language code.
translated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("lug_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True))
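
The model and tokenizer loaded above can also translate several sentences in one batch; a minimal sketch (the sentence content is illustrative):

sentences = [
    "Education is important for every child.",
    "Wash your hands before eating.",
]
batch = tokenizer(sentences, return_tensors="pt", padding=True)
translated = model.generate(
    **batch,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("lug_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))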