Model Card: reuben256/nllb-distilled-600-lug
Overview
reuben256/nllb-distilled-600-lug is a fine-tuned version of Meta AI's NLLB-200 distilled 600M model for English → Luganda machine translation. It was developed by tekjuice AI to support translation in low-resource African languages, specifically Luganda, a widely spoken Bantu language in Uganda.
Use Cases
This model is designed for:
- Translating educational and public health materials
- Localizing government or NGO communications
- Supporting linguistic and NLP research
- Enabling cross-lingual tasks via translation (e.g., summarization, QA)
Training Data
Fine-tuned using the dataset reuben256/tekjuice-eng-lug-target, which includes:
- Public domain and open-source parallel corpora
- Crowdsourced and community-translated sentences
- Aligned media and educational content
Evaluation
The model was evaluated with the BLEU metric to assess n-gram precision. Testing was done on a held-out set with domain characteristics similar to those of the training data.
Note: Human evaluation is recommended for assessing fluency, nuance, and cultural accuracy.
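For reference, corpus-level BLEU can be computed with the sacrebleu library. The strings below are illustrative placeholders, not samples from the actual test set:

```python
import sacrebleu

# Illustrative placeholders; real evaluation uses the held-out test set.
hypotheses = ["model translation of sentence one"]
references = [["human reference translation of sentence one"]]

# corpus_bleu takes a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```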
Base Model
Built on top of:
- facebook/nllb-200-distilled-600M, a distilled multilingual model optimized for speed and low-resource language performance.
Limitations
- May struggle with slang, idioms, and culturally specific phrases
- Biases in the training data may be reflected in outputs
- Performance may degrade on out-of-domain or highly technical content
Future Plans
Planned improvements:
- Larger and more diverse datasets
- Reverse direction (Luganda → English)
- Domain-specific fine-tuning (e.g., health, legal)
- Quality estimation and confidence scoring (see the sketch below)
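On the last item, a common starting point for confidence scoring (an assumption here, not a confirmed plan for this model) is to average the per-token log-probabilities of the generated sequence:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "reuben256/nllb-distilled-600-lug"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

tokenizer.src_lang = "eng_Latn"
inputs = tokenizer("Farmers should plant more trees.", return_tensors="pt")

out = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("lug_Latn"),
    output_scores=True,
    return_dict_in_generate=True,
)

# Per-token log-probabilities of the generated tokens; their mean is a
# crude sequence-level confidence score (closer to 0 means more confident).
token_logprobs = model.compute_transition_scores(
    out.sequences, out.scores, normalize_logits=True
)
print("mean token log-prob:", token_logprobs.mean().item())
```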
How to Use
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "reuben256/nllb-distilled-600-lug"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

# NLLB tokenizers take the source language as a language-code attribute.
tokenizer.src_lang = "eng_Latn"

text = "Farmers should plant more trees."
inputs = tokenizer(text, return_tensors="pt")

# The target language is selected by forcing the decoder to start
# with the Luganda language token.
translated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("lug_Latn"),
)
print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True))
```
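The same pattern extends to batches; the padding setting and max_new_tokens below are illustrative defaults, not tuned values:

```python
# Batch translation: pad inputs to the longest sentence in the batch.
sentences = [
    "Farmers should plant more trees.",
    "Clean water prevents many diseases.",
]
batch = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = model.generate(
    **batch,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("lug_Latn"),
    max_new_tokens=128,
)
for src, tgt in zip(sentences, tokenizer.batch_decode(outputs, skip_special_tokens=True)):
    print(f"{src} -> {tgt}")
```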