TRANSLTR — English → Tiv Translation Model

Fine-tuned for Tiv, a Benue-Congo language spoken by ~4 million people in Benue State, Nigeria.
Built by Victor Achede under Black Sheep Co.

Model Summary

Property	Detail
Base model	`Helsinki-NLP/opus-mt-en-mul`
Task	Machine translation (EN → TIV)
Language pair	English → Tiv (`tiv`)
Architecture	MarianMT (transformer seq2seq)
Training data	Custom curated EN↔TIV parallel corpus (Bible-domain, conversational)
Fine-tuning epochs	10
Batch size	32
Hardware	NVIDIA T4 (Google Colab)
Framework	Hugging Face Transformers 4.x
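
As a rough sanity check on the training budget, the settings above can be turned into an optimizer-step count. This is only a sketch: it assumes the full 28,987-pair corpus reported under Training Data was used for training, with no held-out split and no gradient accumulation, and this card confirms neither assumption.

```python
import math
from typing import Tuple

# Values taken from this model card.
NUM_PAIRS = 28_987   # dataset size reported under Training Data
BATCH_SIZE = 32
EPOCHS = 10


def training_steps(num_examples: int, batch_size: int, epochs: int) -> Tuple[int, int]:
    """Return (steps_per_epoch, total_steps), counting a final partial batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total_steps = training_steps(NUM_PAIRS, BATCH_SIZE, EPOCHS)
print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```

Under those assumptions this works out to 906 steps per epoch and 9,060 steps overall.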

Why This Exists

Tiv is one of Nigeria's major languages — spoken by millions across Benue State and the diaspora — yet it has zero representation in any major NLP benchmark, translation API, or pretrained multilingual model.

Google Translate doesn't support it. DeepL doesn't support it. NLLB-200 doesn't support it.

This model is the first step toward changing that. It is part of TRANSLTR, a real-time spoken language translation system being built to bridge Tiv speakers into the digital world — starting with live event translation at the GCK Benue conference, July 2025, IBB Square, Makurdi.

Usage

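from transformers import MarianMTModel, MarianTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
model_id  = "victorachede/tiv-translator"
tokenizer = MarianTokenizer.from_pretrained(model_id)
model     = MarianMTModel.from_pretrained(model_id)

def translate(text):
    """Translate one English sentence into Tiv."""
    # truncation=True keeps over-long inputs within the model's maximum input length.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Beam search with 4 beams; output is capped at 128 tokens.
    out    = model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(translate("For God so loved the world."))
print(translate("The Lord is my shepherd."))
print(translate("Ask and it shall be given unto you."))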
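
For more than a handful of sentences, translating in batches is usually more efficient than calling the model once per sentence. The sketch below assumes the `tokenizer` and `model` objects loaded above. `batched` and `translate_batch` are illustrative helper names, not part of the Transformers API, and the default batch size of 8 is an arbitrary choice.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_batch(texts, tokenizer, model, batch_size=8, max_length=128, num_beams=4):
    """Translate a list of English sentences, preserving input order."""
    translations: List[str] = []
    for chunk in batched(texts, batch_size):
        # Pad to the longest sentence in the chunk; truncate over-long inputs.
        inputs = tokenizer(
            chunk,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        )
        outputs = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations
```

A call such as `translate_batch(["The Lord is my shepherd.", "Ask and it shall be given unto you."], tokenizer, model)` would return one Tiv string per input sentence, in the same order.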

Training Data

The dataset is a custom-built parallel corpus of English–Tiv sentence pairs, assembled specifically for this project. Sources include:

Bible text (primary domain) — Luke, John, Acts (priority books)
Conversational Tiv — everyday phrases and common expressions
Manual curation by native Tiv speakers

Dataset size at time of training: 28,987 verified pairs
The dataset is actively growing, and the model will be retrained as the corpus expands.
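
Parallel corpora like this one are often stored one sentence pair per line. The following is a minimal sketch of loading and de-duplicating such a file. The tab-separated layout and the `load_pairs` helper are assumptions made for illustration; they do not describe how the TRANSLTR corpus is actually stored or verified.

```python
import csv
from typing import List, Tuple


def load_pairs(path: str) -> List[Tuple[str, str]]:
    """Read English<TAB>Tiv lines, skipping incomplete rows and exact duplicates."""
    pairs: List[Tuple[str, str]] = []
    seen = set()
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 2:
                continue  # malformed line: not exactly two columns
            english, tiv = (cell.strip() for cell in row)
            if not english or not tiv:
                continue  # one side of the pair is empty
            if (english, tiv) in seen:
                continue  # exact duplicate of an earlier pair
            seen.add((english, tiv))
            pairs.append((english, tiv))
    return pairs
```

Exact-match filtering like this only catches the simplest problems; it is not a substitute for the native-speaker curation described above.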

Limitations

Low-resource reality: 28,987 pairs is a starting point, not a ceiling. Output quality is expected to improve as the dataset expands.
Domain: currently strongest on Bible-register English. Colloquial or technical text may produce weaker results.
Tokenizer mismatch: the base MarianMT tokenizer was not trained on Tiv, so subword segmentation of Tiv text is imperfect at this stage. A Tiv-native tokenizer is on the roadmap.
This is v0.1. It is not production-ready. It is a proof-of-concept that this problem is solvable.
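
One way to make the tokenizer-mismatch point measurable is subword fertility: the average number of subword pieces produced per whitespace-separated word. The sketch below is an illustration, not a measurement reported by this project. `subword_fertility` is a hypothetical helper, and it accepts any tokenize function that maps a sentence to a list of pieces, so how the model's target-side tokenizer is wrapped is left to the caller.

```python
from typing import Callable, Iterable, List


def subword_fertility(
    sentences: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> float:
    """Average number of subword pieces per whitespace-separated word."""
    total_words = 0
    total_pieces = 0
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue  # skip empty or whitespace-only lines
        total_words += len(words)
        total_pieces += len(tokenize(sentence))
    if total_words == 0:
        raise ValueError("no words to measure")
    return total_pieces / total_words
```

A value near 1 means most words survive as single pieces, while higher values indicate heavier fragmentation. Comparing the figure for Tiv text against the figure for English text would give a rough sense of how poorly the inherited vocabulary fits Tiv.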

Roadmap

Fine-tune MarianMT base on EN→TIV (28,987 pairs)
Expand dataset to 100,000+ pairs
Train Tiv-native SentencePiece tokenizer
ElevenLabs Tiv voice clone integration (TTS output)
Groq Whisper STT pipeline (speech input)
Live demo at GCK Benue, June 2025
Mobile SDK for offline Tiv translation
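
The speech items on this roadmap suggest a three-stage pipeline: speech-to-text, text translation, then text-to-speech. The sketch below only shows how such stages could be composed. `SpeechTranslator` and its three callables are hypothetical placeholders, and no ElevenLabs or Groq API calls are shown or implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Chains three stages; each stage is any callable with the given signature."""

    transcribe: Callable[[bytes], str]   # English audio -> English text (STT)
    translate: Callable[[str], str]      # English text -> Tiv text (this model)
    synthesize: Callable[[str], bytes]   # Tiv text -> Tiv audio (TTS)

    def run(self, audio: bytes) -> bytes:
        english_text = self.transcribe(audio)
        if not english_text.strip():
            return b""  # nothing was recognised, so there is nothing to speak
        return self.synthesize(self.translate(english_text))
```

Keeping each stage behind a plain callable means the translation step can be tested with stub audio functions before any speech service is connected.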

Citation

If you use this model or dataset in your research, please cite:

@misc{achede2025tivtranslator,
  author    = {Victor Achede},
  title     = {TRANSLTR: English-Tiv Neural Machine Translation},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/victorachede/tiv-translator}
}

About

Built by Victor Achede, founder of Black Sheep Co. — a technology holding company based in Makurdi, Benue State, Nigeria.
TRANSLTR is one of several products under active development targeting African language infrastructure, live event technology, and low-resource NLP.

"If it doesn't exist, build it."

Model size: 0.1B params (Safetensors, tensor type F32)
