TRANSLTR — English → Tiv Translation Model

Fine-tuned for Tiv, a Benue-Congo language spoken by ~4 million people in Benue State, Nigeria.
Built by Victor Achede under Black Sheep Co.


Model Summary

Property            Detail
Base model          Helsinki-NLP/opus-mt-en-mul
Task                Machine translation (EN → TIV)
Language pair       English → Tiv (tiv)
Architecture        MarianMT (transformer seq2seq)
Training data       Custom curated EN↔TIV parallel corpus (Bible-domain, conversational)
Fine-tuning epochs  10
Batch size          32
Hardware            NVIDIA T4 (Google Colab)
Framework           HuggingFace Transformers 4.x
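
For a rough sense of the training budget these settings imply, the arithmetic can be sketched as below. It uses the batch size and epoch count from the summary and the corpus size reported under Training Data. It assumes every verified pair was used for training (no held-out split) and that there was no gradient accumulation; neither is stated in this card, so treat the result as an estimate. The function name `optimizer_steps` is illustrative.

```python
import math

# Figures taken from this card. Using the full corpus for training is an
# assumption; a held-out validation split would lower these numbers.
NUM_PAIRS = 28_987   # verified pairs reported under Training Data
BATCH_SIZE = 32
EPOCHS = 10


def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> tuple[int, int]:
    """Return (steps_per_epoch, total_steps), counting the final partial batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


per_epoch, total = optimizer_steps(NUM_PAIRS, BATCH_SIZE, EPOCHS)
print(per_epoch, total)  # 906 9060
```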

Why This Exists

Tiv is one of Nigeria's major languages — spoken by millions across Benue State and the diaspora — yet it has zero representation in any major NLP benchmark, translation API, or pretrained multilingual model.

Google Translate doesn't support it. DeepL doesn't support it. NLLB-200 doesn't support it.

This model is the first step toward changing that. It is part of TRANSLTR, a real-time spoken language translation system being built to bridge Tiv speakers into the digital world — starting with live event translation at the GCK Benue conference, July 2025, IBB Square, Makurdi.


Usage

from transformers import MarianMTModel, MarianTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hub
model_id  = "victorachede/tiv-translator"
tokenizer = MarianTokenizer.from_pretrained(model_id)
model     = MarianMTModel.from_pretrained(model_id)

def translate(text):
    """Translate one English string to Tiv using beam search."""
    # truncation=True keeps long inputs within the model's maximum input length
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    out    = model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(translate("For God so loved the world."))
print(translate("The Lord is my shepherd."))
print(translate("Ask and it shall be given unto you."))
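
The function above translates one string per call. For longer inputs, such as a list of verses, a batched variant may be more convenient. The following is a minimal sketch, not part of the released model: `chunks` and `translate_batch` are illustrative names, and the default `batch_size` of 16 is an arbitrary assumption to tune for your hardware. It expects the `tokenizer` and `model` objects loaded above to be passed in.

```python
from typing import Iterator, List


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(texts, tokenizer, model, batch_size=16,
                    max_length=128, num_beams=4):
    """Translate a list of English strings, one padded batch at a time."""
    results = []
    for batch in chunks(texts, batch_size):
        # Pad to the longest sentence in the batch; truncate over-long inputs
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True)
        out = model.generate(**inputs, max_length=max_length,
                             num_beams=num_beams)
        results.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return results
```

With the objects from the snippet above, a call would look like `translate_batch(["For God so loved the world.", "The Lord is my shepherd."], tokenizer, model)`. Output order matches input order.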

Training Data

The dataset is a custom-built parallel corpus of English–Tiv sentence pairs, assembled specifically for this project. Sources include:

  • Bible text (primary domain) — Luke, John, Acts (priority books)
  • Conversational Tiv — everyday phrases and common expressions
  • Manual curation by native Tiv speakers

Dataset size at time of training: 28,987 verified pairs
The dataset is actively growing, and the model will be retrained as the corpus expands.
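
The card does not describe how pairs were verified or split. As a rough illustration of the kind of preparation a parallel corpus like this usually needs, the sketch below normalizes whitespace, drops empty, duplicate, and badly length-mismatched pairs, and holds out a validation set. All names, the `max_ratio` threshold, and the 5% validation fraction are assumptions made for the example, not the project's actual pipeline.

```python
import random
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (english, tiv)


def clean_pairs(pairs: Iterable[Pair], max_ratio: float = 3.0) -> List[Pair]:
    """Normalize whitespace; drop empty, duplicate, and length-mismatched pairs."""
    seen = set()
    kept = []
    for en, tiv in pairs:
        en, tiv = " ".join(en.split()), " ".join(tiv.split())
        if not en or not tiv:
            continue  # one side is missing
        n_en, n_tiv = len(en.split()), len(tiv.split())
        if max(n_en, n_tiv) / min(n_en, n_tiv) > max_ratio:
            continue  # word counts differ too much; likely misaligned
        key = (en.lower(), tiv.lower())
        if key in seen:
            continue  # case-insensitive duplicate
        seen.add(key)
        kept.append((en, tiv))
    return kept


def split_pairs(pairs: List[Pair], val_fraction: float = 0.05,
                seed: int = 13) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle deterministically and return (train, validation)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

A held-out split of this kind is also what would make it possible to report a translation quality score for each retraining run.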


Limitations

  • Low-resource reality: 28,987 pairs is a starting point, not a ceiling. Output quality is expected to improve as the dataset expands.
  • Domain: currently strongest on Bible-register English. Colloquial or technical text may produce weaker results.
  • Tokenizer mismatch: the base MarianMT tokenizer was not trained on Tiv — subword segmentation of Tiv tokens is imperfect at this stage. A Tiv-native tokenizer is on the roadmap.
  • This is v0.1. It is not production-ready. It is a proof-of-concept that this problem is solvable.
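
One common way to put a number on the tokenizer mismatch noted above is subword fertility: the average number of subword tokens produced per whitespace-separated word. Higher values mean words are being split into more fragments. The sketch below is a generic illustration; `fertility` and `toy_tokenize` are hypothetical names, and `toy_tokenize` is only a stand-in so the example runs without downloading a model.

```python
from typing import Callable, Iterable, List


def fertility(sentences: Iterable[str],
              tokenize: Callable[[str], List[str]]) -> float:
    """Average number of subword tokens per whitespace-separated word."""
    n_words = 0
    n_tokens = 0
    for sentence in sentences:
        n_words += len(sentence.split())
        n_tokens += len(tokenize(sentence))
    return n_tokens / n_words if n_words else 0.0


def toy_tokenize(text: str, piece_len: int = 3) -> List[str]:
    """Stand-in tokenizer: cut each word into pieces of `piece_len` characters."""
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + piece_len]
                      for i in range(0, len(word), piece_len))
    return pieces


print(fertility(["abcdef gh"], toy_tokenize))  # 1.5 (3 pieces over 2 words)
```

To measure the real tokenizer, pass `tokenizer.tokenize` from the Usage snippet in place of `toy_tokenize` and feed it Tiv sentences. Marian tokenizers can carry separate source and target vocabularies, so it is worth checking which one is applied to the text being measured.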

Roadmap

  • Fine-tune MarianMT base on EN→TIV (28,987 pairs)
  • Expand dataset to 100,000+ pairs
  • Train Tiv-native SentencePiece tokenizer
  • ElevenLabs Tiv voice clone integration (TTS output)
  • Groq Whisper STT pipeline (speech input)
  • Live demo at GCK Benue, June 2025
  • Mobile SDK for offline Tiv translation

Citation

If you use this model or dataset in your research, please cite:

@misc{achede2025tivtranslator,
  author    = {Victor Achede},
  title     = {TRANSLTR: English-Tiv Neural Machine Translation},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/victorachede/tiv-translator}
}

About

Built by Victor Achede, founder of Black Sheep Co. — a technology holding company based in Makurdi, Benue State, Nigeria.
TRANSLTR is one of several products under active development targeting African language infrastructure, live event technology, and low-resource NLP.

"If it doesn't exist, build it."
