cameroon-int8
int8 CTranslate2 serving bundle for ~60 Cameroonian languages (French-pivot MarianMT).
This repository is the quantized (int8) CTranslate2 serving bundle that powers translation for roughly 60 Cameroonian languages. Every model is a MarianMT translation model converted to the CTranslate2 format and quantized to int8.
All language pairs are French-pivot: translation goes either French -> local language (francais-<lang>) or local language -> French (<lang>-francais). To translate between two local languages, pivot through French.
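The pivot rule above can be sketched as a small routing helper. This is only an illustration: `pivot_route` and `PIVOT` are hypothetical names, not part of this repository, and the lowercase folder names for languages other than the ones shown in this README are an assumption. The helper does not check that a given pair folder actually exists in the bundle.

```python
PIVOT = "francais"  # pivot language, spelled as in the pair folder names


def pivot_route(source: str, target: str) -> list[str]:
    """Return the pair folder names needed to go from source to target."""
    if source == target:
        return []  # nothing to translate
    if source == PIVOT or target == PIVOT:
        return [f"{source}-{target}"]  # a direct French <-> local pair
    # Two local languages: go through French in two hops.
    return [f"{source}-{PIVOT}", f"{PIVOT}-{target}"]


print(pivot_route("francais", "ewondo"))
print(pivot_route("ewondo", "aghem"))
```

A two-hop route means two model loads and two translation passes, so errors from the first hop carry into the second.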
Compared to the original fp32 PyTorch checkpoints, this int8 bundle is roughly 3.8x smaller on disk and runs about 6x faster at inference, which makes it practical to serve many languages from modest hardware.
Repository layout
Each subfolder holds exactly one translation direction (one pair) and contains the full CTranslate2 model plus its tokenizer:
cameroon-int8/
├── aghem-francais/
│   ├── model.bin
│   ├── config.json
│   └── (tokenizer files)
├── francais-aghem/
│   ├── model.bin
│   ├── config.json
│   └── (tokenizer files)
├── ...
└── yemba-francais/
There are 119 such pair subfolders.
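Given the layout above, a local copy of the bundle can be inspected by reading the folder names. The sketch below is a hedged example: `parse_pair` and `list_pairs` are hypothetical helpers, and it assumes every pair folder is named either `francais-<lang>` or `<lang>-francais` and contains a `model.bin`, as in the tree shown here.

```python
from pathlib import Path

PIVOT = "francais"  # pivot language, spelled as in the pair folder names


def parse_pair(name):
    """Split a pair folder name into (source, target), or return None."""
    if name.startswith(PIVOT + "-"):
        return (PIVOT, name[len(PIVOT) + 1:])
    if name.endswith("-" + PIVOT):
        return (name[: -(len(PIVOT) + 1)], PIVOT)
    return None  # not a French-pivot pair folder


def list_pairs(bundle_dir):
    """Return (source, target) tuples for subfolders that hold a model.bin."""
    pairs = []
    for sub in sorted(Path(bundle_dir).iterdir()):
        pair = parse_pair(sub.name)
        if pair is not None and (sub / "model.bin").is_file():
            pairs.append(pair)
    return pairs
```

Calling `list_pairs` on a downloaded copy of the repository would return one tuple per direction, which is a quick way to confirm which pairs are present before loading any model.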
Usage
Install dependencies:
pip install ctranslate2 transformers huggingface_hub sentencepiece
Download a single pair and translate with ctranslate2.Translator + transformers.MarianTokenizer:
from huggingface_hub import snapshot_download
import ctranslate2
from transformers import MarianTokenizer

pair = "francais-ewondo"  # French -> Ewondo

# Download just the one pair subfolder
local_dir = snapshot_download(
    repo_id="flagship-ai/cameroon-int8",
    allow_patterns=[f"{pair}/*"],
)
model_path = f"{local_dir}/{pair}"

tokenizer = MarianTokenizer.from_pretrained(model_path)
translator = ctranslate2.Translator(model_path, device="cpu")  # or device="cuda"

text = "Bonjour, comment allez-vous ?"
# CTranslate2 takes token strings, so encode to ids and map them back to tokens
source = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
results = translator.translate_batch([source])
target = results[0].hypotheses[0]  # best hypothesis, as a list of tokens
output = tokenizer.decode(
    tokenizer.convert_tokens_to_ids(target),
    skip_special_tokens=True,
)
print(output)
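The same encode, translate, decode steps can be wrapped for several sentences at once, and chained for a pivot through French. This is a minimal sketch, not the project's own API: `translate_texts` and `pivot_translate` are hypothetical helpers, and they assume the translator and tokenizer objects behave like the `ctranslate2.Translator` and `MarianTokenizer` instances created above.

```python
def translate_texts(translator, tokenizer, texts):
    """Translate a list of strings with one loaded pair."""
    # Tokenize each sentence into the token strings CTranslate2 expects.
    batch = [
        tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
        for text in texts
    ]
    results = translator.translate_batch(batch)
    # Keep the first (best) hypothesis of each result and decode it to text.
    return [
        tokenizer.decode(
            tokenizer.convert_tokens_to_ids(result.hypotheses[0]),
            skip_special_tokens=True,
        )
        for result in results
    ]


def pivot_translate(to_french, from_french, texts):
    """Translate local -> French -> local.

    Each argument is a (translator, tokenizer) tuple: to_french for a
    <lang>-francais pair, from_french for a francais-<lang> pair.
    """
    french = translate_texts(*to_french, texts)
    return translate_texts(*from_french, french)
```

For a single direction, `translate_texts(translator, tokenizer, [text])` returns a one-element list. For two local languages, load both pair folders and pass them to `pivot_translate`.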
Languages
The bundle covers directions to and from French for languages including: Aghem, Awing, Babanki, Bafia, Bakoko, Bakweri, Bidwee, Bulu, Bum, Cuvok, Denya, Dii, Doyayo, Ejagham, English, Esimbi, Ewondo, Fufulde, Gbaya, Ghomala, Guidar, Guiziga, Isu, Kapsiki, Kenyang, Koonzime, Lamnso, Limbum, Mankon, Massana, Mbembe, Medumba, Meta, Mmen, Mofa, Mofu, Moghamo, Mpumpong, Mundani, Ngi, Ngienboum, Ngomba, Ngombale, Ngwo, Nomaande, Nugunu, Oku, Pana, Peere, Pinyin, Punu, Samba, Tunen, Tupuri, Vute, Weh, Yambeta, Yemba, and more.
Links
- Blog: https://lingo.cm/blog
License
Released under CC BY-NC 4.0. Intended for research and non-commercial use supporting Cameroonian language technology.