HIKAF Hausa mT5-small: Hausa Language Generation Model

This model is a fine-tuned version of google/mt5-small on the msmaje/filtered_expanded_hausa_corpus dataset (590K rows of filtered, quality-scored Hausa text from 15 sources).

It serves as the generator component of HIKAF (the Hausa Inclusive Knowledge Access Framework), a Retrieval-Augmented Generation (RAG) system for inclusive knowledge access in the Hausa language, presented at RISE 2026 (Harnessing Digital Innovation and Emerging Technologies for Inclusive Growth and Economic Renaissance).

Task

Given a Hausa context passage retrieved from a knowledge base, the model generates a fluent Hausa continuation or response grounded in that passage. This is the final step of the RAG pipeline.

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("msmaje/hikaf-hausa-mt5-small")
model     = AutoModelForSeq2SeqLM.from_pretrained("msmaje/hikaf-hausa-mt5-small")

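# The leading "ci gaba da rubutun Hausa: " (roughly "continue the Hausa text: ") appears to be
# the task prefix; the text after it is the retrieved context passage.
context = "ci gaba da rubutun Hausa: Wurin Shakatawa na Yankari babban wurin shakatawa ne..."
# Inputs longer than 128 tokens are truncated, matching the training input length.
inputs  = tokenizer(context, return_tensors="pt", max_length=128, truncation=True)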

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    num_beams=4,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
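
In a RAG setting the input string is usually assembled from several retrieved passages rather than typed by hand. The card does not say how HIKAF formats that input, so the following is only a minimal sketch: it reuses the prefix from the usage example above, and the helper name `build_input` and the `max_words` budget are hypothetical. The word budget is a rough stand-in for the 128-token input limit; the tokenizer's own truncation still applies afterwards.

```python
# Prefix copied from the usage example; its role as a task prefix is an assumption.
TASK_PREFIX = "ci gaba da rubutun Hausa: "


def build_input(passages, prefix=TASK_PREFIX, max_words=90):
    """Join retrieved passages behind the prefix, keeping at most max_words words.

    max_words is a hypothetical budget, not a value taken from the model card.
    """
    # Flatten all passages into one whitespace-separated word list.
    words = " ".join(passages).split()
    # Keep only the first max_words words so the most relevant passages survive.
    return prefix + " ".join(words[:max_words])


example = build_input(["Wurin Shakatawa na Yankari", "babban wurin shakatawa ne"])
```

The returned string can be passed to the tokenizer in place of `context` in the usage example.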

Training Details

Parameter                 Value
Base model                google/mt5-small
Dataset                   msmaje/filtered_expanded_hausa_corpus
Training samples          80,000
Quality threshold         >= 50
Epochs                    4
Batch size (effective)    32
Learning rate             0.0003
Input max length          128 tokens
Target max length         128 tokens
Eval ROUGE-L              N/A (not reported)
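
For readers who want to score their own generations, ROUGE-L compares a candidate with a reference through their longest common subsequence (LCS). The card does not state which evaluation library or tokenization was used, so the sketch below is an assumption-laden simplification: sentence-level ROUGE-L F1 over whitespace tokens, with no stemming. The function names are illustrative only.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L F1 on whitespace tokens (an assumption, not the card's setup)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens that lie on the LCS
    recall = lcs / len(ref)      # share of reference tokens that lie on the LCS
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("Yankari babban wurin shakatawa ne", "Yankari wurin shakatawa ne")
```

Scores from this sketch are not directly comparable with numbers produced by a full ROUGE implementation, which may tokenize and aggregate differently.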

HIKAF RAG Pipeline

User Query (Hausa)
       │
       ▼
 LaBSE Encoder ──→ FAISS Index (590K Hausa passages)
       │                │
       │         Top-5 Retrieved Passages
       │                │
       └────────────────┘
                        │
              Context Augmentation
                        │
                        ▼
          hikaf-hausa-mt5-small (this model)
                        │
                        ▼
          Hausa Response (grounded, verified)
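
The retrieval step in the diagram can be illustrated without the real components. The sketch below is a standard-library stand-in, not the HIKAF implementation: in the actual pipeline the vectors would come from the LaBSE encoder and the search would be served by the FAISS index. It assumes cosine similarity as the ranking score, which the card does not state, and the toy vectors and the name `top_k` are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def top_k(query_vec, passage_vecs, k=5):
    """Return indices of the k passages most similar to the query (brute force).

    Cosine similarity is an assumed metric; k=5 mirrors the Top-5 step in the diagram.
    """
    q = normalize(query_vec)
    scores = [
        sum(a * b for a, b in zip(q, normalize(p)))  # dot product of unit vectors
        for p in passage_vecs
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy 2-dimensional embeddings standing in for encoder outputs.
passage_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
best = top_k([1.0, 0.1], passage_vecs, k=2)
```

The returned indices select the passages that are then combined with the query in the context augmentation step before generation.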

Citation

@inproceedings{maje2026hikaf,
  title     = {Harnessing Retrieval-Augmented Generation (RAG) Systems for Inclusive
               Knowledge Access in the Hausa Language},
  author    = {Maje, Musa},
  booktitle = {Proceedings of RISE 2026},
  year      = {2026},
}