JuaKazi Multilingual Bias Corrector v1
Seq2seq gender bias correction model covering 6 languages.
Fine-tuned from castorini/afriteva_v2_base on ~10K correction pairs.
Usage
Input format: `correct bias {lang}: {biased sentence}`, where `lang` is one of: sw, ha, zu, ki, fr, en.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and the fine-tuned seq2seq model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("juakazike/multilingual-bias-corrector-v1")
model = AutoModelForSeq2SeqLM.from_pretrained("juakazike/multilingual-bias-corrector-v1")

def correct(text, lang):
    # Prefix the sentence with the task prompt and language code.
    prompt = f"correct bias {lang}: {text}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=128)
    # Beam search with 4 beams; output is capped at 128 new tokens.
    out = model.generate(**inputs, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(out[0], skip_special_tokens=True)

correct("The chairman will lead the board meeting.", "en")
# -> "The chair will lead the board meeting."
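Since the model expects the exact `correct bias {lang}: ...` prefix, it may help to validate the language code before building the prompt. The following is a minimal sketch, not part of the released code: `SUPPORTED_LANGS` and `build_prompt` are hypothetical names introduced here, and rejecting unsupported codes with a `ValueError` is an assumption about desirable behavior.

```python
# Hypothetical helper: the names below are illustrative, not part of the model repo.
SUPPORTED_LANGS = {"sw", "ha", "zu", "ki", "fr", "en"}  # codes listed in this card


def build_prompt(text: str, lang: str) -> str:
    """Return the prompt string in the documented input format."""
    lang = lang.strip().lower()
    if lang not in SUPPORTED_LANGS:
        raise ValueError(
            f"unsupported language {lang!r}; expected one of {sorted(SUPPORTED_LANGS)}"
        )
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    return f"correct bias {lang}: {text}"


print(build_prompt("The chairman will lead the board meeting.", "en"))
# prints: correct bias en: The chairman will lead the board meeting.
```

The returned string can be passed to the tokenizer in place of the f-string inside `correct`.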
Validation BLEU (val set, 10% held out per language)
| Language | Pairs | BLEU |
|---|---|---|
| Swahili (sw) | 1,586 | 17.7 |
| Hausa (ha) | 1,917 | 4.1 |
| Zulu (zu) | 1,931 | 0.6 |
| Gikuyu (ki) | 867 | 4.0 |
| French (fr) | 636 | 30.8 |
| English (en) | 3,464 | 38.6 |
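The per-language scores vary widely, so a single summary number depends heavily on how it is aggregated. The sketch below computes an unweighted (macro) mean and a pair-weighted mean from the table values. It is illustrative arithmetic only: the card does not report an aggregate score, weighting by the Pairs column is an assumption made here, and a mean of per-language BLEU scores is not the same as BLEU computed over the pooled validation set.

```python
# Values copied from the table above: language -> (pairs, BLEU).
RESULTS = {
    "sw": (1586, 17.7),
    "ha": (1917, 4.1),
    "zu": (1931, 0.6),
    "ki": (867, 4.0),
    "fr": (636, 30.8),
    "en": (3464, 38.6),
}


def macro_bleu(results):
    """Unweighted mean: every language counts equally."""
    return sum(bleu for _, bleu in results.values()) / len(results)


def weighted_bleu(results):
    """Mean weighted by the number of pairs per language (an assumed weighting)."""
    total_pairs = sum(pairs for pairs, _ in results.values())
    return sum(pairs * bleu for pairs, bleu in results.values()) / total_pairs


total = sum(pairs for pairs, _ in RESULTS.values())
print(total)                            # prints: 10401
print(f"{macro_bleu(RESULTS):.1f}")     # prints: 16.0
print(f"{weighted_bleu(RESULTS):.1f}")  # prints: 18.6
```

The pair counts sum to 10,401, which matches the "~10K correction pairs" stated at the top of the card. The weighted mean is pulled upward by English, which has the most pairs and the highest score.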