NaijaXLM-T-base Hate

This is a NaijaXLM-T base model finetuned on Nigerian tweets annotated for hate speech detection. The model is described and evaluated in the reference paper (cited below) and was developed together with @pvcastro.

Model Details

Model Description

  • Model type: xlm-roberta
  • Language(s) (NLP): (Nigerian) English, Nigerian Pidgin, Hausa, Yoruba, Igbo
  • Finetuned from model: manueltonneau/naija-xlm-twitter-base

Training Details

Training Data

This model was finetuned on the stratified (dataset=='stratified') and active learning (dataset=='al') subsets of NaijaHate.
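As a minimal sketch, assuming each NaijaHate record carries a `dataset` field holding the subset names quoted above, the training pool can be selected as follows. The records are invented toy rows, not real data, and `select_training_pool` is a hypothetical helper name, not part of any released code.

```python
from typing import Dict, List

# Subset labels quoted in this card: stratified sample and active-learning sample.
TRAIN_SUBSETS = {"stratified", "al"}


def select_training_pool(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only records whose `dataset` field names a training subset (hypothetical helper)."""
    return [r for r in records if r.get("dataset") in TRAIN_SUBSETS]


# Toy records for illustration only; the real NaijaHate schema may differ.
records = [
    {"text": "tweet a", "dataset": "stratified"},
    {"text": "tweet b", "dataset": "al"},
    {"text": "tweet c", "dataset": "eval"},
    {"text": "tweet d", "dataset": "random"},
]
pool = select_training_pool(records)
print([r["text"] for r in pool])  # ['tweet a', 'tweet b']
```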

Training Procedure and Evaluation

We perform a 90-10 train-test split and run 5-fold cross-validation over 5 learning rates ranging from 1e-5 to 5e-5, training each fold with 3 different seeds. The train-test split is repeated with 10 different seeds, and the evaluation metrics are averaged across these 10 seeds.
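The loop structure of this protocol can be sketched in plain Python. This is not the authors' code: it assumes the 5 learning rates are evenly spaced (1e-5, 2e-5, ..., 5e-5), that cross-validation runs on the 90% training portion of each split, and that folds are assigned round-robin without stratification. Fine-tuning itself is omitted; the sketch only enumerates runs. The helper names are hypothetical, and reading "±" as a sample standard deviation across seeds is also an assumption, since the card does not define it.

```python
import random
import statistics
from typing import List, Sequence, Tuple

# Settings from this card; the evenly spaced learning-rate grid is an assumption.
N_SPLIT_SEEDS = 10        # train-test split repeated with 10 seeds
TEST_FRACTION = 0.1       # 90-10 train-test split
N_FOLDS = 5               # 5-fold cross-validation
LEARNING_RATES = [1e-5, 2e-5, 3e-5, 4e-5, 5e-5]  # assumed grid from 1e-5 to 5e-5
N_TRAIN_SEEDS = 3         # each fold trained with 3 seeds


def train_test_split(n: int, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 with a fixed seed and hold out 10% for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * TEST_FRACTION)
    return idx[n_test:], idx[:n_test]


def k_folds(indices: Sequence[int], k: int) -> List[List[int]]:
    """Assign indices to k folds round-robin (no stratification in this sketch)."""
    return [list(indices[i::k]) for i in range(k)]


def enumerate_runs(n_examples: int) -> List[Tuple[int, int, float, int]]:
    """List every (split_seed, fold, learning_rate, train_seed) fine-tuning run."""
    runs = []
    for split_seed in range(N_SPLIT_SEEDS):
        train_idx, _test_idx = train_test_split(n_examples, split_seed)
        folds = k_folds(train_idx, N_FOLDS)
        for fold_id in range(len(folds)):
            # A real run would fine-tune on the other folds and validate on folds[fold_id].
            for lr in LEARNING_RATES:
                for train_seed in range(N_TRAIN_SEEDS):
                    runs.append((split_seed, fold_id, lr, train_seed))
    return runs


def mean_pm_std(scores: Sequence[float]) -> str:
    """Format per-seed scores as 'mean ± sample standard deviation'."""
    return f"{statistics.mean(scores):.1f} ± {statistics.stdev(scores):.1f}"


runs = enumerate_runs(1000)
print(len(runs))  # 750
```

Under these assumptions, the grid amounts to 10 × 5 × 5 × 3 = 750 fine-tuning runs per model.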

We evaluate model performance on three datasets: the holdout sample from the train-test splits as well as the top-scored sample (dataset=='eval') and the random sample (dataset=='random') from NaijaHate.
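To make the distinction between the three evaluation sets concrete, here is a minimal sketch, again assuming a `dataset` field on each record. The holdout set comes from a train-test split, while the top-scored and random sets are selected by their subset labels. `build_eval_sets` is a hypothetical helper and the records are toy rows, not NaijaHate data.

```python
from typing import Dict, List

Record = Dict[str, str]


def build_eval_sets(records: List[Record], holdout: List[Record]) -> Dict[str, List[Record]]:
    """Group evaluation data into the three sets named in this card (hypothetical helper).

    `holdout` is the 10% test portion from one train-test split; the other two
    sets are NaijaHate subsets selected by their `dataset` field.
    """
    return {
        "holdout": list(holdout),
        "top-scored": [r for r in records if r.get("dataset") == "eval"],
        "random": [r for r in records if r.get("dataset") == "random"],
    }


# Toy records for illustration only.
records = [
    {"text": "tweet a", "dataset": "stratified"},
    {"text": "tweet b", "dataset": "eval"},
    {"text": "tweet c", "dataset": "random"},
    {"text": "tweet d", "dataset": "random"},
]
holdout = [records[0]]  # stand-in for a real 10% holdout
sets = build_eval_sets(records, holdout)
print({name: len(rows) for name, rows in sets.items()})
# {'holdout': 1, 'top-scored': 1, 'random': 2}
```

In this sketch, only the holdout set changes with the split seed; the top-scored and random sets are selected the same way for every repetition.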

Model              Holdout       Top-scored    Random
GPT-3.5, ZSL       -             60.3 ± 2.7    3.1 ± 1.2
Perspective API    -             60.2 ± 3.5    4.3 ± 2.6
XLM-T              84.2 ± 0.6    51.8 ± 0.7    0.6 ± 0.1
XLM-T              62.0 ± 2.3    68.9 ± 0.8    3.3 ± 0.6
XLM-T              70.5 ± 3.7    63.7 ± 1.1    1.9 ± 0.5
DeBERTaV3          82.3 ± 2.3    85.3 ± 0.8    29.7 ± 4.1
XLM-R              76.7 ± 2.5    83.6 ± 0.8    22.1 ± 3.7
mDeBERTaV3         29.2 ± 2.0    49.6 ± 1.0    0.2 ± 0.0
Conv. BERT         79.2 ± 2.3    86.2 ± 0.8    22.6 ± 3.6
BERTweet           83.6 ± 2.0    88.5 ± 0.6    34.0 ± 4.4
XLM-T              79.0 ± 2.4    84.5 ± 0.9    22.5 ± 3.7
AfriBERTa          70.1 ± 2.7    80.1 ± 0.9    12.5 ± 2.8
AfroXLM-R          79.7 ± 2.3    86.1 ± 0.8    24.7 ± 4.0
XLM-R Naija        77.0 ± 2.5    83.5 ± 0.8    19.1 ± 3.4
NaijaXLM-T         83.4 ± 2.1    89.3 ± 0.7    33.7 ± 4.5

For more information on the evaluation, please read the reference paper.

BibTeX entry and citation information

Please cite the reference paper if you use this model.

@article{tonneau2024naijahate,
  title={NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data},
  author={Tonneau, Manuel and de Castro, Pedro Vitor Quinta and Lasri, Karim and Farouq, Ibrahim and Subramanian, Lakshminarayanan and Orozco-Olvera, Victor and Fraiberger, Samuel},
  journal={arXiv preprint arXiv:2403.19260},
  year={2024}
}

Dataset used to train manueltonneau/naija-xlm-twitter-base-hate