This model detects offensive content in Kannada code-mixed text. The "mono" in the name refers to the monolingual setting, in which the model is trained only on Kannada (pure and code-mixed) data. The weights are initialized from the pretrained XLM-RoBERTa-Base checkpoint, further pretrained with masked language modelling on the target dataset, and then fine-tuned with cross-entropy loss.
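
As a rough illustration of the fine-tuning objective, the sketch below computes the cross-entropy loss for a single example from raw class logits. The function name, the two-class logits, and the class ordering are assumptions made for this example; they are not taken from the released training code.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy loss for one example, given raw (unnormalised) class logits.

    Equivalent to -log(softmax(logits)[target_index]), computed with the
    log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


# Hypothetical logits for two classes (index 0 and index 1).
example_logits = [2.0, -1.0]
loss_if_class_0 = cross_entropy(example_logits, 0)  # small: the model favours class 0
loss_if_class_1 = cross_entropy(example_logits, 1)  # large: the model disfavours class 1
```

The loss is small when the logit of the true class dominates and grows as the model assigns it less probability.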

This model is the best-performing of several models trained for the EACL 2021 Shared Task on Offensive Language Identification in Dravidian Languages. Test predictions ensembled with a genetic algorithm achieved the second-highest weighted F1 score on the leaderboard. Weighted F1 score on the held-out test set: this model 0.73, ensemble 0.74.
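
For readers unfamiliar with the metric, here is a minimal sketch of how a weighted F1 score is computed: the per-class F1 scores are averaged, each weighted by the number of true examples of that class. The function name and the toy labels (`"not"`, `"off"`) are hypothetical and are not the label set of the shared task.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's support in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        score += (n / total) * f1
    return score


# Toy example with hypothetical labels.
gold = ["not", "not", "not", "off"]
pred = ["not", "not", "off", "off"]
toy_score = weighted_f1(gold, pred)
```

Because each class is weighted by its support, frequent classes contribute more to the final score than rare ones.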

For more details, see our paper:

Debjoy Saha, Naman Paharia, Debajit Chakraborty, Punyajoy Saha, Animesh Mukherjee. "Hate-Alert@DravidianLangTech-EACL2021: Ensembling strategies for Transformer-based Offensive language Detection".

Please cite our paper in any published work that uses any of these resources.

