
Offensive language detection

Tasks

The model combines three classifiers, one for each of the three subtasks of the OLID dataset [1]:

  • subtask A: offensive (OFF) vs. not offensive (NOT)
  • subtask B: targeted insult or threat (TIN) vs. untargeted (UNT)
  • subtask C: target is an individual (IND), a group (GRP), or other (OTH)

The model was trained with Flair NLP as a multi-task model; a minimal usage sketch follows.
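
A minimal sketch of loading and querying the model with Flair, assuming it is loaded as a `MultitaskModel`; the exact model identifier is not stated on this card, so the placeholder below is an assumption and must be replaced:

```python
from flair.data import Sentence
from flair.models import MultitaskModel

# "<model-id>" is a placeholder: the card does not give the actual
# Hugging Face identifier or file path of this model.
model = MultitaskModel.load("<model-id>")

sentence = Sentence("@USER you are all ridiculous")
model.predict(sentence)

# Each subtask head attaches its own label to the sentence.
for label in sentence.labels:
    print(label)
```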

Training data: Offensive Language Identification Dataset (OLID) V1.0 [1]
Test data: test set of the Semi-Supervised Dataset for Offensive Language Identification (SOLID) [2]
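
For reference, OLID V1.0 ships as tab-separated files; a minimal loading sketch, assuming the column layout of the official release (id, tweet, subtask_a, subtask_b, subtask_c):

```python
import csv

# Assumes the tab-separated layout of the official OLID V1.0 release;
# file name and column names may differ in other distributions.
with open("olid-training-v1.0.tsv", encoding="utf-8", newline="") as f:
    for row in csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        # subtask_b and subtask_c are NULL for rows where they do not apply.
        print(row["tweet"], row["subtask_a"], row["subtask_b"], row["subtask_c"])
```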

Citation

When using this model, please cite:

Gregor Wiedemann, Seid Muhie Yimam, and Chris Biemann. 2020. UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1638–1644, Barcelona (online). International Committee for Computational Linguistics.

Evaluation scores

Evaluation was conducted on the English test set of SemEval-2020 Task 12, so the results are directly comparable to those reported in [3].
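
The per-class breakdowns below follow the layout of scikit-learn's classification_report; a toy sketch of how such a report is produced (illustrative labels only, not the actual system output):

```python
from sklearn.metrics import classification_report

# Toy gold labels and predictions, just to illustrate the report format.
y_true = ["NOT", "NOT", "OFF", "OFF", "NOT"]
y_pred = ["NOT", "OFF", "OFF", "OFF", "NOT"]

print(classification_report(y_true, y_pred, digits=4))
```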

Task A

Results:
- F-score (micro) 0.9256
- F-score (macro) 0.9131
- Accuracy 0.9256

By class:
              precision    recall  f1-score   support

         NOT     0.9922    0.9042    0.9461      2807
         OFF     0.7976    0.9815    0.8800      1080

    accuracy                         0.9256      3887
   macro avg     0.8949    0.9428    0.9131      3887
weighted avg     0.9381    0.9256    0.9278      3887

Task B

Results:
- F-score (micro) 0.7138
- F-score (macro) 0.6408
- Accuracy 0.7138

By class:
              precision    recall  f1-score   support

         TIN     0.6826    0.9741    0.8027       850
         UNT     0.8947    0.3269    0.4789       572

    accuracy                         0.7138      1422
   macro avg     0.7887    0.6505    0.6408      1422
weighted avg     0.7679    0.7138    0.6724      1422

Task C

Results:
- F-score (micro) 0.8318
- F-score (macro) 0.6978
- Accuracy 0.8318

By class:
              precision    recall  f1-score   support

         IND     0.8703    0.9483    0.9076       580
         GRP     0.7216    0.6684    0.6940       190
         OTH     0.7143    0.3750    0.4918        80

    accuracy                         0.8318       850
   macro avg     0.7687    0.6639    0.6978       850
weighted avg     0.8223    0.8318    0.8207       850

References

[1] Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019. Predicting the Type and Target of Offensive Posts in Social Media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1415–1420, Minneapolis, Minnesota. Association for Computational Linguistics.

[2] Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, and Preslav Nakov. 2021. SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 915–928, Online. Association for Computational Linguistics.

[3] Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, and Çağrı Çöltekin. 2020. SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020). In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1425–1447, Barcelona (online). International Committee for Computational Linguistics.
