|
This model was released with the following paper: |
|
``` |
|
@inproceedings{feedbackloop,
  title = "Feedback Loops and Complex Dynamics of Harmful Speech in Online Discussions",
  author = {Rong-Ching Chang and Jonathan May and Kristina Lerman},
  booktitle = {Proceedings of the 16th International Conference on Social Computing, Behavioral-Cultural Modeling \& Prediction and Behavior Representation in Modeling and Simulation},
  venue = {Pittsburgh, PA},
  month = sep,
  year = {2023}
}
|
``` |
|
|
|
We combined several multilingual ground-truth datasets labeled for misogyny and sexism (M/S) versus non-misogyny and non-sexism (non-M/S) [3, 5, 8, 9, 11, 13, 20]. In each language, the data pairs texts expressing misogynistic or sexist (M/S) speech with the same number of texts expressing non-M/S speech, covering 8,582 English-language texts, 872 in French, 561 in Hindi, 2,190 in Italian, and 612 in Bengali. The test data was a balanced set of 100 texts sampled randomly from each of the M/S and non-M/S groups in each language, for a total of 500 examples of M/S speech and 500 examples of non-M/S speech.
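The counts above can be cross-checked against the training log later in this card. The sketch below is a hedged consistency check, not the authors' preprocessing: it assumes each per-language figure counts the M/S texts (matched by an equal number of non-M/S texts), that these figures describe the training portion only, and that training ran on a single device without gradient accumulation. Under those assumptions the arithmetic gives 25,634 training texts and 1,603 steps per epoch at batch size 16, which matches the Step column of the training results table.

```python
import math

# Per-language counts quoted above. ASSUMPTION: each figure is the number of
# M/S texts, and the training set holds the same number of non-M/S texts.
ms_counts = {
    "English": 8582,
    "French": 872,
    "Hindi": 561,
    "Italian": 2190,
    "Bengali": 612,
}

total_ms = sum(ms_counts.values())  # M/S texts across all five languages
total_train = 2 * total_ms          # balanced: one non-M/S text per M/S text

# Steps per epoch at train_batch_size=16. ASSUMPTION: single device, no
# gradient accumulation; the final partial batch still counts as one step.
batch_size = 16
steps_per_epoch = math.ceil(total_train / batch_size)

# Test set: 100 M/S and 100 non-M/S texts per language.
test_per_class = 100 * len(ms_counts)
test_total = 2 * test_per_class

print(total_ms, total_train, steps_per_epoch, test_per_class, test_total)
```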
|
|
|
The dataset references are:
|
|
|
[3] Bhattacharya, S., et al.: Developing a multilingual annotated corpus of misogyny and aggression, pp. 158–168. ELRA, Marseille, France, May 2020. https://aclanthology.org/2020.trac-1.25

[5] Chiril, P., Moriceau, V., Benamara, F., Mari, A., Origgi, G., Coulomb-Gully, M.: An annotated corpus for sexism detection in French tweets. In: Proceedings of LREC, pp. 1397–1403 (2020)

[8] Fersini, E., et al.: SemEval-2022 task 5: multimedia automatic misogyny identification. In: Proceedings of SemEval, pp. 533–549 (2022)

[9] Fersini, E., Nozza, D., Rosso, P.: Overview of the Evalita 2018 task on automatic misogyny identification (AMI). EVALITA Eval. NLP Speech Tools Italian 12, 59 (2018)

[11] Guest, E., Vidgen, B., Mittos, A., Sastry, N., Tyson, G., Margetts, H.: An expert annotated dataset for the detection of online misogyny. In: Proceedings of EACL, pp. 1336–1350 (2021)

[13] Jha, A., Mamidi, R.: When does a compliment become sexist? Analysis and classification of ambivalent sexism using Twitter data. In: Proceedings of NLP+CSS, pp. 7–16 (2017)

[20] Waseem, Z., Hovy, D.: Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In: Proceedings of NAACL SRW, pp. 88–93 (2016)
|
|
|
|
|
Please see the paper for more detail. |
|
|
|
--- |
|
license: mit |
|
tags: |
|
- generated_from_trainer |
|
metrics: |
|
- accuracy |
|
- f1 |
|
- precision |
|
- recall |
|
model-index: |
|
- name: xlm-roberta-base-misogyny-sexism-indomain-mix-bal |
|
results: [] |
|
--- |
|
|
|
|
|
|
# xlm-roberta-base-misogyny-sexism-indomain-mix-bal |
|
|
|
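This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the balanced mix of multilingual misogyny and sexism datasets (English, French, Hindi, Italian, and Bengali) described at the top of this card.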
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.8259
- Accuracy: 0.826
- F1: 0.8333
- Precision: 0.7996
- Recall: 0.87
- MAE (mean absolute error; for 0/1 labels this equals 1 - accuracy): 0.174
- TN (true negatives): 391
- FP (false positives): 109
- FN (false negatives): 65
- TP (true positives): 435
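All of the reported evaluation metrics follow from the four confusion-matrix counts. The plain-Python sketch below recomputes them as an arithmetic check; it is not the evaluation code used to produce this card, and the helper name `binary_metrics` is hypothetical.

```python
# Confusion-matrix counts reported for the evaluation set above.
tn, fp, fn, tp = 391, 109, 65, 435


def binary_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tn + fp + fn + tp
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # With 0/1 labels each error contributes |y - y_hat| = 1, so MAE = errors / n.
    mae = (fp + fn) / n
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mae": mae,
    }


metrics = binary_metrics(tn, fp, fn, tp)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The same function applied to the epoch-1 row of the training results (390, 110, 70, 430) reproduces that row's accuracy, precision, recall, and F1 as well.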
|
|
|
## Model description |
|
|
|
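A binary text classifier, fine-tuned from xlm-roberta-base, that labels texts as misogynistic or sexist (M/S) versus non-M/S in English, French, Hindi, Italian, and Bengali.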
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
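Training used the combined multilingual M/S dataset described at the top of this card. Evaluation used the balanced test set of 1,000 texts: 100 M/S and 100 non-M/S texts sampled at random in each of the five languages.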
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 2e-05 |
|
- train_batch_size: 16 |
|
- eval_batch_size: 16 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 2 |
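With `lr_scheduler_type: linear`, the learning rate decays from 2e-05 toward zero over training. The sketch below is intended to mirror the formula of the linear schedule in Transformers (`get_linear_schedule_with_warmup`); it assumes no warmup (none is listed above) and 3,206 total steps, the final step in the training results table. The function name `linear_lr` is hypothetical.

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 3206,
              warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step under linear warmup, then linear decay.

    ASSUMPTIONS: warmup_steps=0 (no warmup is listed in the hyperparameters)
    and total_steps=3206 (2 epochs x 1603 steps, from the training results).
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Linear decay from base_lr at the end of warmup down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1603, 3206):
    print(step, linear_lr(step))
```

Under these assumptions the second epoch starts at half the initial rate (1e-05) and training ends at a rate of 0.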
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall | MAE | TN | FP | FN | TP |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|:-----:|:---:|:---:|:--:|:---:|
| 0.2643 | 1.0 | 1603 | 0.6511 | 0.82 | 0.8269 | 0.7963 | 0.86 | 0.18 | 390 | 110 | 70 | 430 |
| 0.2004 | 2.0 | 3206 | 0.8259 | 0.826 | 0.8333 | 0.7996 | 0.87 | 0.174 | 391 | 109 | 65 | 435 |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.20.1 |
|
- Pytorch 1.12.0+cu102 |
|
- Datasets 2.3.2 |
|
- Tokenizers 0.12.1 |
|
# Multilingual_Misogyny_Detection |
|
|