---
license: apache-2.0
tags:
-
datasets:
- EXIST Dataset
- MeTwo Machismo and Sexism Twitter Identification dataset
metrics:
- accuracy
model-index:
- name: twitter_sexismo-finetuned-exist2021
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: EXIST Dataset
      type: EXIST Dataset
      args: es
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.83
---

# twitter_sexismo-finetuned-exist2021

This model is a fine-tuned version of [pysentimiento/robertuito-hate-speech](https://huggingface.co/pysentimiento/robertuito-hate-speech), trained on the EXIST dataset and on the [MeTwo: Machismo and Sexism Twitter Identification dataset](https://github.com/franciscorodriguez92/MeTwo).

It achieves the following results on the evaluation set:

- Loss: 0.54
- Accuracy: 0.83

## Model description

Model built for the Somos NLP Hackathon to detect sexism in Spanish-language tweets. Created by:

- **medardodt**
- **MariaIsabel**
- **ManRo**
- **lucel172**
- **robertou2**

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- `my_learning_rate` = 5E-5
- `my_adam_epsilon` = 1E-8
- `my_number_of_epochs` = 8
- `my_warmup` = 3
- `my_mini_batch_size` = 32
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 8

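A linear scheduler with warmup raises the learning rate from 0 to its peak and then lowers it linearly to 0 by the end of training. The sketch below illustrates that shape, modeled on the formula used by `transformers.get_linear_schedule_with_warmup`. It assumes that `my_warmup = 3` counts optimizer steps, but the card does not say whether it means steps or something else. The training-set size used to derive the total number of steps is also hypothetical.

```python
import math

PEAK_LR = 5e-5    # my_learning_rate from the hyperparameter list
WARMUP_STEPS = 3  # ASSUMPTION: my_warmup interpreted as optimizer steps


def linear_warmup_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate after `step` optimizer updates under a linear schedule."""
    if step < warmup_steps:
        # Warmup phase: grow linearly from 0 towards peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: shrink linearly from peak_lr to 0 at total_steps.
    return peak_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    num_train_examples = 3200  # hypothetical: the card does not give the training-set size
    total_steps = 8 * math.ceil(num_train_examples / 32)  # epochs * steps per epoch (batch size 32)
    for step in (0, 1, 3, total_steps // 2, total_steps):
        print(step, linear_warmup_lr(step, total_steps))
```

With only 3 warmup steps, the rate reaches its peak almost immediately and spends nearly all of training in the decay phase.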
### Training results

| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|------:|--------------:|----------------:|---------:|---------:|----------:|---------:|
| 1 | 0.389900 | 0.397857 | 0.827133 | 0.699620 | 0.786325 | 0.630137 |
| 2 | 0.064400 | 0.544625 | 0.831510 | 0.707224 | 0.794872 | 0.636986 |
| 3 | 0.004800 | 0.837723 | 0.818381 | 0.704626 | 0.733333 | 0.678082 |
| 4 | 0.000500 | 1.045066 | 0.820569 | 0.702899 | 0.746154 | 0.664384 |
| 5 | 0.000200 | 1.172727 | 0.805252 | 0.669145 | 0.731707 | 0.616438 |
| 6 | 0.000200 | 1.202422 | 0.827133 | 0.720848 | 0.744526 | 0.698630 |
| 7 | 0.000000 | 1.195012 | 0.827133 | 0.718861 | 0.748148 | 0.691781 |
| 8 | 0.000100 | 1.215515 | 0.824945 | 0.705882 | 0.761905 | 0.657534 |
| 9 | 0.000100 | 1.233099 | 0.827133 | 0.710623 | 0.763780 | 0.664384 |
| 10 | 0.000100 | 1.237268 | 0.829322 | 0.713235 | 0.769841 | 0.664384 |

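The Accuracy, F1, Precision and Recall columns can be computed from model predictions. Below is a minimal sketch that treats LABEL_1 ("SEXISM") as the positive class. `compute_metrics` is a hypothetical name following the `Trainer` callback convention of receiving a `(logits, labels)` pair. It is not the authors' published evaluation code.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy, F1, precision and recall for binary sexism detection.

    ASSUMPTION: label 1 (LABEL_1, "SEXISM") is the positive class and
    eval_pred unpacks into (logits, labels).
    """
    logits, labels = eval_pred
    preds = np.argmax(np.asarray(logits), axis=-1)
    labels = np.asarray(labels)

    # Confusion-matrix counts for the positive class.
    tp = int(np.sum((preds == 1) & (labels == 1)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = float(np.mean(preds == labels))
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}
```

As a consistency check, 92 true positives, 25 false positives, 54 false negatives and 286 true negatives reproduce the epoch-1 row to six decimal places. These counts are back-calculated from the reported ratios, not published by the authors.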
### Framework versions

- Transformers 4.17.0
- Pytorch 1.10.0+cu111
- Tokenizers 0.11.6

## Model in Action

Quick usage with the `pipeline` API:

```python
# Requires the transformers library (plus PyTorch), e.g.:
#   pip install transformers
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
model_checkpoint = "hackathon-pln-es/twitter_sexismo-finetuned-exist2021-metwo"
pipeline_nlp = pipeline("text-classification", model=model_checkpoint)

# Classify one tweet ("woman at the wheel, danger!")
print(pipeline_nlp("mujer al volante peligro!"))
# [{'label': 'LABEL_1', 'score': 0.9967633485794067}]

# Another single tweet ("I love the iPad!")
print(pipeline_nlp("¡me encanta el ipad!"))
# [{'label': 'LABEL_0', 'score': 0.9934417009353638}]

# Several tweets at once: pass a list of strings
print(pipeline_nlp(["mujer al volante peligro!",
                    "Los hombre tienen más manias que las mujeres",
                    "me encanta el ipad!"]))
# [{'label': 'LABEL_1', 'score': 0.9967633485794067},
#  {'label': 'LABEL_1', 'score': 0.9755664467811584},
#  {'label': 'LABEL_0', 'score': 0.9955045580863953}]

# Output format:
#   label: LABEL_0 = "NON SEXISM", LABEL_1 = "SEXISM"
#   score: the model's predicted probability for that label
```
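The pipeline returns generic `LABEL_0`/`LABEL_1` ids. A small post-processing step can map them to the class names documented above. This is a minimal sketch. `LABEL_NAMES` and `to_readable` are hypothetical names, and only the label meanings come from this card.

```python
from typing import Dict, List, Union

# Label meanings as documented in this card; the names are illustrative.
LABEL_NAMES = {"LABEL_0": "NON SEXISM", "LABEL_1": "SEXISM"}

Prediction = Dict[str, Union[str, float]]


def to_readable(outputs: List[Prediction]) -> List[Prediction]:
    """Swap LABEL_i ids for readable class names; scores are kept unchanged."""
    return [
        {"label": LABEL_NAMES.get(o["label"], o["label"]), "score": o["score"]}
        for o in outputs
    ]


if __name__ == "__main__":
    # Sample outputs taken from the pipeline example.
    raw = [{"label": "LABEL_1", "score": 0.9967633485794067},
           {"label": "LABEL_0", "score": 0.9955045580863953}]
    print(to_readable(raw))
```

Unknown labels are passed through unchanged. An alternative is to set `id2label` on the model config, so that the pipeline emits these names directly.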