---
license: apache-2.0
tags:
- null
datasets:
- EXIST Dataset
- MeTwo Machismo and Sexism Twitter Identification dataset
widget:
- text: manejas muy bien para ser mujer  # "you drive very well for a woman"
- text: En temas políticos hombres y mujeres son iguales  # "in political matters, men and women are equal"
- text: Los ipad son unos equipos electrónicos  # "iPads are electronic devices"
metrics:
- accuracy
model-index:
- name: twitter_sexismo-finetuned-exist2021
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: EXIST Dataset
      type: EXIST Dataset
      args: es
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.83
---
twitter_sexismo-finetuned-exist2021
This model is a fine-tuned version of pysentimiento/robertuito-hate-speech, trained on the EXIST dataset and the MeTwo (Machismo and Sexism Twitter Identification) dataset, https://github.com/franciscorodriguez92/MeTwo. It achieves the following results on the evaluation set:
- Loss: 0.54
- Accuracy: 0.83
Model description
Model developed for the Somos NLP Hackathon to detect sexism in Spanish-language tweets. Created by:
- medardodt
- MariaIsabel
- ManRo
- lucel172
- robertou2
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- my_learning_rate = 5E-5
- my_adam_epsilon = 1E-8
- my_number_of_epochs = 8
- my_warmup = 3
- my_mini_batch_size = 32
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 8
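With `lr_scheduler_type: linear`, the learning rate typically rises linearly during a warmup phase and then decays linearly to zero by the last training step. The plain-Python sketch below reproduces the multiplier used by `transformers.get_linear_schedule_with_warmup`, so the effect of the settings above can be inspected without a GPU. It assumes that `my_warmup = 3` counts optimizer steps (the card does not state the unit), and `TOTAL_STEPS` is a hypothetical value: the real number depends on the training-set size, the batch size of 32 and the number of epochs.

```python
def linear_warmup_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning-rate multiplier for linear warmup followed by linear decay.

    Mirrors the formula behind transformers.get_linear_schedule_with_warmup.
    """
    if step < warmup_steps:
        # Warmup: grow linearly from 0 towards 1.
        return step / max(1, warmup_steps)
    # Decay: shrink linearly from 1 (end of warmup) to 0 (at total_steps), never below 0.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


BASE_LR = 5e-5      # my_learning_rate from the card
WARMUP_STEPS = 3    # assumption: my_warmup interpreted as optimizer steps
TOTAL_STEPS = 100   # hypothetical; the real value depends on dataset size, batch size and epochs

if __name__ == "__main__":
    for step in (0, 1, 3, 50, 100):
        lr = BASE_LR * linear_warmup_multiplier(step, WARMUP_STEPS, TOTAL_STEPS)
        print(f"step {step:>3}: lr = {lr:.2e}")
```

In an actual PyTorch run, the same behaviour would come from passing the AdamW optimizer, the number of warmup steps and the total number of training steps to `get_linear_schedule_with_warmup`.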
Training results
Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
---|---|---|---|---|---|---|
1 | 0.389900 | 0.397857 | 0.827133 | 0.699620 | 0.786325 | 0.630137 |
2 | 0.064400 | 0.544625 | 0.831510 | 0.707224 | 0.794872 | 0.636986 |
3 | 0.004800 | 0.837723 | 0.818381 | 0.704626 | 0.733333 | 0.678082 |
4 | 0.000500 | 1.045066 | 0.820569 | 0.702899 | 0.746154 | 0.664384 |
5 | 0.000200 | 1.172727 | 0.805252 | 0.669145 | 0.731707 | 0.616438 |
6 | 0.000200 | 1.202422 | 0.827133 | 0.720848 | 0.744526 | 0.698630 |
7 | 0.000000 | 1.195012 | 0.827133 | 0.718861 | 0.748148 | 0.691781 |
8 | 0.000100 | 1.215515 | 0.824945 | 0.705882 | 0.761905 | 0.657534 |
9 | 0.000100 | 1.233099 | 0.827133 | 0.710623 | 0.763780 | 0.664384 |
10 | 0.000100 | 1.237268 | 0.829322 | 0.713235 | 0.769841 | 0.664384 |
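The F1 column is the harmonic mean of precision and recall, F1 = 2·P·R / (P + R), so the metric columns can be cross-checked. The snippet below copies the table rows, recomputes F1 from the last two columns (which confirms that the final column holds recall) and looks up the epoch with the best validation accuracy and the epoch with the lowest validation loss. `f1_score` and `best_epoch` are illustrative helpers written for this check, not part of any library.

```python
# (epoch, train_loss, val_loss, accuracy, f1, precision, recall), copied from the table above
ROWS = [
    (1, 0.389900, 0.397857, 0.827133, 0.699620, 0.786325, 0.630137),
    (2, 0.064400, 0.544625, 0.831510, 0.707224, 0.794872, 0.636986),
    (3, 0.004800, 0.837723, 0.818381, 0.704626, 0.733333, 0.678082),
    (4, 0.000500, 1.045066, 0.820569, 0.702899, 0.746154, 0.664384),
    (5, 0.000200, 1.172727, 0.805252, 0.669145, 0.731707, 0.616438),
    (6, 0.000200, 1.202422, 0.827133, 0.720848, 0.744526, 0.698630),
    (7, 0.000000, 1.195012, 0.827133, 0.718861, 0.748148, 0.691781),
    (8, 0.000100, 1.215515, 0.824945, 0.705882, 0.761905, 0.657534),
    (9, 0.000100, 1.233099, 0.827133, 0.710623, 0.763780, 0.664384),
    (10, 0.000100, 1.237268, 0.829322, 0.713235, 0.769841, 0.664384),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def best_epoch(rows, column: int) -> int:
    """Epoch whose value in the given column index is highest."""
    return max(rows, key=lambda row: row[column])[0]


if __name__ == "__main__":
    for epoch, _, _, _, f1, p, r in ROWS:
        # The recomputed F1 should match the table up to rounding.
        print(epoch, round(f1_score(p, r), 6), f1)
    print("highest accuracy at epoch", best_epoch(ROWS, 3))
    print("lowest validation loss at epoch", min(ROWS, key=lambda row: row[2])[0])
```

On this table, the highest accuracy (0.831510) falls at epoch 2, whose validation loss of 0.544625 appears to match the evaluation results reported at the top of the card, while the lowest validation loss is at epoch 1. Training loss drops towards zero as validation loss keeps rising after epoch 1, which suggests the model overfits in later epochs.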
Framework versions
- Transformers 4.17.0
- Pytorch 1.10.0+cu111
- Tokenizers 0.11.6
Model in Action
Fast usage with pipelines:
# Libraries required (install first with: pip install transformers)
from transformers import pipeline
# Usage with pipelines
model_checkpoint = "hackathon-pln-es/twitter_sexismo-finetuned-exist2021-metwo"
pipeline_nlp = pipeline("text-classification", model=model_checkpoint)
print(pipeline_nlp("mujer al volante peligro!"))  # "woman at the wheel, danger!"
# print(pipeline_nlp("¡me encanta el ipad!"))  # "I love the iPad!"
# print(pipeline_nlp(["mujer al volante peligro!", "Los hombre tienen más manias que las mujeres", "me encanta el ipad!"]))
# (second text: "men have more quirks than women")
# Model output: label is LABEL_0 ("NON SEXISM") or LABEL_1 ("SEXISM"),
# and score is the probability the model assigns to that label.
# First call:  [{'label': 'LABEL_1', 'score': 0.9967633485794067}]
# Second call: [{'label': 'LABEL_0', 'score': 0.9934417009353638}]
# Third call:  [{'label': 'LABEL_1', 'score': 0.9967633485794067},
#               {'label': 'LABEL_1', 'score': 0.9755664467811584},
#               {'label': 'LABEL_0', 'score': 0.9955045580863953}]
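The pipeline reports the generic label names `LABEL_0` and `LABEL_1`. A minimal post-processing sketch, assuming the mapping given in the output comments (LABEL_0 = non-sexist, LABEL_1 = sexist), turns those dictionaries into readable labels. `LABEL_NAMES` and `readable_predictions` are hypothetical helpers, not part of `transformers`; because they operate on the list of dictionaries the pipeline returns, they can be tried without downloading the model.

```python
from typing import Any, Dict, List

# Assumed mapping, taken from the output comments of the usage example.
LABEL_NAMES: Dict[str, str] = {"LABEL_0": "NON SEXISM", "LABEL_1": "SEXISM"}


def readable_predictions(predictions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Replace pipeline label ids with readable names, keeping the scores.

    Unknown labels are passed through unchanged.
    """
    return [
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


if __name__ == "__main__":
    # Same shape as the pipeline output for a batch of three texts.
    example = [
        {"label": "LABEL_1", "score": 0.9967633485794067},
        {"label": "LABEL_1", "score": 0.9755664467811584},
        {"label": "LABEL_0", "score": 0.9955045580863953},
    ]
    for prediction in readable_predictions(example):
        print(prediction)
```

With the model loaded, the helper can wrap a call directly, for example `readable_predictions(pipeline_nlp("mujer al volante peligro!"))`. Setting `id2label` in the model configuration would make the pipeline emit these names itself; the helper simply avoids modifying the published checkpoint.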