metadata

language: es
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: roberta-base-bne-finetuned-ciberbullying-spanish
    results:
      - task:
          name: Text Classification
          type: text-classification
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.9607097303206997

roberta-base-bne-finetuned-ciberbullying-spanish

This model is a fine-tuned version of BSC-TeMU/roberta-base-bne, trained on a dataset built by scraping social networks (Twitter, YouTube, ...) to detect cyberbullying in Spanish.

It achieves the following results on the evaluation set:

Loss: 0.1657
Accuracy: 0.9607
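
A minimal sketch of how the classifier might be called through the Transformers `pipeline` API. The card does not state the Hub namespace or the label names, so the repository id passed to `build_classifier` is left to the caller, and `format_prediction` is a hypothetical helper that simply prints whatever label string the model returns.

```python
def build_classifier(model_id: str):
    """Load a text-classification pipeline for the given Hub repository id."""
    # Imported lazily so the formatting helper works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def format_prediction(text: str, prediction: dict) -> str:
    """Render one pipeline result, a dict with 'label' and 'score' keys."""
    return f"{prediction['label']} ({prediction['score']:.3f}): {text}"


def classify(model_id: str, texts: list) -> list:
    """Classify each text and return one formatted line per input."""
    classifier = build_classifier(model_id)
    predictions = classifier(texts)
    return [format_prediction(t, p) for t, p in zip(texts, predictions)]
```

Usage would look like `classify("<namespace>/roberta-base-bne-finetuned-ciberbullying-spanish", ["Gracias por tu ayuda."])`, where `<namespace>` is a placeholder for the actual owner of the repository.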

Training and evaluation data

We fine-tune this model on the concatenation of multiple datasets built by scraping social networks (Twitter, YouTube, Discord, ...). The combined dataset contains more than 360k sentences.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 4
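
To make the `linear` scheduler setting concrete, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup (the card does not list one) and takes the total step count, 88908, from the last row of the training results; `linear_lr` is an illustrative helper, not part of the original training code.

```python
BASE_LR = 2e-05       # learning_rate from the hyperparameters above
TOTAL_STEPS = 88908   # final step reported after 4 epochs
WARMUP_STEPS = 0      # assumption: no warmup is stated in the card


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Learning rate at the end of each epoch (22227 steps per epoch).
    for epoch in range(1, 5):
        step = epoch * 22227
        print(f"epoch {epoch}: step {step}, lr {linear_lr(step):.2e}")
```

Under these assumptions the rate is halved by the end of epoch 2 and reaches zero at the final step.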

Training results

Training Loss	Epoch	Step	Accuracy	Validation Loss
0.1512	1.0	22227	0.9501	0.1418
0.1253	2.0	44454	0.9567	0.1499
0.0973	3.0	66681	0.9594	0.1397
0.0658	4.0	88908	0.9607	0.1657
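
The table can be read programmatically to check two things: how many training examples the step counts imply, and which epoch is best under each metric. This is a small sketch that assumes a single device and no gradient accumulation, neither of which the card states; the variable names are illustrative.

```python
TRAIN_BATCH_SIZE = 16

# (epoch, step, training_loss, accuracy, validation_loss), copied from the table.
RESULTS = [
    (1, 22227, 0.1512, 0.9501, 0.1418),
    (2, 44454, 0.1253, 0.9567, 0.1499),
    (3, 66681, 0.0973, 0.9594, 0.1397),
    (4, 88908, 0.0658, 0.9607, 0.1657),
]

# Steps per epoch times batch size bounds the training set size from above
# (the last batch of an epoch may be smaller than 16).
steps_per_epoch = RESULTS[0][1]
approx_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

# Pick the best epoch under each criterion.
best_by_accuracy = max(RESULTS, key=lambda row: row[3])
best_by_val_loss = min(RESULTS, key=lambda row: row[4])

print("approx. training examples:", approx_train_examples)
print("best epoch by accuracy:", best_by_accuracy[0])
print("best epoch by validation loss:", best_by_val_loss[0])
```

The two criteria disagree: accuracy peaks at epoch 4, while validation loss is lowest at epoch 3 and rises in the final epoch as training loss keeps falling, which may indicate the onset of overfitting.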

Framework versions

Transformers 4.10.3
PyTorch 1.9.0+cu102
Datasets 1.12.1
Tokenizers 0.10.3