metadata
language: es
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: roberta-base-bne-finetuned-ciberbullying-spanish
    results:
      - task:
          name: Text Classification
          type: text-classification
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.9607097303206997

roberta-base-bne-finetuned-ciberbullying-spanish

This model is a fine-tuned version of BSC-TeMU/roberta-base-bne, trained on a dataset built by scraping social networks (Twitter, YouTube, ...) to detect cyberbullying in Spanish.

It achieves the following results on the evaluation set:

  • Loss: 0.1657
  • Accuracy: 0.9607
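A minimal inference sketch follows, assuming the model is published on the Hugging Face Hub under the id `JonatanGk/roberta-base-bne-finetuned-ciberbullying-spanish` (inferred from the author and model name, not stated in this card). `load_classifier` and `top_label` are hypothetical helper names introduced only for illustration.

```python
# Assumption: Hub repository id inferred from the card's author and model name.
MODEL_ID = "JonatanGk/roberta-base-bne-finetuned-ciberbullying-spanish"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the weights on first call)."""
    # Imported lazily so that top_label() can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def top_label(predictions):
    """Return (label, score) for the highest-scoring entry of one pipeline result."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]
```

Calling `top_label(load_classifier()("Eres muy buena persona."))` should return the predicted label together with its score. The label names come from the model's configuration and are not listed in this card, so inspect them before mapping them to classes.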

Training and evaluation data

We fine-tune this model on the concatenation of multiple datasets generated by scraping social networks (Twitter, YouTube, Discord, ...). The combined dataset contains more than 360k sentences.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 4
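The hyperparameters above could be expressed as `transformers.TrainingArguments` roughly as sketched here. This is an assumption about how the run was configured, not the author's actual script: the card's `train_batch_size` and `eval_batch_size` are mapped to the per-device arguments (which matches only for a single device), and `make_training_args` and the `outputs` directory are hypothetical.

```python
# Values copied from the hyperparameter list; key names follow TrainingArguments.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumption: train_batch_size is per device
    "per_device_eval_batch_size": 16,   # assumption: eval_batch_size is per device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 4,
}


def make_training_args(output_dir: str = "outputs"):
    """Build TrainingArguments from the values above (output_dir is hypothetical)."""
    # Lazy import: requires transformers and PyTorch to be installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```

Keeping the values in a plain dictionary makes them easy to log or compare against another run before any training object is created.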

Training results

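| Training Loss | Epoch | Step  | Accuracy | Validation Loss |
|---------------|-------|-------|----------|-----------------|
| 0.1512        | 1.0   | 22227 | 0.9501   | 0.1418          |
| 0.1253        | 2.0   | 44454 | 0.9567   | 0.1499          |
| 0.0973        | 3.0   | 66681 | 0.9594   | 0.1397          |
| 0.0658        | 4.0   | 88908 | 0.9607   | 0.1657          |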
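A few quantities can be derived from the training results with simple arithmetic, as in the sketch below. It assumes a single device, no gradient accumulation, and zero warmup steps for the linear schedule; none of these are stated in the card, so the numbers are estimates.

```python
# Rows copied from the training results: (epoch, step, train_loss, accuracy, val_loss)
RESULTS = [
    (1, 22227, 0.1512, 0.9501, 0.1418),
    (2, 44454, 0.1253, 0.9567, 0.1499),
    (3, 66681, 0.0973, 0.9594, 0.1397),
    (4, 88908, 0.0658, 0.9607, 0.1657),
]
TRAIN_BATCH_SIZE = 16
BASE_LR = 2e-5

steps_per_epoch = RESULTS[0][1]
total_steps = RESULTS[-1][1]

# Upper bound on training examples (the last batch of an epoch may be partial).
# Assumes one device and no gradient accumulation.
max_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE


def linear_lr(step, base_lr=BASE_LR, total=total_steps):
    """Learning rate under linear decay to zero, assuming no warmup steps."""
    return base_lr * max(0.0, 1 - step / total)


# Epochs with the highest accuracy and the lowest validation loss.
best_by_accuracy = max(RESULTS, key=lambda r: r[3])[0]
best_by_val_loss = min(RESULTS, key=lambda r: r[4])[0]

print(max_train_examples)                  # 355632
print(best_by_accuracy, best_by_val_loss)  # 4 3
```

Validation loss is lowest after epoch 3 while accuracy peaks after epoch 4; the headline metrics reported in this card correspond to the final (epoch 4) checkpoint.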

Framework versions

  • Transformers 4.10.3
  • PyTorch 1.9.0+cu102
  • Datasets 1.12.1
  • Tokenizers 0.10.3