roberta-tagalog-base-philippine-elections-2016-2022-hate-speech

This model is a fine-tuned version of jcblaise/roberta-tagalog-base for binary text classification: it labels tweets as hate speech or non-hate speech.

The model was fine-tuned on the combined dataset mapsoriano/2016_2022_hate_speech_filipino, which consists of the hate_speech_filipino dataset and a newly crawled hate speech dataset of tweets related to the 2022 Philippine Presidential Elections.
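
A minimal sketch of how the model could be loaded for inference, assuming the `transformers` library is installed and the checkpoint is available on the Hugging Face Hub under the ID `mapsoriano/roberta-tagalog-base-philippine-elections-2016-2022-hate-speech`. The helper names `build_classifier` and `classify` are illustrative only, and the label strings returned depend on the model's configuration, which this card does not list.

```python
from typing import Callable, List, Sequence, Tuple

# Hub repository ID (assumed from the model name and author on this card).
MODEL_ID = "mapsoriano/roberta-tagalog-base-philippine-elections-2016-2022-hate-speech"


def build_classifier(model_id: str = MODEL_ID):
    """Create a text-classification pipeline; downloads weights on first use."""
    # Imported lazily so classify() can be used and tested without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: Sequence[str], classifier: Callable) -> List[Tuple[str, str, float]]:
    """Return a (text, label, score) tuple for each input text."""
    predictions = classifier(list(texts))
    return [(t, p["label"], float(p["score"])) for t, p in zip(texts, predictions)]
```

Calling `classify(tweets, build_classifier())` on a list of tweet strings would return one tuple per tweet. Check the model's `id2label` mapping before interpreting the labels, since the card does not say which label corresponds to hate speech.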

It achieves the following results on the evaluation (validation) set:

  • Loss: 0.3574
  • Accuracy: 0.8743

It achieves the following results on the test set:

  • Accuracy: 0.8783
  • Precision: 0.8563
  • Recall: 0.9077
  • F1: 0.8813
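
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported test-set F1 can be recomputed from the two values above. The function name `f1_score` here is just an illustrative helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set precision and recall as reported on this card.
precision, recall = 0.8563, 0.9077
f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.8813
```

The recomputed value agrees with the reported F1 to four decimal places.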

Feel free to connect via LinkedIn for further information on this model or on the study in which it was used.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
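
To illustrate what the `linear` scheduler does with these settings, here is a rough sketch of the learning rate over training. It assumes zero warmup steps (the card does not state a warmup value) and a total of 2722 optimizer steps, the final step count in the training log. `linear_lr` is a hypothetical helper that mirrors the usual linear-decay-with-warmup rule, not the library's own code.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate at a given step under linear warmup then linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 2722  # assumed: 2 epochs x 1361 steps per epoch
for step in (0, 1361, 2722):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate starts at 2e-05, is roughly halved by the end of the first epoch, and reaches zero at the final step.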

Training results

Training Loss | Epoch | Step | Validation Loss | Accuracy
0.3423        | 1.0   | 1361 | 0.3167          | 0.8693
0.2194        | 2.0   | 2722 | 0.3574          | 0.8743
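
The step counts also give a rough idea of the training set size, which the card does not state. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept. Under those assumptions, 1361 steps per epoch at batch size 16 bounds the number of training examples. `train_size_bounds` is an illustrative helper, and the result is an estimate rather than a reported figure.

```python
from typing import Tuple


def train_size_bounds(steps_per_epoch: int, batch_size: int) -> Tuple[int, int]:
    """Smallest and largest dataset sizes that yield this many steps per epoch."""
    # The final batch holds between 1 and batch_size examples.
    lower = (steps_per_epoch - 1) * batch_size + 1
    upper = steps_per_epoch * batch_size
    return lower, upper


print(train_size_bounds(1361, 16))  # (21761, 21776)
```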

Framework versions

  • Transformers 4.33.2
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.5
  • Tokenizers 0.13.3

Citation Information

Research Title: Application of BERT in Detecting Online Hate

Published: 2023

Authors:

  • Castro, D.
  • Dizon, L. J.
  • Sarip, A. J.
  • Soriano, M. A.
