metadata
base_model: readerbench/RoBERT-base
language:
  - ro
tags:
  - hate speech
  - offensive language
  - romanian
  - classification
  - nlp
  - bert
metrics:
  - accuracy
  - precision
  - recall
  - f1_macro
  - f1_micro
  - f1_weighted
model-index:
  - name: ro-offense
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          type: readerbench/ro-offense
          name: Romanian Offensive Language Dataset
          config: default
          split: test
        metrics:
          - type: accuracy
            value: 0.819
            name: Accuracy
          - type: precision
            value: 0.8138
            name: Precision
          - type: recall
            value: 0.8118
            name: Recall
          - type: f1_weighted
            value: 0.8189
            name: Weighted F1
          - type: f1_micro
            value: 0.819
            name: Micro F1
          - type: f1_macro
            value: 0.8126
            name: Macro F1

RO-Offense

This model is a fine-tuned version of readerbench/RoBERT-base on the RO-Offense dataset (readerbench/ro-offense). It achieves the following results on the evaluation set after the final training epoch (epoch 7):

  • Loss: 0.8411
  • Accuracy: 0.8232
  • Precision: 0.8235
  • Recall: 0.8210
  • F1 Macro: 0.8207
  • F1 Micro: 0.8232
  • F1 Weighted: 0.8210
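
Accuracy and F1 Micro are identical in the list above (0.8232). That is expected for single-label multi-class classification, where every false positive for one class is a false negative for another. The sketch below illustrates how the three F1 averages differ, using a made-up 4-class confusion matrix; the matrix and the helper name `f1_scores` are assumptions for illustration, not the model's actual predictions or evaluation code.

```python
from typing import Dict, List


def f1_scores(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro/micro/weighted F1 from a confusion matrix.

    confusion[i][j] = number of examples with true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class_f1, supports = [], []
    tp_sum = fp_sum = fn_sum = 0
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp   # predicted k, true otherwise
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        supports.append(tp + fn)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    return {
        "accuracy": tp_sum / total,
        # Macro: unweighted mean of per-class F1 (every class counts equally).
        "f1_macro": sum(per_class_f1) / n,
        # Micro: F1 over pooled counts (equals accuracy for single-label tasks).
        "f1_micro": 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum),
        # Weighted: per-class F1 weighted by the number of true examples per class.
        "f1_weighted": sum(f * s for f, s in zip(per_class_f1, supports)) / total,
    }


# Hypothetical confusion matrix for the four labels (illustrative numbers only).
toy_confusion = [
    [50, 3, 2, 0],
    [4, 20, 5, 1],
    [2, 6, 30, 7],
    [0, 1, 8, 40],
]
scores = f1_scores(toy_confusion)
print(scores)
```

Macro F1 is the most sensitive of the three to performance on small classes, which is why it is often reported alongside accuracy for imbalanced offensive-language data.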

Output labels:

  • LABEL_0 = No offensive language
  • LABEL_1 = Profanity (no directed insults)
  • LABEL_2 = Insults (directed offensive language, lower level of offensiveness)
  • LABEL_3 = Abuse (directed hate speech, racial slurs, sexist speech, threats of violence, death wishes, etc.)
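
A minimal sketch of how the generic `LABEL_n` outputs could be mapped to the meanings listed above. The model id `readerbench/ro-offense` is an assumption based on the repository and dataset names, and the helpers `describe` and `classify` are hypothetical names introduced for illustration. `classify` uses the standard `transformers` text-classification pipeline and downloads the weights on first use.

```python
from typing import Dict, List

# Short label names taken from the list of output labels above.
LABEL_NAMES: Dict[str, str] = {
    "LABEL_0": "No offensive language",
    "LABEL_1": "Profanity",
    "LABEL_2": "Insults",
    "LABEL_3": "Abuse",
}


def describe(prediction: Dict) -> Dict:
    """Translate one pipeline-style prediction into a readable record."""
    label = prediction["label"]
    return {
        "label": label,
        "meaning": LABEL_NAMES.get(label, "unknown"),
        "score": round(float(prediction["score"]), 4),
    }


def classify(texts: List[str], model_id: str = "readerbench/ro-offense") -> List[Dict]:
    """Run the classifier on a list of texts (model id is an assumption)."""
    # Imported lazily so the mapping above can be used without transformers/torch.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return [describe(p) for p in classifier(texts)]


# Mapping a hand-written prediction dict; no model download is needed here.
example = describe({"label": "LABEL_2", "score": 0.91234})
print(example)
```

Calling `classify(["..."])` with Romanian input text would return one record per input, each with the raw label, its meaning, and the confidence score.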

Model description

A Romanian BERT model (readerbench/RoBERT-base) fine-tuned for offensive language classification.

Trained on the RO-Offense dataset.

Intended uses & limitations

Offensive language and hate speech detection for Romanian text.

Training and evaluation data

Trained on the train split of the RO-Offense dataset.

Evaluated on the test split of the RO-Offense dataset.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 4e-05
  • train_batch_size: 64
  • eval_batch_size: 128
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 10 (training stopped early at epoch 7; best epoch was 4)
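
To make the scheduler settings concrete, the sketch below works out the learning rate implied by a linear schedule with a warmup ratio of 0.2. It assumes 125 optimizer steps per epoch (the step count logged at epoch 1 in the training results) and that the schedule was planned over all 10 epochs, even though training stopped early. The function `learning_rate` is a hypothetical re-implementation of the usual linear warmup and decay rule, not the code used for training.

```python
import math

PEAK_LR = 4e-5          # learning_rate
WARMUP_RATIO = 0.2      # lr_scheduler_warmup_ratio
NUM_EPOCHS = 10         # num_epochs (planned, before early stopping)
STEPS_PER_EPOCH = 125   # assumption: taken from the logged step count at epoch 1

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS              # 1250 planned steps
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)    # 250 warmup steps


def learning_rate(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Learning rate at the end of warmup, at the best epoch (4), and at the stop (7).
for step in (250, 500, 875):
    print(step, learning_rate(step))
```

Under these assumptions, warmup covers the first two epochs, the learning rate is about 3e-05 at the best checkpoint (step 500), and about 1.5e-05 when training stopped (step 875).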

Training results

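| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 Macro | F1 Micro | F1 Weighted |
|---------------|-------|------|-----------------|----------|-----------|--------|----------|----------|-------------|
| No log        | 1.0   | 125  | 0.7789          | 0.7037   | 0.6825    | 0.7000 | 0.6873   | 0.7037   | 0.7132      |
| No log        | 2.0   | 250  | 0.5170          | 0.8006   | 0.8066    | 0.8016 | 0.7986   | 0.8006   | 0.7971      |
| No log        | 3.0   | 375  | 0.5139          | 0.8096   | 0.8168    | 0.8237 | 0.8120   | 0.8096   | 0.8047      |
| 0.6074        | 4.0   | 500  | 0.6180          | 0.8247   | 0.8251    | 0.8187 | 0.8210   | 0.8247   | 0.8233      |
| 0.6074        | 5.0   | 625  | 0.7311          | 0.8096   | 0.8071    | 0.8085 | 0.8064   | 0.8096   | 0.8071      |
| 0.6074        | 6.0   | 750  | 0.8365          | 0.8101   | 0.8117    | 0.8191 | 0.8105   | 0.8101   | 0.8051      |
| 0.6074        | 7.0   | 875  | 0.8411          | 0.8232   | 0.8235    | 0.8210 | 0.8207   | 0.8232   | 0.8210      |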
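
The results above are consistent with early stopping on validation accuracy with a patience of three epochs: accuracy peaks at epoch 4 and does not improve in epochs 5, 6, or 7. The monitored metric and the patience value are not stated in this card, so both are assumptions in the sketch below, which replays the logged accuracies through a simple early-stopping rule. `simulate_early_stopping` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple

# (epoch, validation accuracy) pairs copied from the training results.
ACCURACY_BY_EPOCH: List[Tuple[int, float]] = [
    (1, 0.7037),
    (2, 0.8006),
    (3, 0.8096),
    (4, 0.8247),
    (5, 0.8096),
    (6, 0.8101),
    (7, 0.8232),
]


def simulate_early_stopping(history: List[Tuple[int, float]], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch) for a higher-is-better metric."""
    best_epoch, best_value, stale = history[0][0], float("-inf"), 0
    for epoch, value in history:
        if value > best_value:
            best_epoch, best_value, stale = epoch, value, 0
        else:
            stale += 1                  # one more epoch without improvement
            if stale >= patience:
                return best_epoch, epoch
    return best_epoch, history[-1][0]   # ran out of epochs before patience was hit


# Assumed patience of 3 epochs, monitoring validation accuracy.
best_epoch, stop_epoch = simulate_early_stopping(ACCURACY_BY_EPOCH, patience=3)
print(best_epoch, stop_epoch)  # 4 7
```

Note that validation loss is lowest at epoch 3 (0.5139), so monitoring loss instead of accuracy would have selected a different checkpoint.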

Framework versions

  • Transformers 4.31.0
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.3
  • Tokenizers 0.13.3