---
base_model: readerbench/RoBERT-base
language:
  - ro
tags:
- hate speech
- offensive language
- romanian
- classification
- nlp
- bert
metrics:
- accuracy
- precision
- recall
- f1_macro
- f1_micro
- f1_weighted
model-index:
- name: ro-offense
  results: 
  - task:
      type: text-classification             # Required. Example: automatic-speech-recognition
      name: Text Classification             # Optional. Example: Speech Recognition
    dataset:
      type: readerbench/ro-offense          # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
      name: Romanian Offensive Language Dataset          # Required. A pretty name for the dataset. Example: Common Voice (French)
      config: default      # Optional. The name of the dataset configuration used in `load_dataset()`. Example: fr in `load_dataset("common_voice", "fr")`. See the `datasets` docs for more info: https://huggingface.co/docs/datasets/package_reference/loading_methods#datasets.load_dataset.name
      split: test        # Optional. Example: test
    metrics:
      - type: accuracy         # Required. Example: wer. Use metric id from https://hf.co/metrics
        value: 0.8190       # Required. Example: 20.90
        name: Accuracy         # Optional. Example: Test WER
      - type: precision         # Required. Example: wer. Use metric id from https://hf.co/metrics
        value: 0.8138       # Required. Example: 20.90
        name: Precision         # Optional. Example: Test WER
      - type: recall         # Required. Example: wer. Use metric id from https://hf.co/metrics
        value: 0.8118       # Required. Example: 20.90
        name: Recall         # Optional. Example: Test WER
      - type: f1_weighted         # Required. Example: wer. Use metric id from https://hf.co/metrics
        value: 0.8189       # Required. Example: 20.90
        name: Weighted F1         # Optional. Example: Test WER
      - type: f1_micro        # Required. Example: wer. Use metric id from https://hf.co/metrics
        value: 0.8190       # Required. Example: 20.90
        name: Micro F1         # Optional. Example: Test WER
      - type: f1_macro         # Required. Example: wer. Use metric id from https://hf.co/metrics
        value: 0.8126       # Required. Example: 20.90
        name: Macro F1         # Optional. Example: Test WER
---


# RO-Offense

This model is a fine-tuned version of [readerbench/RoBERT-base](https://huggingface.co/readerbench/RoBERT-base) on the [RO-Offense](https://huggingface.co/datasets/readerbench/ro-offense) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.8411
- Accuracy: 0.8232
- Precision: 0.8235
- Recall: 0.8210
- F1 Macro: 0.8207
- F1 Micro: 0.8232
- F1 Weighted: 0.8210

Output labels:
- LABEL_0 = No offensive language 
- LABEL_1 = Profanity (no directed insults)
- LABEL_2 = Insults (directed offensive language, lower level of offensiveness)
- LABEL_3 = Abuse (directed hate speech, racial slurs, sexist speech, threats of violence, death wishes, etc.)
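
A minimal inference sketch, assuming the model is published under the repository id `readerbench/ro-offense` (the card only confirms that id for the dataset, so treat the default `model_id` as a placeholder). The helper names `readable` and `classify` are hypothetical; the label meanings are copied from the list above.

```python
from typing import Dict, List

# Label meanings copied from the "Output labels" list in this card.
LABEL_NAMES: Dict[str, str] = {
    "LABEL_0": "No offensive language",
    "LABEL_1": "Profanity",
    "LABEL_2": "Insult",
    "LABEL_3": "Abuse",
}


def readable(prediction: Dict) -> Dict:
    """Attach a human-readable class name to one pipeline prediction."""
    name = LABEL_NAMES.get(prediction["label"], "unknown")
    return {**prediction, "class_name": name}


def classify(texts: List[str], model_id: str = "readerbench/ro-offense") -> List[Dict]:
    """Run the text-classification pipeline and decode the labels.

    ASSUMPTION: `model_id` is a guess at the hub repository id of this model.
    """
    # Imported lazily so the label helpers work without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return [readable(p) for p in classifier(texts)]
```

Calling `classify(["some Romanian text"])` downloads the checkpoint on first use and returns one dictionary per input with `label`, `score`, and `class_name` keys.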

## Model description

A Romanian BERT model fine-tuned for offensive language classification.

Trained on the [RO-Offense](https://huggingface.co/datasets/readerbench/ro-offense) dataset.


## Intended uses & limitations

Detection of offensive language and hate speech in Romanian text.

## Training and evaluation data

Trained on the train split of the [RO-Offense](https://huggingface.co/datasets/readerbench/ro-offense) dataset.

Evaluated on the test split of the [RO-Offense](https://huggingface.co/datasets/readerbench/ro-offense) dataset.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 64
- eval_batch_size: 128
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 10 (early stopping at epoch 7; best epoch 4)
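
To make the scheduler settings concrete, here is a small sketch of a linear schedule with warmup using the values listed above. It assumes the schedule was planned over all 10 epochs at the 125 optimizer steps per epoch reported in the training results, and that warmup steps are the ceiling of `total_steps * warmup_ratio`. The function `linear_lr` is illustrative and is not taken from the training script.

```python
import math

BASE_LR = 4e-5          # learning_rate
WARMUP_RATIO = 0.2      # lr_scheduler_warmup_ratio
STEPS_PER_EPOCH = 125   # from the training results (step 125 at epoch 1.0)
PLANNED_EPOCHS = 10     # num_epochs, before early stopping

# ASSUMPTION: the schedule spans the planned run, not the early-stopped one.
TOTAL_STEPS = STEPS_PER_EPOCH * PLANNED_EPOCHS
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Ramp up linearly from 0 to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Decay linearly from BASE_LR to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for epoch in (1, 2, 4, 7):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch}: step {step}, lr {linear_lr(step):.2e}")
```

Under these assumptions warmup covers the first two epochs, so the best checkpoint (epoch 4) was reached while the learning rate was already decaying.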

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 Macro | F1 Micro | F1 Weighted |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:--------:|:--------:|:-----------:|
| No log        | 1.0   | 125  | 0.7789          | 0.7037   | 0.6825    | 0.7000 | 0.6873   | 0.7037   | 0.7132      |
| No log        | 2.0   | 250  | 0.5170          | 0.8006   | 0.8066    | 0.8016 | 0.7986   | 0.8006   | 0.7971      |
| No log        | 3.0   | 375  | 0.5139          | 0.8096   | 0.8168    | 0.8237 | 0.8120   | 0.8096   | 0.8047      |
| 0.6074        | **4.0**   | 500  | 0.6180          | 0.8247   | 0.8251    | 0.8187 | 0.8210   | 0.8247   | **0.8233**      |
| 0.6074        | 5.0   | 625  | 0.7311          | 0.8096   | 0.8071    | 0.8085 | 0.8064   | 0.8096   | 0.8071      |
| 0.6074        | 6.0   | 750  | 0.8365          | 0.8101   | 0.8117    | 0.8191 | 0.8105   | 0.8101   | 0.8051      |
| 0.6074        | 7.0   | 875  | 0.8411          | 0.8232   | 0.8235    | 0.8210 | 0.8207   | 0.8232   | 0.8210      |
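
In every row of the table the Accuracy and F1 Micro columns are identical. That is expected for single-label multiclass classification, where each error counts once as a false positive and once as a false negative. The sketch below shows the relationship in plain Python; the toy labels are made up for illustration and are not drawn from RO-Offense, and `f1_scores` is a hypothetical helper.

```python
from typing import List, Tuple


def f1_scores(y_true: List[int], y_pred: List[int], num_classes: int = 4) -> Tuple[float, float]:
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    per_class = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)

    macro = sum(per_class) / num_classes  # unweighted mean over classes
    micro = 2 * sum(tp) / (2 * sum(tp) + sum(fp) + sum(fn))  # pooled counts
    return macro, micro


# Toy, made-up labels (hypothetical data).
y_true = [0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 1, 1, 2, 2, 3]

macro_f1, micro_f1 = f1_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.4f} micro_f1={micro_f1:.4f} macro_f1={macro_f1:.4f}")
```

Macro F1 weights all four classes equally regardless of their frequency, which is why it can differ from accuracy when classes are imbalanced.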


### Framework versions

- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.3
- Tokenizers 0.13.3