dougtrajano committed 8c03084 (1 parent: bfde0c2)

Update README.md

Files changed (1): README.md (+62 −31)
metrics:
model-index:
- name: dougtrajano/toxicity-type-detection
  results: []
datasets:
- dougtrajano/olid-br
library_name: transformers
---
# dougtrajano/toxicity-type-detection
Toxicity Type Detection is a model that predicts the type(s) of toxicity in a given text.

Toxicity Labels: `health`, `ideology`, `insult`, `lgbtqphobia`, `other_lifestyle`, `physical_aspects`, `profanity_obscene`, `racism`, `sexism`, `xenophobia`

This BERT model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the [OLID-BR dataset](https://huggingface.co/datasets/dougtrajano/olid-br).

## Overview

**Input:** Text in Brazilian Portuguese

**Output:** Multilabel classification (toxicity types)
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dougtrajano/toxicity-type-detection")
model = AutoModelForSequenceClassification.from_pretrained("dougtrajano/toxicity-type-detection")
```
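Because the output is multilabel, each logit should be passed through a sigmoid independently (not a softmax), and every label whose probability clears a threshold is kept. A minimal sketch of that decoding step, assuming the card's label order and a 0.5 threshold (in practice, read the index-to-label mapping from `model.config.id2label`):

```python
import math

# Label set from the card; the index order here is an assumption — in
# practice read it from model.config.id2label.
LABELS = ["health", "ideology", "insult", "lgbtqphobia", "other_lifestyle",
          "physical_aspects", "profanity_obscene", "racism", "sexism", "xenophobia"]

def decode_logits(logits, threshold=0.5):
    """Multilabel decoding: sigmoid each logit independently and keep
    every label whose probability clears the threshold."""
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [label for label, p in zip(LABELS, probs) if p >= threshold]

# With the model loaded as above, the logits come from
# model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()
print(decode_logits([-2.0, 1.3, 2.5, -1.0, -3.0, -0.5, 0.7, -2.2, -1.8, -2.9]))
# → ['ideology', 'insult', 'profanity_obscene']
```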
## Limitations and bias

The following factors may degrade the model's performance.

**Text Language**: The model was trained on Brazilian Portuguese texts, so it may not perform well on other Portuguese dialects or variants.

**Text Origin**: The model was trained mostly on social media texts, with a few texts from other sources, so it may not perform well on other kinds of text.

## Trade-offs

In some circumstances the model may perform less than optimally; this section describes those situations so you can plan accordingly.

**Text Length**: The model was fine-tuned on texts between 1 and 178 words long (18 words on average). It may give poor results on texts outside this range.
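One simple mitigation is to check the word count before calling the model. A hypothetical helper (the whitespace word count is a rough proxy; the 1–178 range is the one reported above):

```python
def within_training_length(text: str, lo: int = 1, hi: int = 178) -> bool:
    # Simple whitespace word count; 1–178 is the range the card reports
    # for the fine-tuning texts.
    n_words = len(text.split())
    return lo <= n_words <= hi

print(within_training_length("um exemplo curto"))           # → True
print(within_training_length(" ".join(["palavra"] * 200)))  # → False
```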
## Performance

The model was evaluated on the test set of the [OLID-BR](https://dougtrajano.github.io/olid-br/) dataset.

**Accuracy:** 0.4214

**Precision:** 0.8180

**Recall:** 0.7230

**F1-Score:** 0.7645

| Label | Precision | Recall | F1-Score | Support |
| :---: | :-------: | :----: | :------: | :-----: |
| `health` | 0.3182 | 0.1795 | 0.2295 | 39 |
| `ideology` | 0.6820 | 0.6842 | 0.6831 | 304 |
| `insult` | 0.9689 | 0.8068 | 0.8805 | 1351 |
| `lgbtqphobia` | 0.8182 | 0.5870 | 0.6835 | 92 |
| `other_lifestyle` | 0.4242 | 0.4118 | 0.4179 | 34 |
| `physical_aspects` | 0.4324 | 0.5783 | 0.4948 | 83 |
| `profanity_obscene` | 0.7482 | 0.7509 | 0.7496 | 562 |
| `racism` | 0.4737 | 0.3913 | 0.4286 | 23 |
| `sexism` | 0.5132 | 0.3391 | 0.4084 | 115 |
| `xenophobia` | 0.3333 | 0.4375 | 0.3784 | 32 |
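The headline F1-Score matches the support-weighted average of the per-label F1 scores, which suggests the aggregate metrics are weighted averages. A quick arithmetic check:

```python
# Per-label (F1, support) pairs taken from the table above.
per_label = {
    "health": (0.2295, 39),
    "ideology": (0.6831, 304),
    "insult": (0.8805, 1351),
    "lgbtqphobia": (0.6835, 92),
    "other_lifestyle": (0.4179, 34),
    "physical_aspects": (0.4948, 83),
    "profanity_obscene": (0.7496, 562),
    "racism": (0.4286, 23),
    "sexism": (0.4084, 115),
    "xenophobia": (0.3784, 32),
}

total = sum(support for _, support in per_label.values())
weighted_f1 = sum(f1 * support for f1, support in per_label.values()) / total
print(round(weighted_f1, 4))  # → 0.7645, matching the reported F1-Score
```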
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 7.044186985160909e-05
- train_batch_size: 8
- eval_batch_size: 8
- lr_scheduler_type: linear
- num_epochs: 30
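These values map onto Hugging Face `TrainingArguments` roughly as follows. This is only a sketch: `output_dir` is a placeholder, and settings not listed in this card (seed, optimizer details, etc.) are omitted.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="toxicity-type-detection",  # placeholder path
    learning_rate=7.044186985160909e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    lr_scheduler_type="linear",
    num_train_epochs=30,
)
```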
### Framework versions

- Transformers 4.26.0
- Pytorch 1.10.2+cu113
- Datasets 2.9.0
- Tokenizers 0.13.2

## Provide Feedback

If you have any feedback on this model, please [open an issue](https://github.com/DougTrajano/ToChiquinho/issues/new) on GitHub.