dougtrajano committed 07d6a9a (1 parent: f9b4e1c)

Update README.md

Files changed (1)
  1. README.md +55 -25
README.md CHANGED
@@ -10,38 +10,73 @@ metrics:
  model-index:
  - name: toxicity-target-type-identification
  results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # toxicity-target-type-identification
 
- This model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.7001
- - Accuracy: 0.7505
- - F1: 0.7603
- - Precision: 0.7813
- - Recall: 0.7505
 
- ## Model description
 
- More information needed
 
- ## Intended uses & limitations
 
- More information needed
 
- ## Training and evaluation data
 
- More information needed
 
  ## Training procedure
 
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
  - learning_rate: 3.952388499692274e-05
  - train_batch_size: 8
  - eval_batch_size: 8
@@ -50,18 +85,13 @@ The following hyperparameters were used during training:
  - lr_scheduler_type: linear
  - num_epochs: 30
 
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
- |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
- | No log | 1.0 | 355 | 0.7001 | 0.7505 | 0.7603 | 0.7813 | 0.7505 |
- | 0.7919 | 2.0 | 710 | 1.0953 | 0.7505 | 0.7452 | 0.7590 | 0.7505 |
- | 0.5218 | 3.0 | 1065 | 1.4217 | 0.7484 | 0.7551 | 0.7688 | 0.7484 |
-
-
  ### Framework versions
 
  - Transformers 4.26.1
  - Pytorch 1.10.2+cu113
  - Datasets 2.9.0
  - Tokenizers 0.13.2
@@ -10,38 +10,73 @@ metrics:
  model-index:
  - name: toxicity-target-type-identification
  results: []
+ datasets:
+ - dougtrajano/olid-br
+ language:
+ - pt
+ library_name: transformers
  ---
 
  # toxicity-target-type-identification
 
+ Toxicity Target Type Identification is a model that classifies the type (individual, group, or other) of a given targeted text.
+
+ This BERT model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the [OLID-BR dataset](https://huggingface.co/datasets/dougtrajano/olid-br).
+
+ ## Overview
+
+ **Input:** Text in Brazilian Portuguese
+
+ **Output:** Multiclass classification (individual, group, or other)
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ tokenizer = AutoTokenizer.from_pretrained("dougtrajano/toxicity-target-type-identification")
+
+ model = AutoModelForSequenceClassification.from_pretrained("dougtrajano/toxicity-target-type-identification")
+ ```
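The usage snippet in the card stops after loading the tokenizer and model. As a minimal sketch of the remaining step, the helper below turns one row of logits into a label and a probability. The index order in `ID2LABEL` is a hypothetical placeholder (only the class names come from the card's performance table); in practice the mapping should be read from `model.config.id2label`.

```python
import math

# Hypothetical index order, for illustration only; the authoritative
# mapping is model.config.id2label on the loaded checkpoint.
ID2LABEL = {0: "INDIVIDUAL", 1: "GROUP", 2: "OTHER"}


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, id2label=ID2LABEL):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return id2label[best], probs[best]


# Made-up logits standing in for one model output row.
label, prob = predict_label([0.2, 2.1, -0.5])
print(label, round(prob, 3))
```

With the objects loaded in the card's snippet, the logits for a single text would typically come from `model(**tokenizer(text, return_tensors="pt", truncation=True)).logits[0].tolist()`.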
+
+ ## Limitations and bias
+
+ The following factors may degrade the model’s performance.
+
+ **Text Language**: The model was trained on Brazilian Portuguese texts, so it may not work well with Portuguese dialects.
 
+ **Text Origin**: The model was trained on texts from social media and a few texts from other sources, so it may not work well on other types of texts.
 
+ ## Trade-offs
 
+ Sometimes models exhibit performance issues under particular circumstances. In this section, we'll discuss situations in which you might discover that the model performs less than optimally, and should plan accordingly.
 
+ **Text Length**: The model was fine-tuned on texts with a word count between 1 and 178 words (average of 18 words). It may give poor results on texts with a word count outside this range.
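One way to act on the text-length note is to flag inputs whose word count falls outside the 1 to 178 range before sending them to the model. The sketch below assumes that a "word" is a whitespace-separated token, which the card does not specify; the function names are illustrative.

```python
# Word-count range reported in the card for the fine-tuning texts.
MIN_WORDS, MAX_WORDS = 1, 178


def word_count(text):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def in_training_length_range(text):
    """True if the word count lies inside the range seen during fine-tuning."""
    return MIN_WORDS <= word_count(text) <= MAX_WORDS


# An empty string, a short sentence, and an overly long input.
for sample in ["", "Bom dia a todos", "palavra " * 200]:
    print(word_count(sample), in_training_length_range(sample))
```

Texts that fail the check can still be classified, but their predictions deserve extra scrutiny.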
 
+ ## Performance
 
+ The model was evaluated on the test set of the [OLID-BR](https://dougtrajano.github.io/olid-br/) dataset.
+
+ **Accuracy:** 0.7505
+
+ **Precision:** 0.7812
+
+ **Recall:** 0.7505
+
+ **F1-Score:** 0.7603
+
+ | Class | Precision | Recall | F1-Score | Support |
+ | :---: | :-------: | :----: | :------: | :-----: |
+ | `INDIVIDUAL` | 0.8850 | 0.7964 | 0.8384 | 609 |
+ | `GROUP` | 0.6766 | 0.6385 | 0.6570 | 213 |
+ | `OTHER` | 0.4518 | 0.7177 | 0.5545 | 124 |
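The card does not say how the headline precision, recall, and F1 were averaged. The numbers are consistent with support-weighted averages of the per-class rows, and the sketch below checks that under this assumption, using only values copied from the table. Because the per-class figures are rounded to four decimals, agreement is expected only to about that precision.

```python
# Per-class rows copied from the table: (precision, recall, f1, support).
PER_CLASS = {
    "INDIVIDUAL": (0.8850, 0.7964, 0.8384, 609),
    "GROUP": (0.6766, 0.6385, 0.6570, 213),
    "OTHER": (0.4518, 0.7177, 0.5545, 124),
}


def weighted_average(rows, column):
    """Average one metric column, weighting each class by its support."""
    total_support = sum(row[3] for row in rows.values())
    return sum(row[column] * row[3] for row in rows.values()) / total_support


weighted = {
    name: weighted_average(PER_CLASS, index)
    for index, name in enumerate(["precision", "recall", "f1"])
}

for name, value in weighted.items():
    print(f"{name}: {value:.4f}")
```

Support-weighted recall is the fraction of all examples classified correctly, so it equals accuracy by definition, which matches the identical 0.7505 reported for both.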
 
  ## Training procedure
 
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
+
  - learning_rate: 3.952388499692274e-05
  - train_batch_size: 8
  - eval_batch_size: 8
@@ -50,18 +85,13 @@ The following hyperparameters were used during training:
  - lr_scheduler_type: linear
  - num_epochs: 30
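To make `lr_scheduler_type: linear` concrete, here is a rough sketch of a linear decay from the listed learning rate down to zero. Two things are assumptions rather than facts from this diff: no warmup steps (none are listed in the visible lines), and 355 optimizer steps per epoch (read off the removed training-results table). That table logs only three epochs, so the actual run may have ended well before the schedule reached zero.

```python
BASE_LR = 3.952388499692274e-05  # learning_rate from the hyperparameter list
NUM_EPOCHS = 30                  # num_epochs from the hyperparameter list
STEPS_PER_EPOCH = 355            # assumption: from the removed training-results table
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, after the three logged epochs, and at the end.
for step in (0, 3 * STEPS_PER_EPOCH, TOTAL_STEPS):
    print(step, linear_lr(step))
```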
 
  ### Framework versions
 
  - Transformers 4.26.1
  - Pytorch 1.10.2+cu113
  - Datasets 2.9.0
  - Tokenizers 0.13.2
+
+ ## Provide Feedback
+
+ If you have any feedback on this model, please [open an issue](https://github.com/DougTrajano/ToChiquinho/issues/new) on GitHub.