---
license: mit
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: toxicity-target-type-identification
results: []
datasets:
- dougtrajano/olid-br
language:
- pt
library_name: transformers
---
# toxicity-target-type-identification
Toxicity Target Type Identification is a model that classifies the target type (individual, group, or other) of a given targeted toxic text.

This BERT model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) trained on the [OLID-BR dataset](https://huggingface.co/datasets/dougtrajano/olid-br).
## Overview
**Input:** Text in Brazilian Portuguese

**Output:** Multiclass classification (individual, group, or other)
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("dougtrajano/toxicity-target-type-identification")
model = AutoModelForSequenceClassification.from_pretrained("dougtrajano/toxicity-target-type-identification")
```
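A minimal inference sketch building on the snippet above; the example sentence is illustrative, and the class names are assumed to be stored in the model's `id2label` config:

```python
import torch

text = "Texto de exemplo em português brasileiro"  # illustrative input
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class and look up its name in the model config
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```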
## Limitations and bias
The following factors may degrade the model's performance.

**Text Language**: The model was trained on Brazilian Portuguese texts, so it may not work well on other varieties of Portuguese (e.g., European Portuguese).

**Text Origin**: The model was trained on social media texts and a small number of texts from other sources, so it may not work well on other types of text.
## Trade-offs
Models can perform poorly under particular circumstances. This section describes situations in which the model may perform less than optimally, so that you can plan accordingly.

**Text Length**: The model was fine-tuned on texts of 1 to 178 words (18 words on average). It may give poor results on texts outside this range.
## Performance
The model was evaluated on the test set of the [OLID-BR](https://dougtrajano.github.io/olid-br/) dataset.
**Accuracy:** 0.7505

**Precision:** 0.7812

**Recall:** 0.7505

**F1-Score:** 0.7603
| Class | Precision | Recall | F1-Score | Support |
| :---: | :-------: | :----: | :------: | :-----: |
| `INDIVIDUAL` | 0.8850 | 0.7964 | 0.8384 | 609 |
| `GROUP` | 0.6766 | 0.6385 | 0.6570 | 213 |
| `OTHER` | 0.4518 | 0.7177 | 0.5545 | 124 |
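The aggregate scores above are consistent with weighted averaging over the three classes (weighted recall equals accuracy). A minimal sketch of computing such metrics with scikit-learn, using placeholder labels in place of the actual OLID-BR test split:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder labels; in practice these come from the OLID-BR test split
y_true = ["INDIVIDUAL", "GROUP", "OTHER", "INDIVIDUAL"]
y_pred = ["INDIVIDUAL", "GROUP", "INDIVIDUAL", "INDIVIDUAL"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"Accuracy: {accuracy:.4f}  Precision: {precision:.4f}  "
      f"Recall: {recall:.4f}  F1: {f1:.4f}")
```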
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 3.952388499692274e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 1993
- optimizer: Adam with betas=(0.9944095815441554,0.8750000522553327) and epsilon=1.8526084265228802e-07
- lr_scheduler_type: linear
- num_epochs: 30
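A sketch of how these values map onto the Transformers `TrainingArguments` API; the output directory is a placeholder, and the original training script is not reproduced here:

```python
from transformers import TrainingArguments

# Illustrative mapping of the reported hyperparameters; output_dir is a placeholder
training_args = TrainingArguments(
    output_dir="toxicity-target-type-identification",
    learning_rate=3.952388499692274e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=1993,
    adam_beta1=0.9944095815441554,
    adam_beta2=0.8750000522553327,
    adam_epsilon=1.8526084265228802e-07,
    lr_scheduler_type="linear",
    num_train_epochs=30,
)
```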
### Framework versions
- Transformers 4.26.1
- Pytorch 1.10.2+cu113
- Datasets 2.9.0
- Tokenizers 0.13.2
## Provide Feedback
If you have any feedback on this model, please [open an issue](https://github.com/DougTrajano/ToChiquinho/issues/new) on GitHub.