FpOliveira committed on
Commit
1e967ba
1 Parent(s): 929a65a

Create README.md

Files changed (1)
  1. README.md +65 -0
README.md ADDED
@@ -0,0 +1,65 @@
+ ---
+ license: mit
+ datasets:
+ - FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary
+ language:
+ - pt
+ metrics:
+ - accuracy
+ - precision
+ - recall
+ - f1
+ pipeline_tag: text-classification
+ ---
+
+ ## Introduction
+
+ Tupi-BERT-Large is a BERT model fine-tuned for binary classification of hate speech in Portuguese. It is derived from [BERTimbau large](https://huggingface.co/neuralmind/bert-large-portuguese-cased) and offers a refined solution for detecting hate speech.
+ For more details or specific inquiries about the base model, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).
+
+ The performance of language models can vary notably when the domain shifts between training and test data. To build a Portuguese language model specialized for hate speech classification, the original BERTimbau model was fine-tuned on the [TuPi Hate Speech DataSet](https://huggingface.co/datasets/FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary), which was sourced from diverse social networks.
+
+ ## Available models
+
+ | Model | Arch. | #Layers | #Params |
+ | ---------------------------------------- | ---------- | ------- | ------- |
+ | `FpOliveira/tupi-bert-base-portuguese-cased` | BERT-Base | 12 | 109M |
+ | `FpOliveira/tupi-bert-large-portuguese-cased` | BERT-Large | 24 | 334M |
+ | `FpOliveira/tupi-bert-base-portuguese-cased-multiclass-multilabel` | BERT-Base | 12 | 109M |
+ | `FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel` | BERT-Large | 24 | 334M |
+
+ ## Example usage
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
+ import torch
+ import numpy as np
+ from scipy.special import softmax
+
+ def classify_hate_speech(model_name, text):
+     # Load the fine-tuned model, its tokenizer, and its config (for label names)
+     model = AutoModelForSequenceClassification.from_pretrained(model_name)
+     tokenizer = AutoTokenizer.from_pretrained(model_name)
+     config = AutoConfig.from_pretrained(model_name)
+
+     # Tokenize input text and prepare model input
+     model_input = tokenizer(text, padding=True, return_tensors="pt")
+
+     # Get model output scores without tracking gradients
+     with torch.no_grad():
+         output = model(**model_input)
+         scores = softmax(output.logits.numpy(), axis=1)
+         # Sort class indices from highest to lowest probability
+         ranking = np.argsort(scores[0])[::-1]
+
+     # Print the results
+     for i, rank in enumerate(ranking):
+         label = config.id2label[int(rank)]
+         score = scores[0, rank]
+         print(f"{i + 1}) Label: {label} Score: {score:.4f}")
+
+ # Example usage
+ model_name = "FpOliveira/tupi-bert-base-portuguese-cased"
+ text = "Quem não deve não teme!!"
+ classify_hate_speech(model_name, text)
+
+ ```
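
The scoring step in the example (softmax over the logits, then sorting by probability) can be tried in isolation, without downloading a model. The following is a minimal sketch using only NumPy; the function name `rank_labels`, the logit values, and the label names `LABEL_0`/`LABEL_1` are hypothetical and chosen for illustration. They are not taken from the model's actual configuration.

```python
import numpy as np

def rank_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # argsort is ascending, so reverse it to get the highest score first.
    order = np.argsort(probs)[::-1]
    return [(id2label[int(i)], float(probs[i])) for i in order]

# Hypothetical logits and label names, for illustration only.
example_logits = [0.3, 1.7]
example_id2label = {0: "LABEL_0", 1: "LABEL_1"}

ranked = rank_labels(example_logits, example_id2label)
for position, (label, prob) in enumerate(ranked, start=1):
    print(f"{position}) Label: {label} Score: {prob:.4f}")
```

With a real model, the logits would come from `output.logits` and the label mapping from `config.id2label`, as in the example above.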