FpOliveira committed on
Commit
4cb78ba
1 Parent(s): 82f9ab2

Create README.md

Files changed (1)
  1. README.md +80 -0
README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ license: mit
+ datasets:
+ - Silly-Machine/TuPyE-Dataset
+ language:
+ - pt
+
+ pipeline_tag: text-classification
+ base_model: neuralmind/bert-base-portuguese-cased
+ widget:
+ - text: 'Bom dia, flor do dia!!'
+
+ model-index:
+ - name: Yi-34B
+   results:
+   - task:
+       type: text-classification
+     dataset:
+       name: Silly-Machine/TuPyE-Dataset
+       type: Silly-Machine/TuPyE-Dataset
+     metrics:
+     - name: f1
+       type: f1
+       value: 64.59
+     source:
+       name: Open LLM Leaderboard
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+ ---
+
+ ## Introduction
+
+
+ Tupi-BERT-Base is a fine-tuned BERT model designed specifically for binary classification of hate speech in Portuguese. Derived from the [BERTimbau base](https://huggingface.co/neuralmind/bert-base-portuguese-cased), TuPi-Base is a refined solution for addressing hate speech concerns.
+ For more details or specific inquiries, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).
+
+ Language model performance can vary notably when the training and test data come from different domains. To build a Portuguese language model specialized for hate speech classification, the original BERTimbau model was fine-tuned on the [TuPi Hate Speech DataSet](https://huggingface.co/datasets/FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary), which is sourced from diverse social networks.
+
+ ## Available models
+
+ | Model                                             | Arch.      | #Layers | #Params |
+ | ------------------------------------------------- | ---------- | ------- | ------- |
+ | `Silly-Machine/TuPy-Bert-Base-Binary-Classifier`  | BERT-Base  | 12      | 109M    |
+ | `Silly-Machine/TuPy-Bert-Large-Binary-Classifier` | BERT-Large | 24      | 334M    |
+ | `Silly-Machine/TuPy-Bert-Base-Multilabel`         | BERT-Base  | 12      | 109M    |
+ | `Silly-Machine/TuPy-Bert-Large-Multilabel`        | BERT-Large | 24      | 334M    |
+
+ ## Example usage
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
+ import torch
+ import numpy as np
+ from scipy.special import softmax
+
+ def classify_hate_speech(model_name, text):
+     model = AutoModelForSequenceClassification.from_pretrained(model_name)
+     tokenizer = AutoTokenizer.from_pretrained(model_name)
+     config = AutoConfig.from_pretrained(model_name)
+
+     # Tokenize input text and prepare model input
+     model_input = tokenizer(text, padding=True, return_tensors="pt")
+
+     # Get model output scores (no gradients are needed for inference)
+     with torch.no_grad():
+         output = model(**model_input)
+         scores = softmax(output.logits.numpy(), axis=1)
+     ranking = np.argsort(scores[0])[::-1]  # label ids, highest score first
+
+     # Print the results
+     for i, rank in enumerate(ranking):
+         label = config.id2label[int(rank)]
+         score = scores[0, rank]
+         print(f"{i + 1}) Label: {label} Score: {score:.4f}")
+
+ # Example usage
+ model_name = "Silly-Machine/TuPy-Bert-Base-Multilabel"
+ text = "Bom dia, flor do dia!!"
+ classify_hate_speech(model_name, text)
+
+ ```
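
The ranking step in the example above applies a softmax to the model logits and sorts the labels by score. The following is a minimal NumPy-only sketch of that step, runnable without downloading a model. The helper name `rank_labels`, the logits, and the label names (`not_hate`, `hate`) are hypothetical values chosen for illustration; they are not taken from the model's configuration.

```python
import numpy as np

def rank_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # argsort is ascending, so reverse it to get the highest score first.
    order = np.argsort(probs)[::-1]
    return [(id2label[int(i)], float(probs[i])) for i in order]

# Hypothetical logits and label mapping, for illustration only.
example_id2label = {0: "not_hate", 1: "hate"}
example_logits = [2.0, -1.0]

ranked = rank_labels(example_logits, example_id2label)
for position, (label, score) in enumerate(ranked, start=1):
    print(f"{position}) Label: {label} Score: {score:.4f}")
```

With a real checkpoint, the mapping would come from `config.id2label` and the logits from `output.logits[0]`, as in the example above.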