FpOliveira
/

tupi-bert-large-portuguese-cased

Text Classification

Inference Endpoints

FpOliveira committed on Dec 1, 2023

Commit

1e967ba

·

1 Parent(s): 929a65a

Create README.md

Files changed (1)

README.md +65 -0

README.md ADDED

	@@ -0,0 +1,65 @@

+---
+license: mit
+datasets:
+- FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary
+language:
+- pt
+metrics:
+- accuracy
+- precision
+- recall
+- f1
+pipeline_tag: text-classification
+---
+## Introduction
+Tupi-BERT-Large is a fine-tuned BERT model designed specifically for binary classification of hate speech in Portuguese. Derived from [BERTimbau large](https://huggingface.co/neuralmind/bert-large-portuguese-cased), TuPi-Large is a refined solution for addressing hate speech concerns.
+For more details or specific inquiries, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).
+The performance of language models can vary notably when the domain shifts between training and test data. To create a Portuguese language model specialized for hate speech classification, the original BERTimbau model was fine-tuned on the [TuPi Hate Speech DataSet](https://huggingface.co/datasets/FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary), sourced from diverse social networks.
+## Available models
+| Model                                    | Arch.      | #Layers | #Params |
+| ---------------------------------------- | ---------- | ------- | ------- |
+| `FpOliveira/tupi-bert-base-portuguese-cased`  | BERT-Base  | 12      | 109M    |
+| `FpOliveira/tupi-bert-large-portuguese-cased` | BERT-Large | 24      | 334M    |
+| `FpOliveira/tupi-bert-base-portuguese-cased-multiclass-multilabel` | BERT-Base | 12      | 109M    |
+| `FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel` | BERT-Large | 24      | 334M    |
+## Example usage
+```python
+import numpy as np
+import torch
+from scipy.special import softmax
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+
+def classify_hate_speech(model_name, text):
+    """Print every label for `text`, ranked by predicted probability."""
+    model = AutoModelForSequenceClassification.from_pretrained(model_name)
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+    model.eval()  # make sure dropout is disabled for inference
+
+    # Tokenize the input text; truncate inputs longer than the model's maximum length
+    model_input = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
+
+    # Run the model without tracking gradients
+    with torch.no_grad():
+        output = model(**model_input)
+
+    # Convert logits to probabilities and sort label ids, highest score first
+    scores = softmax(output.logits.numpy(), axis=1)
+    ranking = np.argsort(scores[0])[::-1]
+
+    # Print the results
+    for i, rank in enumerate(ranking):
+        label = model.config.id2label[int(rank)]
+        score = scores[0, rank]
+        print(f"{i + 1}) Label: {label} Score: {score:.4f}")
+
+
+# Example usage
+model_name = "FpOliveira/tupi-bert-base-portuguese-cased"
+text = "Quem não deve não teme!!"
+classify_hate_speech(model_name, text)
+```
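
The scoring step in `classify_hate_speech` (softmax over the logits, then a descending sort) can be tried in isolation, without downloading a model. The following is a minimal sketch, not part of the original README: the helper `rank_labels`, the logits, and the two-entry `id2label` mapping are all made up for illustration, and the real label names come from the model's config and may differ.

```python
import numpy as np


def rank_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax: subtract the max before exponentiating.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # argsort is ascending, so reverse it for a descending ranking.
    order = np.argsort(probs)[::-1]
    return [(id2label[int(i)], float(probs[i])) for i in order]


# Hypothetical values for illustration only; not real model output.
example_id2label = {0: "LABEL_0", 1: "LABEL_1"}
example_logits = [0.0, 2.0]

ranked = rank_labels(example_logits, example_id2label)
for position, (label, prob) in enumerate(ranked, start=1):
    print(f"{position}) Label: {label} Score: {prob:.4f}")
```

Because the softmax is monotonic, the top-ranked label is always the one with the largest logit; the probabilities only matter if a score or a decision threshold is needed.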