Introduction

TuPi-BERT-Large is a fine-tuned BERT model designed specifically for binary classification of hate speech in Portuguese. Derived from BERTimbau Large, TuPi-BERT-Large is a refined solution for detecting hate speech. For more details or specific inquiries, please refer to the BERTimbau repository.
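A binary classifier like this one outputs two logits per text. These can be reduced to a single probability, because a two-class softmax is identical to a sigmoid of the logit difference. The sketch below rests on two illustrative assumptions: that index 1 is the hate-speech class, and that a 0.5 threshold is appropriate. Check `model.config.id2label` for the actual label order before relying on it.

```python
import math


def hate_probability(logits):
    """Return P(class 1) from a two-class logit pair.

    Assumes (hypothetically) that index 1 is the hate-speech class.
    softmax([z0, z1])[1] == sigmoid(z1 - z0).
    """
    z0, z1 = logits
    d = z1 - z0
    # Branch on the sign so math.exp never receives a large positive argument.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def is_hate_speech(logits, threshold=0.5):
    # The 0.5 threshold is an illustrative default, not a tuned value.
    return hate_probability(logits) >= threshold


# Hypothetical logits, not real model output.
print(f"{hate_probability([2.0, -1.0]):.4f}")
print(is_hate_speech([-0.5, 1.5]))
```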

The performance of language models can vary notably when the domain shifts between training and test data. To build a Portuguese language model specialized for hate speech classification, the original BERTimbau model was fine-tuned on the TuPi Hate Speech DataSet, which is sourced from diverse social networks.
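For single-label classification, fine-tuning typically minimizes the cross-entropy between the softmax of the logits and the gold label. BERT's sequence-classification head in Hugging Face Transformers uses this loss when the labels are integer class ids. The plain-Python sketch below computes that objective by hand for a small batch. The logits and labels are hypothetical. The sketch illustrates the loss only and does not reproduce the authors' training setup or hyperparameters.

```python
import math


def log_softmax(logits):
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(batch_logits, labels):
    """Mean negative log-likelihood of the gold labels over a batch."""
    losses = [-log_softmax(row)[y] for row, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Hypothetical batch: two texts with gold labels 0 and 1.
batch_logits = [[2.0, -1.0], [0.5, 0.5]]
labels = [0, 1]
print(f"loss = {cross_entropy(batch_logits, labels):.4f}")
```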

Available models

| Model | Architecture | Layers | Parameters |
|---|---|---|---|
| FpOliveira/tupi-bert-base-portuguese-cased | BERT-Base | 12 | 109M |
| FpOliveira/tupi-bert-large-portuguese-cased | BERT-Large | 24 | 334M |
| FpOliveira/tupi-bert-base-portuguese-cased-multiclass-multilabel | BERT-Base | 12 | 109M |
| FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel | BERT-Large | 24 | 334M |
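The multiclass-multilabel checkpoints can assign several labels to the same text. For that kind of output, a softmax (which forces scores to sum to 1) is usually replaced by an independent sigmoid per label. The sketch below assumes those checkpoints emit one logit per label. The label names and the 0.5 threshold are hypothetical. Read the real names from `model.config.id2label` and confirm the intended post-processing before relying on it.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(logits, id2label, threshold=0.5):
    """Return (label, score) for every label whose sigmoid score meets the threshold."""
    scores = [sigmoid(z) for z in logits]
    return [(id2label[i], s) for i, s in enumerate(scores) if s >= threshold]


# Hypothetical label names and logits, for illustration only.
id2label = {0: "label_a", 1: "label_b", 2: "label_c"}
print(predict_labels([2.0, -1.5, 0.3], id2label))
```

Unlike softmax scores, these per-label scores do not compete with each other, so a text can receive zero, one, or several labels.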

Example usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import numpy as np
from scipy.special import softmax

def classify_hate_speech(model_name, text):
    # Load the fine-tuned model and its tokenizer from the Hugging Face Hub
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model.eval()  # make sure dropout is disabled for inference

    # Tokenize the input; BERT's position embeddings cover at most 512 tokens
    model_input = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")

    # Run a forward pass without tracking gradients
    with torch.no_grad():
        output = model(**model_input)

    # Convert logits to probabilities and rank labels from most to least likely
    scores = softmax(output.logits.numpy(), axis=1)
    ranking = np.argsort(scores[0])[::-1]

    # Print the results, using the label names stored in the model config
    for i, rank in enumerate(ranking):
        label = model.config.id2label[int(rank)]
        score = scores[0, rank]
        print(f"{i + 1}) Label: {label} Score: {score:.4f}")

# Example usage
model_name = "FpOliveira/tupi-bert-large-portuguese-cased"
text = "Bom dia, flor do dia!!"
classify_hate_speech(model_name, text)
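The scoring step of `classify_hate_speech` can be separated from model loading, so it can be tested without downloading a checkpoint. The helper below, `rank_labels` (a hypothetical name, not part of any library), reproduces the softmax-and-sort logic in plain Python. It takes one row of logits and an `id2label` mapping. The logits and label names in the usage lines are made up.

```python
import math


def rank_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # stable softmax numerators
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in order]


# Hypothetical logits and label names, not real model output.
ranked = rank_labels([-1.2, 2.3], {0: "LABEL_0", 1: "LABEL_1"})
for position, (label, score) in enumerate(ranked, start=1):
    print(f"{position}) Label: {label} Score: {score:.4f}")
```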
 