---
license: mit
datasets:
- FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary
language:
- pt
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
base_model: neuralmind/bert-base-portuguese-cased
widget:
- text: 'Bom dia, flor do dia!!'
---

## Introduction

TuPi-BERT-Base is a BERT model fine-tuned for binary classification of hate speech in Portuguese. It is derived from [BERTimbau base](https://huggingface.co/neuralmind/bert-base-portuguese-cased) and adapts that general-purpose Portuguese model to the task of detecting hate speech.
For more details about the base model, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).

The performance of language models can vary considerably when the domain of the test data differs from that of the training data. To obtain a Portuguese language model specialized in hate speech classification, the original BERTimbau model was therefore fine-tuned on the [TuPi Hate Speech Dataset](https://huggingface.co/datasets/FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary), which was collected from a variety of social networks.
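
Because performance can shift between domains, it is worth scoring a checkpoint on labeled data from your own domain before relying on it. The card's metadata lists accuracy, precision, recall and F1 as its metrics; the sketch below computes them in plain Python for binary predictions. It assumes label `1` is the hate speech (positive) class, which you should confirm against the model's `config.id2label`, and `binary_metrics` is a hypothetical helper, not part of any TuPi or Transformers API.

```python
# Minimal sketch: accuracy, precision, recall and F1 for binary hate speech labels.
# Assumption: label 1 is the hate speech (positive) class and 0 is non-hate speech;
# check the model's config.id2label for the actual mapping.
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 with respect to the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # hate speech caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # hate speech missed
    correct = sum(1 for t, p in pairs if t == p)

    # Each ratio falls back to 0.0 when its denominator is zero
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results of the TuPi models.
    gold = [1, 0, 1, 1, 0, 0]
    pred = [1, 0, 0, 1, 1, 0]
    print(binary_metrics(gold, pred))
```

In practice, `y_pred` would come from taking the highest-scoring label for each text returned by the model.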

## Available models

| Model | Arch. | #Layers | #Params |
| ----- | ----- | ------- | ------- |
| `FpOliveira/tupi-bert-base-portuguese-cased` | BERT-Base | 12 | 109M |
| `FpOliveira/tupi-bert-large-portuguese-cased` | BERT-Large | 24 | 334M |
| `FpOliveira/tupi-bert-base-portuguese-cased-multiclass-multilabel` | BERT-Base | 12 | 109M |
| `FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel` | BERT-Large | 24 | 334M |
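
The single-label usage example in this card turns logits into probabilities with a softmax, so the scores of all labels sum to 1. For the `multiclass-multilabel` checkpoints, where one text may carry several labels at once, a common convention is to score each label independently with a sigmoid and keep those above a threshold. This is an assumption about how those checkpoints are meant to be read, not something this card documents; check each model's own card and `config.id2label`. The helper names, label names and the 0.5 threshold below are illustrative.

```python
# Sketch: independent per-label scoring for a multi-label classifier.
# Assumption: the multiclass-multilabel TuPi checkpoints emit one logit per label
# that should be read with a sigmoid (not a softmax); verify against their model cards.
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def active_labels(
    logits: Sequence[float], id2label: Dict[int, str], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs whose sigmoid score reaches the threshold, highest first."""
    scored = [(id2label[i], sigmoid(logit)) for i, logit in enumerate(logits)]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical label names and logits, for illustration only.
    labels = {0: "label_a", 1: "label_b", 2: "label_c"}
    print(active_labels([2.0, -1.0, 0.3], labels))
```

Unlike a softmax, the sigmoid scores are not forced to compete, so zero, one or several labels can pass the threshold for the same text.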

## Example usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import numpy as np
from scipy.special import softmax


def classify_hate_speech(model_name, text):
    # Load the fine-tuned model and its tokenizer from the Hugging Face Hub
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Tokenize the input text (truncating to the model's maximum length)
    model_input = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

    # Run inference without tracking gradients and convert logits to probabilities
    with torch.no_grad():
        output = model(**model_input)
    scores = softmax(output.logits.numpy(), axis=1)

    # Label indices sorted from most to least likely for the first (only) text
    ranking = np.argsort(scores[0])[::-1]

    # Print the results
    for i, rank in enumerate(ranking):
        label = model.config.id2label[int(rank)]
        score = scores[0, rank]
        print(f"{i + 1}) Label: {label} Score: {score:.4f}")


# Example usage
model_name = "FpOliveira/tupi-bert-base-portuguese-cased"
text = "Bom dia, flor do dia!!"
classify_hate_speech(model_name, text)
```
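
The `classify_hate_speech` example reads only `scores[0]`, so it ranks the labels of a single text. If a list of strings is passed to the tokenizer with `padding=True`, `output.logits` holds one row per text, and the softmax-and-rank step has to be applied row by row. The model-free sketch below does this in plain Python; `rank_batch` is a hypothetical helper, and its input rows could come from `output.logits.tolist()`. The label names in the demo are placeholders, not the model's real labels.

```python
# Sketch: apply softmax and rank labels for every row of a batch of logits,
# mirroring what classify_hate_speech does for the first row only.
import math
from typing import Dict, List, Sequence, Tuple


def softmax(row: Sequence[float]) -> List[float]:
    """Softmax over one row of logits, shifted by the max for numerical stability."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def rank_batch(
    logits: Sequence[Sequence[float]], id2label: Dict[int, str]
) -> List[List[Tuple[str, float]]]:
    """For each text, return (label, probability) pairs sorted from most to least likely."""
    ranked = []
    for row in logits:
        probs = softmax(row)
        order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        ranked.append([(id2label[i], probs[i]) for i in order])
    return ranked


if __name__ == "__main__":
    # Toy logits for two texts and placeholder label names, for illustration only.
    toy_logits = [[2.0, -1.0], [-0.5, 1.5]]
    for text_ranking in rank_batch(toy_logits, {0: "LABEL_0", 1: "LABEL_1"}):
        print(text_ranking)
```

Batching also means the model and tokenizer need to be loaded only once for many texts, rather than on every call as in the single-text example.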