VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis

---
license: apache-2.0
language:
- es
metrics:
- accuracy
pipeline_tag: text-classification
widget:
- text: Te quiero. Te amo
  output:
  - label: 'Positive'
    score: 1.000
  - label: 'Negative'
    score: 0.000
---

	# Spanish Sentiment Analysis Classifier

	## Overview
	This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA).
The model classifies the sentiment of Spanish text as positive or negative. It was fine-tuned from dccuchile/bert-base-spanish-wwm-uncased with the hyperparameters listed under Model Details.
It was trained on 11,500 Spanish tweets from various regions, labeled positive or negative and drawn from a curated combination of TASS datasets.


	## Team Members
	- [Azul Fuentes](https://github.com/azu26)
	- [Dante Reinaudo](https://github.com/DanteReinaudo)
	- [Lucía Pardo](https://github.com/luciaPardo)
	- [Roberto Iskandarani](https://github.com/Robert-Iskandarani)


	## Model Details
* Base Model: dccuchile/bert-base-spanish-wwm-uncased
* Hyperparameters:
    * dropout_rate = 0.1
    * num_classes = 2
    * max_length = 128
    * batch_size = 16
    * num_epochs = 5
    * learning_rate = 3e-5
* Dataset: 11,500 Spanish tweets (Positive and Negative)
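
For readers who want to reproduce a similar setup, the hyperparameters above can be collected into a small configuration object. This is only a sketch: the `FineTuneConfig` class and its `total_steps` helper are hypothetical names, not part of the original training code, and the step count assumes one optimizer step per batch with all 11,500 tweets used for training, which the model card does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameter values copied from the list above (class name is illustrative)."""
    base_model: str = "dccuchile/bert-base-spanish-wwm-uncased"
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5

    def total_steps(self, num_train_examples: int) -> int:
        """Optimizer steps over all epochs, assuming one step per batch."""
        steps_per_epoch = math.ceil(num_train_examples / self.batch_size)
        return steps_per_epoch * self.num_epochs


config = FineTuneConfig()
# Upper bound only: assumes every one of the 11,500 tweets is a training example.
# 719 batches per epoch x 5 epochs = 3595 steps.
print(config.total_steps(11_500))
```

A figure like this is mainly useful for sizing a learning-rate schedule; with a held-out evaluation split the real number of steps would be lower.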

	## Metrics
	The model's performance was evaluated using the following metrics:

	* Accuracy = _86.47%_
	* F1-Score = _86.47%_
	* Precision = _86.46%_
	* Recall = _86.51%_
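
As a reminder of how these four metrics relate, the sketch below computes them for a binary task from scratch. The model card does not say whether the reported values are for the positive class or averaged over both classes, so this example assumes positive-class metrics; the `binary_metrics` function and the toy labels are hypothetical and are not the evaluation data.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```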

	## Usage
	### Installation
	You can install the required dependencies using pip:

	```bash
	pip install transformers torch
	```

	### Loading the Model
	```python
	from transformers import BertForSequenceClassification, BertTokenizer
	model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
	tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
	```

	### Predict Function
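```python
import torch

def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize one text, truncating to the max_length used in fine-tuning
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    # Inference only, so gradient tracking is disabled
    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    # Softmax over the class dimension gives [P(class 0), P(class 1)]
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()

    predicted_class = torch.argmax(logits, dim=1).item()
    # Fall back to class 0 when class 1 wins but does not exceed the threshold
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0

    # True means class index 1, False means class index 0
    return bool(predicted_class), probabilities
```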
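
The effect of the `threshold` argument in `predict` can be explored without loading the model. The sketch below re-implements the same decision rule in plain Python on raw logits; the `softmax` and `decide` helpers and the example logits are hypothetical and only mirror the logic above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.5):
    """Same rule as predict: class 1 is kept only if its probability exceeds the threshold."""
    probabilities = softmax(logits)
    # Index of the largest logit (first index wins on ties)
    predicted_class = max(range(len(logits)), key=lambda i: logits[i])
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0
    return bool(predicted_class), probabilities


# Illustrative logits: class 1 wins with probability of about 0.73
example_logits = [0.0, 1.0]
print(decide(example_logits, threshold=0.5))  # class 1 is kept
print(decide(example_logits, threshold=0.9))  # class 1 is demoted to class 0
```

With the default threshold of 0.5 the rule only changes the outcome on an exact tie, because the winning class of a two-class softmax always has a probability of at least 0.5. Raising the threshold makes class 1 predictions more conservative.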
	### Making Predictions

```python
text = "Your Spanish text here"
predicted_label, probabilities = predict(model, tokenizer, text)
print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")
```

	## License
	* Apache License 2.0
	* [TASS Dataset license](http://tass.sepln.org/tass_data/download.php)

	## Acknowledgments
	Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.