
Spanish Sentiment Analysis Classifier

Overview

This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA). It detects sentiment in Spanish text and was fine-tuned from dccuchile/bert-base-spanish-wwm-uncased with the hyperparameters listed under Model Details. The training data consists of 11,500 Spanish tweets collected from various regions, each labeled as positive or negative.

Team Members

Model Details

  • Base Model: dccuchile/bert-base-spanish-wwm-uncased

  • Hyperparameters:

    • dropout_rate = 0.1
    • num_classes = 2
    • max_length = 128
    • batch_size = 16
    • num_epochs = 5
    • learning_rate = 3e-5
  • Dataset: 11,500 Spanish tweets (Positive and Negative)
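
The hyperparameters above can be collected in one place, as in the minimal sketch below. The `FineTuneConfig` class and the `optimizer_steps` helper are hypothetical names introduced here for illustration and are not part of the released code. The card does not describe the train/test split, so the step count is only an upper bound that assumes all 11,500 tweets were used for training.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters copied from the model card (class name is hypothetical)."""
    base_model: str = "dccuchile/bert-base-spanish-wwm-uncased"
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5


def optimizer_steps(num_examples: int, config: FineTuneConfig) -> int:
    """Total optimizer steps, assuming one step per batch and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / config.batch_size)
    return steps_per_epoch * config.num_epochs


config = FineTuneConfig()
# Assumption: the full dataset is used for training, which gives an upper bound.
print(optimizer_steps(11_500, config))
```

With a held-out evaluation split, the real number of steps would be proportionally smaller.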

Metrics

The model's performance was evaluated using the following metrics:

  • Accuracy = 86.47%
  • F1-Score = 86.47%
  • Precision = 86.46%
  • Recall = 86.51%
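
For readers who want to reproduce metrics of this kind on their own labeled data, the sketch below computes accuracy together with macro-averaged precision, recall and F1 for a two-class problem. The card does not say which averaging scheme was used, so macro averaging is an assumption here, and `macro_metrics` is a hypothetical helper name.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Accuracy plus macro-averaged precision, recall and F1 (averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against empty denominators when a class is never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration only; these are not the model's evaluation data.
print(macro_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```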

Usage

Installation

You can install the required dependencies using pip:

pip install transformers torch

Loading the Model

from transformers import BertForSequenceClassification, BertTokenizer
model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")

Predict Function

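import torch

def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize a single text, truncating to the 128-token length used in training.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)

    logits = outputs.logits
    # Probabilities for the single input: [P(class 0), P(class 1)].
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()

    predicted_class = torch.argmax(logits, dim=1).item()
    # Fall back to class 0 when class 1 wins but its probability is not above the threshold.
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0

    return bool(predicted_class), probabilities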
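
The decision rule inside predict can be examined without loading the model. The sketch below reimplements only the softmax and threshold step in plain Python; `decide` is a hypothetical helper name, and the logits are made-up values for illustration. It mirrors the rule above: class 1 is returned only when it has the larger logit and its probability is strictly above the threshold.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.5):
    """Apply the same argmax-plus-threshold rule as predict to raw logits."""
    probabilities = softmax(logits)
    # max() returns the first index on ties, matching torch.argmax.
    predicted_class = max(range(len(logits)), key=lambda i: logits[i])
    if predicted_class == 1 and probabilities[1] <= threshold:
        predicted_class = 0
    return bool(predicted_class), probabilities


# Made-up logits: class 1 wins with probability of about 0.88.
print(decide([0.0, 2.0]))                  # default threshold keeps class 1
print(decide([0.0, 2.0], threshold=0.9))   # a stricter threshold falls back to class 0
```

One consequence of this rule is that thresholds below 0.5 have no effect in the two-class case: class 1 must already win the argmax, which requires a probability above 0.5.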

Making Predictions

text = "Your Spanish text here"
predicted_label, probabilities = predict(model, tokenizer, text)
print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")

License

Apache License 2.0

Acknowledgments

Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.

Model size: 110M params (Safetensors, tensor type F32)