
Spanish Sentiment Analysis Classifier


This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA). It detects sentiment in Spanish text and was obtained by fine-tuning dccuchile/bert-base-spanish-wwm-uncased with the hyperparameters listed below. The training data consists of 11,500 Spanish tweets from various regions, each labeled positive or negative, drawn from a curated combination of TASS datasets.

Team Members

Model Details

  • Base Model: dccuchile/bert-base-spanish-wwm-uncased

  • Hyperparameters:

    • dropout_rate = 0.1
    • num_classes = 2
    • max_length = 128
    • batch_size = 16
    • num_epochs = 5
    • learning_rate = 3e-5
  • Dataset: 11,500 Spanish tweets (Positive and Negative)
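
The hyperparameters above can be collected into a single configuration object, which also makes it easy to estimate the length of a training run. This is only a sketch: the names FineTuneConfig and optimizer_steps are hypothetical, and the step count assumes that all 11,500 tweets are used for training with one optimizer step per batch, which the card does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Values copied from the hyperparameter list in this card.
    base_model: str = "dccuchile/bert-base-spanish-wwm-uncased"
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5


def optimizer_steps(num_examples: int, config: FineTuneConfig) -> int:
    """Total optimizer steps, assuming one step per batch (no gradient accumulation)."""
    steps_per_epoch = math.ceil(num_examples / config.batch_size)
    return steps_per_epoch * config.num_epochs


config = FineTuneConfig()
# Assumption: the full 11,500-tweet dataset is the training set (split not stated in the card).
total_steps = optimizer_steps(11_500, config)
print(total_steps)
```

If part of the data was held out for evaluation, the real step count would be proportionally lower.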


The model's performance was evaluated using the following metrics:

  • Accuracy = 86.47%
  • F1-Score = 86.47%
  • Precision = 86.46%
  • Recall = 86.51%
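
The card does not state how these metrics were averaged across the two classes. As a minimal sketch, assuming the standard binary definitions computed for the positive class, they can be derived from true and predicted labels as shown here. The function name binary_metrics is hypothetical, and the labels are made-up illustration data, not the model's evaluation set.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```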



You can install the required dependencies using pip:

pip install transformers torch

Loading the Model

from transformers import BertForSequenceClassification, BertTokenizer
model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")

Predict Function

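import torch

def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize a single text, truncating to the 128-token training length.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    # Inference only: skip gradient tracking.
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # Class probabilities as a plain list: [p(class 0), p(class 1)].
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()
    predicted_class = torch.argmax(logits, dim=1).item()
    # Fall back to class 0 when class 1 wins but is not confident enough.
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0
    return bool(predicted_class), probabilities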
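
To see what the threshold argument does, the decision rule can be isolated from the model. The sketch below reimplements the same rule in plain Python on hypothetical logits. The helper names softmax and decide are not part of the model's code. With two classes, the winning class always has a probability of at least 0.5, so the default threshold of 0.5 effectively leaves the argmax unchanged; only a higher threshold makes class 1 predictions stricter.

```python
import math


def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.5):
    """Same rule as predict(): argmax, then demote class 1 if it is not confident enough."""
    probabilities = softmax(logits)
    # Index of the largest probability (the first index wins on ties).
    predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0
    return bool(predicted_class), probabilities


logits = [0.0, 1.0]  # hypothetical logits; class 1 gets probability of about 0.73
label_default, probs = decide(logits)               # default threshold 0.5
label_strict, _ = decide(logits, threshold=0.8)     # stricter threshold
print(label_default, label_strict, probs)
```

Here the default threshold keeps class 1, while a threshold of 0.8 falls back to class 0 because 0.73 does not exceed it.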

Making Predictions

text = "Your Spanish text here"
predicted_label,probabilities = predict(model,tokenizer,text)
print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")



Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.

Model size: 110M params