metadata
license: apache-2.0
language:
  - es
metrics:
  - accuracy
pipeline_tag: text-classification
widget:
  - text: Te quiero. Te amo
    output:
      - label: Positive
        score: 1
      - label: Negative
        score: 0

Spanish Sentiment Analysis Classifier

Overview

This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA). It classifies the sentiment of Spanish text as positive or negative and was fine-tuned from dccuchile/bert-base-spanish-wwm-uncased with the hyperparameters listed under Model Details. The training data consists of 11,500 Spanish tweets from various regions, labeled positive or negative and drawn from a curated combination of TASS datasets.

Team Members

Model Details

  • Base Model: dccuchile/bert-base-spanish-wwm-uncased

  • Hyperparameters:

    • dropout_rate = 0.1
    • num_classes = 2
    • max_length = 128
    • batch_size = 16
    • num_epochs = 5
    • learning_rate = 3e-5
  • Dataset: 11,500 Spanish tweets (Positive and Negative)
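As a rough illustration of what these settings imply for training length, the sketch below collects them in a hypothetical `FineTuneConfig` dataclass and counts optimizer steps. It assumes all 11,500 tweets are used for training with no gradient accumulation; the card does not state the train/validation split, so the real number of steps was probably lower.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters listed above."""
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5


def total_training_steps(num_examples: int, cfg: FineTuneConfig) -> int:
    """Optimizer steps over all epochs, keeping the last partial batch."""
    steps_per_epoch = math.ceil(num_examples / cfg.batch_size)
    return steps_per_epoch * cfg.num_epochs


cfg = FineTuneConfig()
# Assumption: the full 11,500-tweet dataset is the training set.
print(total_training_steps(11_500, cfg))  # 719 steps/epoch * 5 epochs = 3595
```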

Metrics

The model's performance was evaluated using the following metrics:

  • Accuracy = 86.47%
  • F1-Score = 86.47%
  • Precision = 86.46%
  • Recall = 86.51%
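
The card does not say how these scores were averaged across the two classes. As a minimal sketch, assuming macro averaging (the mean of the per-class scores), the four metrics could be computed from true and predicted labels as follows. The function name and the toy labels are illustrative and are not taken from the actual evaluation.

```python
def binary_macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 for labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for cls in (0, 1):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / 2,
        "recall": sum(recalls) / 2,
        "f1": sum(f1s) / 2,
    }


# Toy labels (1 = positive, 0 = negative), made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_macro_metrics(y_true, y_pred))
```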

Usage

Installation

You can install the required dependencies using pip:

pip install transformers torch

Loading the Model

from transformers import BertForSequenceClassification, BertTokenizer
model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")

Predict Function

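import torch

def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize a single text, truncating to the 128-token training length.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    # Inference only: disable gradient tracking.
    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    # Convert logits to per-class probabilities: [class 0, class 1].
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()

    predicted_class = torch.argmax(logits, dim=1).item()
    # Fall back to class 0 unless the class 1 probability exceeds the threshold.
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0

    return bool(predicted_class), probabilities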
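
To see what `threshold` does without loading the model, the decision rule in `predict` can be replayed on raw logits in plain Python. The helper `decide` and the example logits are hypothetical; the logic mirrors the softmax, argmax and threshold check above.

```python
import math


def decide(logits, threshold=0.5):
    """Replay predict()'s post-processing on a pair of logits [class 0, class 1]."""
    # Numerically stable softmax over the two logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probabilities = [e / total for e in exps]

    predicted_class = probabilities.index(max(probabilities))
    # Class 1 is kept only if its probability is strictly above the threshold.
    if predicted_class == 1 and probabilities[1] <= threshold:
        predicted_class = 0
    return bool(predicted_class), probabilities


# Made-up logits that give class 1 a probability of about 0.60.
logits = [0.0, 0.4]
print(decide(logits, threshold=0.5)[0])  # True
print(decide(logits, threshold=0.7)[0])  # False
```

With the default of 0.5 the check has effectively no effect, since the winning class of two already has a probability of at least 0.5. Raising the threshold makes the class 1 decision stricter.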

Making Predictions

text = "Your Spanish text here"
predicted_label, probabilities = predict(model, tokenizer, text)
print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")
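
`predict` returns a boolean and a list of probabilities rather than named labels. A small hypothetical helper, `to_labeled_scores`, can reshape that into the label/score form shown in the widget metadata. It assumes index 0 is Negative and index 1 is Positive, which the card does not state explicitly, so check the model's `id2label` config before relying on it.

```python
def to_labeled_scores(probabilities, labels=("Negative", "Positive")):
    """Pair each probability with a label name, highest score first."""
    # Assumption: probabilities[0] is Negative and probabilities[1] is Positive.
    scored = [
        {"label": name, "score": round(p, 4)}
        for name, p in zip(labels, probabilities)
    ]
    return sorted(scored, key=lambda item: item["score"], reverse=True)


# Made-up probabilities in the shape returned by predict().
print(to_labeled_scores([0.02, 0.98]))
# [{'label': 'Positive', 'score': 0.98}, {'label': 'Negative', 'score': 0.02}]
```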

License

Apache 2.0, as declared in the model card metadata.

Acknowledgments

Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.