---
license: apache-2.0
language:
- es
metrics:
- accuracy
pipeline_tag: text-classification
widget:
- text: Te quiero. Te amo
  output:
  - label: 'Positive'
    score: 1.000
  - label: 'Negative'
    score: 0.000
---

# Spanish Sentiment Analysis Classifier

## Overview

This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA).

The model classifies the sentiment of Spanish text as positive or negative. It was fine-tuned from *dccuchile/bert-base-spanish-wwm-uncased* using the hyperparameters listed under Model Details.

It was trained on a dataset of 11,500 Spanish-language tweets from various regions, each labeled as either positive or negative.

## Team Members

- **[Azul Fuentes](https://github.com/azu26)**
- **[Dante Reinaudo](https://github.com/DanteReinaudo)**
- **[Lucía Pardo](https://github.com/luciaPardo)**
- **[Roberto Iskandarani](https://github.com/Robert-Iskandarani)**

## Model Details

* **Base Model**: dccuchile/bert-base-spanish-wwm-uncased
* **Hyperparameters**:
  * **dropout_rate = 0.1**
  * **num_classes = 2**
  * **max_length = 128**
  * **batch_size = 16**
  * **num_epochs = 10**
  * **learning_rate = 3e-5**
* **Dataset**: 11,500 Spanish tweets (positive and negative)
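
To make the hyperparameter list concrete, here is a minimal sketch that collects these values in one config object. The `TrainingConfig` class and its helper methods are hypothetical and not part of the authors' code. The card does not state how the 11,500 tweets were split between training and evaluation, so step counts computed from the full dataset size are only an upper bound.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters listed above."""
    base_model: str = "dccuchile/bert-base-spanish-wwm-uncased"
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128  # maximum sequence length in tokens
    batch_size: int = 16
    num_epochs: int = 10
    learning_rate: float = 3e-5

    def steps_per_epoch(self, num_examples: int) -> int:
        # The last batch may be partial, so round up.
        return math.ceil(num_examples / self.batch_size)

    def total_steps(self, num_examples: int) -> int:
        # Optimizer steps across all epochs.
        return self.steps_per_epoch(num_examples) * self.num_epochs


cfg = TrainingConfig()
print(cfg.steps_per_epoch(11_500), cfg.total_steps(11_500))
```

With the full dataset size, `steps_per_epoch(11_500)` is 719 (11,500 / 16 = 718.75, rounded up) and `total_steps(11_500)` is 7,190 over 10 epochs. A learning-rate scheduler would typically need a count like this, but the card does not say which scheduler, if any, was used.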

## Metrics

The model was evaluated with the following metrics:

* **Accuracy = _86.47%_**
* **F1-Score = _86.47%_**
* **Precision = _86.46%_**
* **Recall = _86.51%_**
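
The card does not say how precision, recall and F1 were averaged over the two classes. The sketch below assumes macro averaging (an unweighted mean of the per-class scores), which is one common convention for binary sentiment tasks. The function name `binary_metrics` is hypothetical. It shows how all four reported numbers can be derived from gold labels and predictions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 for labels {0, 1}.

    Macro averaging is an assumption; the card does not state the averaging.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for cls in (0, 1):
        # Treat `cls` as the positive class and count outcomes
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    # Unweighted mean over the two classes
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / 2,
        "recall": sum(recalls) / 2,
        "f1": sum(f1s) / 2,
    }


print(binary_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))
```

Under this convention the macro F1 is the mean of the per-class F1 scores, which in general differs from the harmonic mean of macro precision and macro recall. scikit-learn's `f1_score`, `precision_score` and `recall_score` with `average="macro"` use the same per-class-then-average approach.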

## Usage

### Installation

You can install the required dependencies using pip:

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import BertForSequenceClassification, BertTokenizer

# Hugging Face Hub repository ID of this model
MODEL_ID = "VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis"

model = BertForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = BertTokenizer.from_pretrained(MODEL_ID)
```

### Predict Function

```python
import torch

def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize a single text, truncating to max_length = 128 tokens
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

    # Inference only: no gradient tracking needed
    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    # Class probabilities for the single input: [p_class0, p_class1]
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()

    predicted_class = torch.argmax(logits, dim=1).item()
    # Keep class 1 only if its probability exceeds the threshold
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0

    return bool(predicted_class), probabilities
```
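
The decision rule inside `predict()` can be illustrated without loading the model. The sketch below is a hypothetical pure-Python mirror of that rule, operating on the two raw logits of a single input; `softmax` and `decide` are illustrative names, not part of this model's API. It shows that `threshold` only has an effect when set above 0.5: with two classes, class 1 wins the argmax only when its probability is above 0.5, so a lower threshold never overrides it, and at the default of 0.5 the rule is essentially a plain argmax.

```python
import math


def softmax(logits):
    # Subtract the maximum logit for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.5):
    """Hypothetical pure-Python mirror of the decision rule in predict()."""
    probabilities = softmax(logits)
    # Argmax; Python's max() keeps the first (lowest) index on ties
    predicted_class = max(range(len(probabilities)), key=lambda i: probabilities[i])
    # Class 1 is kept only if its probability exceeds the threshold
    if predicted_class == 1 and probabilities[1] <= threshold:
        predicted_class = 0
    return bool(predicted_class), probabilities


# Here class 1 has probability e / (1 + e), about 0.73
print(decide([0.0, 1.0]))                 # class 1 kept at the default threshold
print(decide([0.0, 1.0], threshold=0.8))  # 0.73 <= 0.8, so class 1 is overridden
```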

### Making Predictions

```python
text = "Te quiero. Te amo"  # replace with your own Spanish text

predicted_label, probabilities = predict(model, tokenizer, text)

print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")  # True for class 1, False for class 0
print(f"Probabilities: {probabilities}")
```

## License

Apache License 2.0

## Acknowledgments

Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.