---
license: apache-2.0
language:
- es
metrics:
- accuracy
pipeline_tag: text-classification
widget:
- text: Te quiero. Te amo
  output:
  - label: 'Positive'
    score: 1.000
  - label: 'Negative'
    score: 0.000
---

# Spanish Sentiment Analysis Classifier

## Overview

This BERT-based text classifier was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA). The model detects sentiment in Spanish text and was fine-tuned from *dccuchile/bert-base-spanish-wwm-uncased* using the hyperparameters listed below. It was trained on 11,500 positive and negative Spanish tweets collected from various regions, sourced from a curated combination of TASS datasets.

## Team Members

- **[Azul Fuentes](https://github.com/azu26)**
- **[Dante Reinaudo](https://github.com/DanteReinaudo)**
- **[LucĂ­a Pardo](https://github.com/luciaPardo)**
- **[Roberto Iskandarani](https://github.com/Robert-Iskandarani)**

## Model Details

* **Base Model**: dccuchile/bert-base-spanish-wwm-uncased
* **Hyperparameters**:
  * **dropout_rate = 0.1**
  * **num_classes = 2**
  * **max_length = 128**
  * **batch_size = 16**
  * **num_epochs = 5**
  * **learning_rate = 3e-5**
* **Dataset**: 11,500 Spanish tweets (positive and negative)

## Metrics

The model's performance was evaluated using the following metrics:

* **Accuracy = _86.47%_**
* **F1-Score = _86.47%_**
* **Precision = _86.46%_**
* **Recall = _86.51%_**

## Usage

### Installation

You can install the required dependencies using pip:

```bash
pip install transformers torch
```

### Loading the Model

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
```

### Predict Function

```python
def predict(model, tokenizer, text, threshold=0.5):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()
    predicted_class = torch.argmax(logits, dim=1).item()

    # Treat a low-confidence positive as negative: only return True when the
    # positive class clears the confidence threshold.
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0

    return bool(predicted_class), probabilities
```

### Making Predictions

```python
text = "Your Spanish text here"
predicted_label, probabilities = predict(model, tokenizer, text)
print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")
```

For classifying many texts at once, see the batched-inference sketch at the end of this card.

## License

* Apache License 2.0
* [TASS Dataset license](http://tass.sepln.org/tass_data/download.php)

## Acknowledgments

Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.
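
## Batch Prediction (Sketch)

The snippets above classify one text at a time. Below is a minimal, illustrative sketch of batched inference that reuses the `model` and `tokenizer` loaded earlier and mirrors the thresholding rule from `predict`. The `predict_batch` helper is an assumption made for this example and is not part of the released model.

```python
import torch

def predict_batch(model, tokenizer, texts, threshold=0.5):
    """Hypothetical helper: classify a list of Spanish texts in one forward pass."""
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits

    probabilities = torch.softmax(logits, dim=1)   # shape: (batch, 2)
    predictions = torch.argmax(logits, dim=1)      # shape: (batch,)

    results = []
    for probs, pred in zip(probabilities, predictions):
        label = pred.item()
        # Same rule as predict(): demote a low-confidence positive to negative.
        if probs[label].item() <= threshold and label == 1:
            label = 0
        results.append((bool(label), probs.tolist()))
    return results

texts = ["Te quiero mucho", "No me gustĂł nada la pelĂ­cula"]
for text, (label, probs) in zip(texts, predict_batch(model, tokenizer, texts)):
    print(f"{text!r} -> positive={label}, probabilities={probs}")
```

Padding the batch to its longest sequence keeps everything in a single forward pass, which is typically faster than calling `predict` in a loop.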