+---
+language: es
+tags:
+- sagemaker
+- roberta-bne
+- TextClassification
+- SentimentAnalysis
+license: apache-2.0
+datasets:
+- IMDbreviews_es
+metrics:
+- accuracy
+model-index:
+- name: roberta_bne_sentiment_analysis_es
+  results:
+  - task:
+        name: Sentiment Analysis
+        type: sentiment-analysis
+    dataset:
+        name: "IMDb Reviews in Spanish"
+        type: IMDbreviews_es
+    metrics:
+       - name: Accuracy
+         type: accuracy
+         value: 0.9106666666666666
+       - name: F1 Score
+         type: f1
+         value: 0.9090909090909091
+       - name: Precision
+         type: precision
+         value: 0.9063852813852814
+       - name: Recall
+         type: recall
+         value: 0.9118127381600436
+widget:
+- text: "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
+---
+# Model roberta_bne_sentiment_analysis_es
+## **A fine-tuned model for sentiment analysis in Spanish**
+This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning Container.
+The base model is **RoBERTa-base-bne**, a RoBERTa base model pre-trained on the largest Spanish corpus known to date, totaling 570 GB of text.
+It was trained by the [National Library of Spain (Biblioteca Nacional de España)](http://www.bne.es/en/Inicio/index.html).
+**RoBERTa BNE Citation**
+Check out the paper for all the details: https://arxiv.org/abs/2107.07253
+```
+@article{gutierrezfandino2022,
+	author = {Asier Gutiérrez-Fandiño and Jordi Armengol-Estapé and Marc Pàmies and Joan Llop-Palao and Joaquin Silveira-Ocampo and Casimiro Pio Carrino and Carme Armentano-Oller and Carlos Rodriguez-Penagos and Aitor Gonzalez-Agirre and Marta Villegas},
+	title = {MarIA: Spanish Language Models},
+	journal = {Procesamiento del Lenguaje Natural},
+	volume = {68},
+	number = {0},
+	year = {2022},
+	issn = {1989-7553},
+	url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405},
+	pages = {39--60}
+}
+```
+## Dataset
+The dataset is a collection of about 50,000 movie reviews in Spanish. It is balanced and provides every review in English and in Spanish, with the label in both languages.
+Sizes of datasets:
+- Train dataset: 42,500
+- Validation dataset: 3,750
+- Test dataset: 3,750
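The split sizes listed above add up to 50,000 reviews and correspond to an 85% / 7.5% / 7.5% partition. The short check below only restates that arithmetic; the variable names are illustrative, and it does not reproduce how the split was actually drawn.

```python
# Split sizes as reported in this model card.
split_sizes = {"train": 42_500, "validation": 3_750, "test": 3_750}

total = sum(split_sizes.values())

# Fraction of the corpus that falls in each split.
split_fractions = {name: size / total for name, size in split_sizes.items()}

for name, fraction in split_fractions.items():
    print(f"{name}: {split_sizes[name]} reviews ({fraction:.1%})")
```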
+## Intended uses & limitations
+This model is intended for sentiment analysis of Spanish text. It was fine-tuned specifically on movie reviews, but it can be applied to other kinds of reviews.
+## Hyperparameters
+    {
+    "epochs": "4",
+    "train_batch_size": "32",
+    "eval_batch_size": "8",
+    "fp16": "true",
+    "learning_rate": "3e-05",
+    "model_name": "\"PlanTL-GOB-ES/roberta-base-bne\"",
+    "sagemaker_container_log_level": "20",
+    "sagemaker_program": "\"train.py\""
+    }
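Every value in the listing above is a string, and some carry escaped quotes. This is consistent with the SageMaker Python SDK JSON-encoding each hyperparameter before handing it to the training job. The sketch below shows one way to recover typed values with the standard library. It assumes the values are JSON-encoded as described; `raw_hyperparameters` simply copies the listing, and `decode_hyperparameters` is a hypothetical helper, not part of any SageMaker API.

```python
import json

# Hyperparameters copied from the listing above; every value is a JSON-encoded string.
raw_hyperparameters = {
    "epochs": "4",
    "train_batch_size": "32",
    "eval_batch_size": "8",
    "fp16": "true",
    "learning_rate": "3e-05",
    "model_name": "\"PlanTL-GOB-ES/roberta-base-bne\"",
    "sagemaker_container_log_level": "20",
    "sagemaker_program": "\"train.py\"",
}


def decode_hyperparameters(raw):
    """Parse each JSON-encoded value back into a Python int, float, bool, or str."""
    return {name: json.loads(value) for name, value in raw.items()}


hyperparameters = decode_hyperparameters(raw_hyperparameters)
print(hyperparameters)
```

After decoding, `epochs` is an integer, `fp16` is a boolean, `learning_rate` is a float, and `model_name` is a plain string without the extra quotes.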
+## Evaluation results
+- Accuracy = 0.9106666666666666
+- F1 Score = 0.9090909090909091
+- Precision = 0.9063852813852814
+- Recall = 0.9118127381600436
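The F1 score is the harmonic mean of precision and recall, so the reported figures can be cross-checked against each other. The sketch below recomputes F1 from the precision and recall listed above; `f1_from_precision_recall` is an illustrative helper name, not a function from the training script.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in this model card.
precision = 0.9063852813852814
recall = 0.9118127381600436

f1 = f1_from_precision_recall(precision, recall)
print(f"F1 = {f1:.4f}")
```

The recomputed value agrees with the reported F1 score to four decimal places.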
+## Test results
+## Model in action
+### Usage for Sentiment Analysis
+```python
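+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
+tokenizer = AutoTokenizer.from_pretrained("edumunozsala/roberta_bne_sentiment_analysis_es")
+model = AutoModelForSequenceClassification.from_pretrained("edumunozsala/roberta_bne_sentiment_analysis_es")
+text = "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
+# Tokenize into a batch of one, truncating reviews longer than the model's maximum length
+inputs = tokenizer(text, return_tensors="pt", truncation=True)
+with torch.no_grad():  # inference only, so gradients are not needed
+    outputs = model(**inputs)
+# Index of the highest-scoring class
+output = outputs.logits.argmax(dim=1)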
+```
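The snippet above stops at the index of the highest logit. To report a confidence alongside the prediction, the logits can be passed through a softmax. The sketch below does this in plain Python on made-up logits so that it runs without downloading the model. The logit values and the `softmax` helper are hypothetical; with the real model the equivalent call would be `torch.softmax(outputs.logits, dim=1)`. This card does not state which class index means positive, so check `model.config.id2label` before interpreting the result.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability before exponentiating.
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits for a single review, one score per class.
example_logits = [-1.2, 2.3]

probabilities = softmax(example_logits)
predicted_class = probabilities.index(max(probabilities))

print(predicted_class, [round(p, 3) for p in probabilities])
```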
+Created by [Eduardo Muñoz/@edumunozsala](https://github.com/edumunozsala)