edumunozsala/RuPERTa_base_sentiment_analysis_es

@@ -1,84 +1,96 @@
----
-language: es
-tags:
-- sagemaker
-- roberta
-- ruperta
-- TextClassification
-license: apache-2.0
-datasets:
-- IMDbreviews_es
-model-index:
-- name: RuPERTa_base_sentiment_analysis_es
-  results:
-  - task:
-      name: Sentiment Analysis
-      type: sentiment-analysis
-  - dataset:
-      name: "IMDb Reviews in Spanish"
-      type: IMDbreviews_es
-  - metrics:
-       - name: Accuracy,
-         type: accuracy,
-         value: 0.881866
-       - name: F1 Score,
-         type: f1,
-         value: 0.008272
-       - name: Precision,
-         type: precision,
-         value: 0.858605
-       - name: Recall,
-         type: recall,
-         value: 0.920062
-## `RuPERTa_base_sentiment_analysis_es`
-This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container.
-The base model is RuPERTa-base (uncased) which is a RoBERTa model trained on a uncased version of big Spanish corpus.
-It was trained by mrm8488, Manuel Romero. It is fine-tuned for a sentiment analysis task.
-## Hyperparameters
-    {
-    "epochs": "4",
-    "eval_batch_size": "8",
-    "fp16": "true",
-    "learning_rate": "3e-05",
-    "model_name": "\"mrm8488/RuPERTa-base\"",
-    "sagemaker_container_log_level": "20",
-    "sagemaker_job_name": "\"ruperta-sentiment-analysis-full-p2-2021-12-06-20-32-27\"",
-    "sagemaker_program": "\"train.py\"",
-    "sagemaker_region": "\"us-east-1\"",
-    "sagemaker_submit_directory": "\"s3://edumunozsala-ml-sagemaker/ruperta-sentiment/ruperta-sentiment-analysis-full-p2-2021-12-06-20-32-27/source/sourcedir.tar.gz\"",
-    "train_batch_size": "32",
-    "train_filename": "\"train_data.pt\"",
-    "val_filename": "\"val_data.pt\""
-    }
-## Evaluation results
-epoch = 1.0
-eval_accuracy = 0.8629333333333333
-eval_f1 = 0.8648790746582545
-eval_loss = 0.3160930573940277
-eval_precision = 0.8479381443298969
-eval_recall = 0.8825107296137339

+---
+language: es
+tags:
+- sagemaker
+- ruperta
+- TextClassification
+- SentimentAnalysis
+license: apache-2.0
+datasets:
+- IMDbreviews_es
+model-index:
+name: RuPERTa_base_sentiment_analysis_es
+results:
+  - task:
+        name: Sentiment Analysis
+        type: sentiment-analysis
+  - dataset:
+        name: "IMDb Reviews in Spanish"
+        type: IMDbreviews_es
+  - metrics:
+       - name: Accuracy
+         type: accuracy
+         value: 0.881866
+       - name: F1 Score
+         type: f1
+         value: 0.888272
+       - name: Precision
+         type: precision
+         value: 0.858605
+       - name: Recall
+         type: recall
+         value: 0.920062
+widget:
+- text: "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
+---
+## Model `RuPERTa_base_sentiment_analysis_es`
+### **A fine-tuned model for sentiment analysis in Spanish**
+This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning Container.
+The base model is **RuPERTa-base (uncased)**, a RoBERTa model trained on an uncased version of a large Spanish corpus.
+The base model was trained by Manuel Romero (mrm8488). [Link to base model](https://huggingface.co/mrm8488/RuPERTa-base)
+## Dataset
+The dataset is a collection of about 50,000 movie reviews in Spanish. It is balanced and provides every review in English and in Spanish, with the label in both languages.
+Dataset split sizes:
+- Train dataset: 42,500
+- Validation dataset: 3,750
+- Test dataset: 3,750
+## Hyperparameters
+    {
+    "epochs": "4",
+    "train_batch_size": "32",
+    "eval_batch_size": "8",
+    "fp16": "true",
+    "learning_rate": "3e-05",
+    "model_name": "\"mrm8488/RuPERTa-base\"",
+    "sagemaker_container_log_level": "20",
+    "sagemaker_program": "\"train.py\""
+    }
+## Evaluation results
+Accuracy = 0.8629333333333333
+F1 Score = 0.8648790746582545
+Precision = 0.8479381443298969
+Recall = 0.8825107296137339
+## Test results
+Accuracy = 0.8066666666666666
+F1 Score = 0.8057862309134743
+Precision = 0.7928307854507116
+Recall = 0.8191721132897604
+## Model in action
+### Usage for Sentiment Analysis
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
+model_id = "edumunozsala/RuPERTa_base_sentiment_analysis_es"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+text = "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
+# Tokenize into a batch of one (input IDs plus attention mask)
+inputs = tokenizer(text, return_tensors="pt")
+# Inference only, so skip gradient tracking
+with torch.no_grad():
+    outputs = model(**inputs)
+# Index of the highest-scoring class
+output = outputs.logits.argmax(dim=1)
+```
+Created by [Eduardo Muñoz/@edumunozsala](https://github.com/edumunozsala)
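
Reader-side note, not part of the commit above: the F1 scores listed under "Evaluation results" and "Test results" can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below is a minimal illustration of that check; the helper name `f1_from_precision_recall` and the `reported` dictionary are hypothetical names introduced here, and the numbers are copied from the model card.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the "Evaluation results" and "Test results" sections.
reported = {
    "evaluation": {
        "precision": 0.8479381443298969,
        "recall": 0.8825107296137339,
        "f1": 0.8648790746582545,
    },
    "test": {
        "precision": 0.7928307854507116,
        "recall": 0.8191721132897604,
        "f1": 0.8057862309134743,
    },
}

for split, metrics in reported.items():
    recomputed = f1_from_precision_recall(metrics["precision"], metrics["recall"])
    print(f"{split}: reported F1 = {metrics['f1']:.6f}, recomputed F1 = {recomputed:.6f}")
```

If a recomputed value differs noticeably from the reported one, the most likely cause is a transcription error in one of the three numbers.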