---
language: es
tags:
- sagemaker
- ruperta
- TextClassification
- SentimentAnalysis
license: apache-2.0
datasets:
- IMDbreviews_es
model-index:
  name: RuPERTa_base_sentiment_analysis_es
  results:
  - task:
      name: Sentiment Analysis
      type: sentiment-analysis
  - dataset:
      name: "IMDb Reviews in Spanish"
      type: IMDbreviews_es
  - metrics:
    - name: Accuracy
      type: accuracy
      value: 0.881866
    - name: F1 Score
      type: f1
      value: 0.888272
    - name: Precision
      type: precision
      value: 0.858605
    - name: Recall
      type: recall
      value: 0.920062
widget:
- text: "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
---
## Model `RuPERTa_base_sentiment_analysis_es`
### **A fine-tuned model for sentiment analysis in Spanish**
This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning Container.
The base model is **RuPERTa-base (uncased)**, a RoBERTa model trained on an uncased version of a large Spanish corpus.
It was trained by Manuel Romero (mrm8488). [Link to base model](https://huggingface.co/mrm8488/RuPERTa-base)
## Dataset
The dataset is a collection of about 50,000 movie reviews in Spanish. It is balanced and provides every review in English and in Spanish, with the label in both languages.
Sizes of datasets:
- Train dataset: 42,500
- Validation dataset: 3,750
- Test dataset: 3,750
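As a quick check on the split, the sizes above can be turned into proportions. This is a minimal sketch in plain Python; the variable names are illustrative and do not come from the training code.

```python
# Split sizes as listed in this model card
split_sizes = {"train": 42_500, "validation": 3_750, "test": 3_750}

# Total number of reviews across all splits
total_reviews = sum(split_sizes.values())

# Fraction of the full dataset that each split represents
split_fractions = {name: size / total_reviews for name, size in split_sizes.items()}

for name, fraction in split_fractions.items():
    print(f"{name}: {split_sizes[name]} reviews ({fraction:.1%})")
```

With these numbers the split works out to 85% for training and 7.5% each for validation and test.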
## Hyperparameters
{
"epochs": "4",
"train_batch_size": "32",
"eval_batch_size": "8",
"fp16": "true",
"learning_rate": "3e-05",
"model_name": "\"mrm8488/RuPERTa-base\"",
"sagemaker_container_log_level": "20",
"sagemaker_program": "\"train.py\""
}
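Every value in the listing above is a string, and some (such as `model_name`) carry an extra layer of escaped quotes. Assuming each value is a JSON-encoded literal, a training script could decode them into typed Python values roughly as follows. The helper `decode_hyperparameters` is hypothetical and is not part of the original `train.py`.

```python
import json

# Hyperparameters copied from the listing above (SageMaker-specific keys omitted)
raw_hyperparameters = {
    "epochs": "4",
    "train_batch_size": "32",
    "eval_batch_size": "8",
    "fp16": "true",
    "learning_rate": "3e-05",
    "model_name": "\"mrm8488/RuPERTa-base\"",
}


def decode_hyperparameters(raw):
    """Hypothetical helper: parse each JSON-encoded string into a Python value."""
    return {key: json.loads(value) for key, value in raw.items()}


hyperparameters = decode_hyperparameters(raw_hyperparameters)
for key, value in hyperparameters.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

Decoding this way turns `"4"` into an integer, `"true"` into a boolean, and strips the inner quotes from the model name.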
## Evaluation results
- Accuracy: 0.8629333333333333
- F1 Score: 0.8648790746582545
- Precision: 0.8479381443298969
- Recall: 0.8825107296137339
## Test results
- Accuracy: 0.8066666666666666
- F1 Score: 0.8057862309134743
- Precision: 0.7928307854507116
- Recall: 0.8191721132897604
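The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported values can be cross-checked against each other. The sketch below recomputes F1 from the precision and recall listed for the validation and test sets; the helper name `f1_from_precision_recall` is illustrative.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision, recall and F1 as reported in this model card
reported = {
    "evaluation": {"precision": 0.8479381443298969, "recall": 0.8825107296137339, "f1": 0.8648790746582545},
    "test": {"precision": 0.7928307854507116, "recall": 0.8191721132897604, "f1": 0.8057862309134743},
}

# Recompute F1 for each split and compare with the reported value
recomputed = {
    split: f1_from_precision_recall(m["precision"], m["recall"])
    for split, m in reported.items()
}

for split, f1 in recomputed.items():
    print(f"{split}: recomputed F1 = {f1:.6f}, reported F1 = {reported[split]['f1']:.6f}")
```

Both recomputed scores agree with the reported ones, which suggests the three metrics in each section come from the same run.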
## Model in action
### Usage for Sentiment Analysis
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
model_name = "edumunozsala/RuPERTa_base_sentiment_analysis_es"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
# Tokenize into a batch of one and run inference without tracking gradients
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
# Index of the highest-scoring class
output = outputs.logits.argmax(dim=1)
```
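The snippet above yields only a class index. To report a label and a confidence, the logits can be passed through a softmax and the index looked up in a label mapping. The sketch below shows the idea in plain Python, without downloading the model. The logits and the `id2label` mapping are hypothetical placeholders; in practice the logits would come from `outputs.logits` and the mapping from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    highest = max(logits)
    # Subtract the maximum before exponentiating for numerical stability
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits for a single review (assumption, not real model output)
example_logits = [-1.3, 2.1]

# Hypothetical label mapping (assumption); check model.config.id2label for the real one
id2label = {0: "negative", 1: "positive"}

probabilities = softmax(example_logits)
predicted_index = max(range(len(probabilities)), key=probabilities.__getitem__)

print(f"label: {id2label[predicted_index]}, confidence: {probabilities[predicted_index]:.3f}")
```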
Created by [Eduardo Muñoz/@edumunozsala](https://github.com/edumunozsala)