Edit model card

Model bertin_base_sentiment_analysis_es

A finetuned model for Sentiment analysis in Spanish

This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container, The base model is Bertin base which is a RoBERTa-base model pre-trained on the Spanish portion of mC4 using Flax. It was trained by the Bertin Project.Link to base model

Article: BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling

Dataset

The dataset is a collection of movie reviews in Spanish, about 50,000 reviews. The dataset is balanced and provides every review in english, in spanish and the label in both languages.

Sizes of datasets:

  • Train dataset: 42,500
  • Validation dataset: 3,750
  • Test dataset: 3,750

Intended uses & limitations

This model is intented for Sentiment Analysis for spanish corpus and finetuned specially for movie reviews but it can be applied to other kind of reviews.

Hyperparameters

{
"epochs": "4",
"train_batch_size": "32",    
"eval_batch_size": "8",
"fp16": "true",
"learning_rate": "3e-05",
"model_name": "\"bertin-project/bertin-roberta-base-spanish\"",
"sagemaker_container_log_level": "20",
"sagemaker_program": "\"train.py\"",
}

Evaluation results

  • Accuracy = 0.8989333333333334

  • F1 Score = 0.8989063750333421

  • Precision = 0.877147319104633

  • Recall = 0.9217724288840262

Test results

Model in action

Usage for Sentiment Analysis

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("edumunozsala/bertin_base_sentiment_analysis_es")
model = AutoModelForSequenceClassification.from_pretrained("edumunozsala/bertin_base_sentiment_analysis_es")

text ="Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"

input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
outputs = model(input_ids)
output = outputs.logits.argmax(1)

Created by Eduardo Muñoz/@edumunozsala

Downloads last month
127
Safetensors
Model size
125M params
Tensor type
I64
·
F32
·

Evaluation results