Model roberta_bne_sentiment_analysis_es

A fine-tuned model for sentiment analysis in Spanish

This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container. The base model is RoBERTa-base-bne, a RoBERTa base model pre-trained on the largest Spanish corpus known to date, with a total of 570GB of text. The corpus was compiled from web crawls performed by the National Library of Spain (Biblioteca Nacional de España).

RoBERTa BNE Citation

Check out the paper for all the details: https://arxiv.org/abs/2107.07253

@article{gutierrezfandino2022,
    author = {Asier Gutiérrez-Fandiño and Jordi Armengol-Estapé and Marc Pàmies and Joan Llop-Palao and Joaquin Silveira-Ocampo and Casimiro Pio Carrino and Carme Armentano-Oller and Carlos Rodriguez-Penagos and Aitor Gonzalez-Agirre and Marta Villegas},
    title = {MarIA: Spanish Language Models},
    journal = {Procesamiento del Lenguaje Natural},
    volume = {68},
    number = {0},
    year = {2022},
    issn = {1989-7553},
    url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405},
    pages = {39--60}
}

Dataset

The dataset is a collection of about 50,000 movie reviews in Spanish. It is balanced and provides every review in English and in Spanish, with the label in both languages.

Sizes of datasets:

  • Train dataset: 42,500
  • Validation dataset: 3,750
  • Test dataset: 3,750
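The three sizes add up to 50,000, which corresponds to an 85% / 7.5% / 7.5% split. The card does not say how the split was produced, so the following is only a minimal sketch of one way to obtain splits of these sizes; the shuffling, the seed, and the helper name `split_indices` are assumptions made for illustration.

```python
import random

TOTAL = 50_000
SIZES = {"train": 42_500, "validation": 3_750, "test": 3_750}


def split_indices(n_items, sizes, seed=42):
    """Shuffle indices 0..n_items-1 and cut them into named, disjoint splits."""
    if sum(sizes.values()) != n_items:
        raise ValueError("split sizes must add up to the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits


splits = split_indices(TOTAL, SIZES)
# Share of the full dataset that lands in each split
fractions = {name: len(idx) / TOTAL for name, idx in splits.items()}
print(fractions)
```

Because the dataset is balanced, a stratified split would also keep the label ratio equal across the three parts; the sketch above does not enforce that.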

Intended uses & limitations

This model is intended for sentiment analysis of Spanish text. It was fine-tuned specifically on movie reviews, but it can be applied to other kinds of reviews.

Hyperparameters

{
  "epochs": "4",
  "train_batch_size": "32",
  "eval_batch_size": "8",
  "fp16": "true",
  "learning_rate": "3e-05",
  "model_name": "\"PlanTL-GOB-ES/roberta-base-bne\"",
  "sagemaker_container_log_level": "20",
  "sagemaker_program": "\"train.py\""
}
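Every value above is a string, and the string-valued entries carry an extra pair of escaped quotes, which suggests that each value was JSON-encoded when the training job was recorded. Under that assumption, the typed values can be recovered with `json.loads`. The helper name `decode_hyperparameters` is hypothetical, and the card does not show how `train.py` actually reads these values.

```python
import json

# Hyperparameters as listed in the card: each value is a JSON-encoded string.
raw_hyperparameters = {
    "epochs": "4",
    "train_batch_size": "32",
    "eval_batch_size": "8",
    "fp16": "true",
    "learning_rate": "3e-05",
    "model_name": "\"PlanTL-GOB-ES/roberta-base-bne\"",
    "sagemaker_container_log_level": "20",
    "sagemaker_program": "\"train.py\"",
}


def decode_hyperparameters(raw):
    """Decode each JSON-encoded value into its native Python type."""
    return {key: json.loads(value) for key, value in raw.items()}


hyperparameters = decode_hyperparameters(raw_hyperparameters)
for key, value in hyperparameters.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

After decoding, `epochs` and the batch sizes are integers, `fp16` is a boolean, `learning_rate` is a float, and `model_name` is the plain Hub identifier without the extra quotes.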

Evaluation results

  • Accuracy = 0.9106666666666666

  • F1 Score = 0.9090909090909091

  • Precision = 0.9063852813852814

  • Recall = 0.9118127381600436
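As a quick consistency check, the F1 score is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two figures. The sketch below uses only the numbers listed above; the function name `f1_score` is chosen here for illustration and is not taken from the training script.

```python
# Metrics exactly as reported in the card
precision = 0.9063852813852814
recall = 0.9118127381600436
reported_f1 = 0.9090909090909091


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


computed_f1 = f1_score(precision, recall)
# Absolute difference between the recomputed and the reported F1
difference = abs(computed_f1 - reported_f1)
print(computed_f1, difference)
```

The recomputed value agrees with the reported F1 up to floating-point rounding.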

Test results

Model in action

Usage for Sentiment Analysis

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
model_name = "edumunozsala/roberta_bne_sentiment_analysis_es"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Se trata de una película interesante, con un sólido argumento y una gran interpretación de su actor principal"

# Tokenize into a batch of one and run inference without tracking gradients
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class for the single input text
output = outputs.logits.argmax(dim=1)
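The snippet above stops at the index of the highest logit. To also get a confidence score, the logits can be passed through a softmax. The following is a minimal pure-Python sketch of that step; the logit values are hypothetical placeholders, not real model output, and the card does not state which index corresponds to which sentiment (the `id2label` entry of the model configuration is the place to check).

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one review with two classes (illustrative values only)
example_logits = [-1.2, 2.3]

probabilities = softmax(example_logits)
# Index of the most probable class, equivalent to taking the argmax of the logits
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
print(predicted_class, probabilities[predicted_class])
```

With the tensors from the usage snippet, `torch.softmax(outputs.logits, dim=1)` performs the same computation.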

Created by Eduardo Muñoz/@edumunozsala
