edumunozsala committed on
Commit
da5dfed
1 Parent(s): 25b1cc0

upload README.md

Initial README file

Files changed (1)
  1. README.md +119 -0
README.md ADDED
@@ -0,0 +1,119 @@
+ ---
+ language: es
+ tags:
+ - sagemaker
+ - roberta-bne
+ - TextClassification
+ - SentimentAnalysis
+ license: apache-2.0
+ datasets:
+ - IMDbreviews_es
+ metrics:
+ - accuracy
+ model-index:
+ - name: roberta_bne_sentiment_analysis_es
+   results:
+   - task:
+       name: Sentiment Analysis
+       type: sentiment-analysis
+     dataset:
+       name: "IMDb Reviews in Spanish"
+       type: IMDbreviews_es
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.9106666666666666
+     - name: F1 Score
+       type: f1
+       value: 0.9090909090909091
+     - name: Precision
+       type: precision
+       value: 0.9063852813852814
+     - name: Recall
+       type: recall
+       value: 0.9118127381600436
+ widget:
+ - text: "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
+ ---
+
+ # Model roberta_bne_sentiment_analysis_es
+
+ ## **A fine-tuned model for sentiment analysis in Spanish**
+
+ This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container.
+ The base model is **RoBERTa-base-bne**, a RoBERTa base model pre-trained on the largest Spanish corpus known to date, totaling 570 GB.
+ The corpus was compiled from web crawls performed by the [National Library of Spain (Biblioteca Nacional de España)](http://www.bne.es/en/Inicio/index.html).
+
+
+ **RoBERTa BNE Citation**
+ Check out the paper for all the details: https://arxiv.org/abs/2107.07253
+
+ ```
+ @article{gutierrezfandino2022,
+   author = {Asier Gutiérrez-Fandiño and Jordi Armengol-Estapé and Marc Pàmies and Joan Llop-Palao and Joaquin Silveira-Ocampo and Casimiro Pio Carrino and Carme Armentano-Oller and Carlos Rodriguez-Penagos and Aitor Gonzalez-Agirre and Marta Villegas},
+   title = {MarIA: Spanish Language Models},
+   journal = {Procesamiento del Lenguaje Natural},
+   volume = {68},
+   number = {0},
+   year = {2022},
+   issn = {1989-7553},
+   url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405},
+   pages = {39--60}
+ }
+ ```
+
+ ## Dataset
+ The dataset is a collection of about 50,000 movie reviews in Spanish. It is balanced and provides every review in English and in Spanish, with the label in both languages.
+
+ Sizes of datasets:
+ - Train dataset: 42,500
+ - Validation dataset: 3,750
+ - Test dataset: 3,750
+
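The split proportions follow directly from the sizes listed above. A minimal sketch in plain Python; the variable names are illustrative and not taken from the training script:

```python
# Split sizes as listed in this README
splits = {"train": 42_500, "validation": 3_750, "test": 3_750}

total = sum(splits.values())  # number of reviews across all splits

# Share of each split as a percentage of the whole dataset
shares = {name: 100 * size / total for name, size in splits.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

This works out to an 85% / 7.5% / 7.5% split of the 50,000 reviews.
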
+ ## Intended uses & limitations
+
+ This model is intended for sentiment analysis of Spanish text. It was fine-tuned specifically on movie reviews, but it can be applied to other kinds of reviews.
+
+ ## Hyperparameters
+ {
+   "epochs": "4",
+   "train_batch_size": "32",
+   "eval_batch_size": "8",
+   "fp16": "true",
+   "learning_rate": "3e-05",
+   "model_name": "\"PlanTL-GOB-ES/roberta-base-bne\"",
+   "sagemaker_container_log_level": "20",
+   "sagemaker_program": "\"train.py\""
+ }
+
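The hyperparameter values above are listed as strings, so a training script would need to convert them before use. The sketch below parses three of them and estimates the number of optimizer steps. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these details are stated in this README.

```python
import math

# Subset of the hyperparameters above, kept as strings as they are listed
hyperparameters = {
    "epochs": "4",
    "train_batch_size": "32",
    "learning_rate": "3e-05",
}

epochs = int(hyperparameters["epochs"])
train_batch_size = int(hyperparameters["train_batch_size"])
learning_rate = float(hyperparameters["learning_rate"])

train_examples = 42_500  # training split size from the Dataset section

# Assumption: one device, no gradient accumulation, last partial batch kept
steps_per_epoch = math.ceil(train_examples / train_batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)
```
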
+ ## Evaluation results
+
+ - Accuracy = 0.9106666666666666
+
+ - F1 Score = 0.9090909090909091
+
+ - Precision = 0.9063852813852814
+
+ - Recall = 0.9118127381600436
+
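As a quick consistency check, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. A minimal sketch using only the values listed above:

```python
# Metrics as reported in this README
precision = 0.9063852813852814
recall = 0.9118127381600436
reported_f1 = 0.9090909090909091

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1: {f1:.6f}, reported F1: {reported_f1:.6f}")
```

The recomputed value agrees with the reported F1 score.
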
+ ## Test results
+
+ ## Model in action
+
+ ### Usage for Sentiment Analysis
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ model_id = "edumunozsala/roberta_bne_sentiment_analysis_es"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+ text = "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
+
+ # Tokenize into PyTorch tensors, truncating to the model's maximum input length
+ inputs = tokenizer(text, return_tensors="pt", truncation=True)
+ with torch.no_grad():  # gradients are not needed for inference
+     outputs = model(**inputs)
+ # Index of the highest-scoring class; model.config.id2label maps it to a label name
+ predicted_class = outputs.logits.argmax(dim=1).item()
+ ```
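Taking the argmax of the logits returns only a class index. To also get a confidence score, the logits can be passed through a softmax. The sketch below shows the idea in plain Python; the logits are made-up illustrative values for an assumed two-class model, not real outputs of this model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a two-class model (illustrative values only)
example_logits = [-1.2, 2.3]

probs = softmax(example_logits)
predicted_class = probs.index(max(probs))  # same index argmax would give

print(predicted_class, [round(p, 3) for p in probs])
```

With PyTorch tensors, `torch.softmax(outputs.logits, dim=1)` performs the same computation on the model output.
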
+
+ Created by [Eduardo Muñoz/@edumunozsala](https://github.com/edumunozsala)