edumunozsala committed on
Commit
dcc378b
1 Parent(s): a8c160a

Upload README.md


More content and descriptions

Files changed (1)
  1. README.md +96 -84
README.md CHANGED
@@ -1,84 +1,96 @@
- ---
- language: es
-
- tags:
- - sagemaker
- - roberta
- - ruperta
- - TextClassification
-
- license: apache-2.0
-
- datasets:
- - IMDbreviews_es
-
- model-index:
- - name: RuPERTa_base_sentiment_analysis_es
-
- results:
- - task:
-
- name: Sentiment Analysis
-
- type: sentiment-analysis
- - dataset:
-
- name: "IMDb Reviews in Spanish"
-
- type: IMDbreviews_es
-
- - metrics:
- - name: Accuracy,
- type: accuracy,
- value: 0.881866
-
- - name: F1 Score,
- type: f1,
- value: 0.008272
-
- - name: Precision,
- type: precision,
- value: 0.858605
-
- - name: Recall,
- type: recall,
- value: 0.920062
-
- ## `RuPERTa_base_sentiment_analysis_es`
-
- This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container.
-
- The base model is RuPERTa-base (uncased) which is a RoBERTa model trained on a uncased version of big Spanish corpus.
- It was trained by mrm8488, Manuel Romero. It is fine-tuned for a sentiment analysis task.
-
- ## Hyperparameters
-
- {
- "epochs": "4",
- "eval_batch_size": "8",
- "fp16": "true",
- "learning_rate": "3e-05",
- "model_name": "\"mrm8488/RuPERTa-base\"",
- "sagemaker_container_log_level": "20",
- "sagemaker_job_name": "\"ruperta-sentiment-analysis-full-p2-2021-12-06-20-32-27\"",
- "sagemaker_program": "\"train.py\"",
- "sagemaker_region": "\"us-east-1\"",
- "sagemaker_submit_directory": "\"s3://edumunozsala-ml-sagemaker/ruperta-sentiment/ruperta-sentiment-analysis-full-p2-2021-12-06-20-32-27/source/sourcedir.tar.gz\"",
- "train_batch_size": "32",
- "train_filename": "\"train_data.pt\"",
- "val_filename": "\"val_data.pt\""
- }
-
- ## Evaluation results
-
- epoch = 1.0
-
- eval_accuracy = 0.8629333333333333
-
- eval_f1 = 0.8648790746582545
-
- eval_loss = 0.3160930573940277
-
- eval_precision = 0.8479381443298969
-
- eval_recall = 0.8825107296137339
+ ---
+ language: es
+ tags:
+ - sagemaker
+ - ruperta
+ - TextClassification
+ - SentimentAnalysis
+ license: apache-2.0
+ datasets:
+ - IMDbreviews_es
+ model-index:
+ - name: RuPERTa_base_sentiment_analysis_es
+   results:
+   - task:
+       name: Sentiment Analysis
+       type: sentiment-analysis
+     dataset:
+       name: "IMDb Reviews in Spanish"
+       type: IMDbreviews_es
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.881866
+     - name: F1 Score
+       type: f1
+       value: 0.888272
+     - name: Precision
+       type: precision
+       value: 0.858605
+     - name: Recall
+       type: recall
+       value: 0.920062
+ widget:
+ - text: "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
+ ---
+
+ ## Model `RuPERTa_base_sentiment_analysis_es`
+
+ ### **A fine-tuned model for sentiment analysis in Spanish**
+
+ This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container.
+ The base model is **RuPERTa-base (uncased)**, a RoBERTa model trained on an uncased version of a large Spanish corpus.
+ RuPERTa-base was trained by mrm8488 (Manuel Romero). [Link to base model](https://huggingface.co/mrm8488/RuPERTa-base)
+
+ ## Dataset
+ The dataset is a balanced collection of about 50,000 movie reviews, each provided in both English and Spanish together with its label in both languages.
+
+ Dataset sizes:
+ - Train dataset: 42,500
+ - Validation dataset: 3,750
+ - Test dataset: 3,750
+
+
+ ## Hyperparameters
+ {
+ "epochs": "4",
+ "train_batch_size": "32",
+ "eval_batch_size": "8",
+ "fp16": "true",
+ "learning_rate": "3e-05",
+ "model_name": "\"mrm8488/RuPERTa-base\"",
+ "sagemaker_container_log_level": "20",
+ "sagemaker_program": "\"train.py\""
+ }
+
+ ## Evaluation results
+ Accuracy = 0.8629333333333333
+ F1 Score = 0.8648790746582545
+ Precision = 0.8479381443298969
+ Recall = 0.8825107296137339
+
+ ## Test results
+ Accuracy = 0.8066666666666666
+ F1 Score = 0.8057862309134743
+ Precision = 0.7928307854507116
+ Recall = 0.8191721132897604
+
+ ## Model in action
+
+ ### Usage for Sentiment Analysis
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ tokenizer = AutoTokenizer.from_pretrained("edumunozsala/RuPERTa_base_sentiment_analysis_es")
+ model = AutoModelForSequenceClassification.from_pretrained("edumunozsala/RuPERTa_base_sentiment_analysis_es")
+
+ text = "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
+ inputs = tokenizer(text, return_tensors="pt")  # batch of one, returned as PyTorch tensors
+ with torch.no_grad():  # inference only, so skip gradient tracking
+     outputs = model(**inputs)
+ predicted_class = outputs.logits.argmax(dim=-1).item()  # index of the top-scoring class
+ ```
+
+ Created by [Eduardo Muñoz/@edumunozsala](https://github.com/edumunozsala)
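The README lists the split sizes (42,500 / 3,750 / 3,750, i.e. 85% / 7.5% / 7.5% of 50,000 reviews) but not how the split was made. The sketch below shows one way to obtain splits of exactly those sizes, assuming a seeded random shuffle of example indices. The helper `split_indices` and the seed value are hypothetical and are not taken from the author's `train.py`.

```python
import random

# Split sizes stated in the README (they add up to 50,000 reviews)
TRAIN_SIZE, VAL_SIZE, TEST_SIZE = 42_500, 3_750, 3_750


def split_indices(n_examples: int, seed: int = 42) -> tuple[list[int], list[int], list[int]]:
    """Shuffle example indices with a fixed seed and cut them into train/val/test.

    Hypothetical helper: the seed and the shuffle strategy are assumptions.
    """
    if n_examples != TRAIN_SIZE + VAL_SIZE + TEST_SIZE:
        raise ValueError("split sizes must add up to the number of examples")
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # local RNG, so the global random state is untouched
    train = indices[:TRAIN_SIZE]
    val = indices[TRAIN_SIZE:TRAIN_SIZE + VAL_SIZE]
    test = indices[TRAIN_SIZE + VAL_SIZE:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(50_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 42500 3750 3750
```

Because the dataset is described as balanced, a stratified variant (shuffling and slicing each label's indices separately) would be a reasonable alternative if the label proportions must be preserved exactly in every split.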
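In the Hyperparameters block, the `sagemaker_*` entries appear to be bookkeeping values added by the SageMaker Python SDK, and the doubled quotes (`"\"train.py\""`) suggest JSON-encoded strings. In SageMaker script mode, the remaining user hyperparameters are passed to the entry point as command-line flags such as `--epochs 4`. The following is a minimal, hypothetical sketch of how a `train.py` could read them with `argparse`. The function names and defaults are assumptions, not the author's script.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret "true"/"false" strings (case-insensitive) as booleans."""
    return value.lower() == "true"


def parse_hyperparameters(argv=None) -> argparse.Namespace:
    """Hypothetical parser for the user hyperparameters listed in the README."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=4)
    parser.add_argument("--train_batch_size", type=int, default=32)
    parser.add_argument("--eval_batch_size", type=int, default=8)
    parser.add_argument("--learning_rate", type=float, default=3e-5)
    # Flags arrive as strings, so convert explicitly rather than using type=bool
    parser.add_argument("--fp16", type=str_to_bool, default=False)
    parser.add_argument("--model_name", type=str, default="mrm8488/RuPERTa-base")
    return parser.parse_args(argv)


args = parse_hyperparameters(
    ["--epochs", "4", "--train_batch_size", "32", "--eval_batch_size", "8",
     "--learning_rate", "3e-05", "--fp16", "true", "--model_name", "mrm8488/RuPERTa-base"]
)
print(args.epochs, args.learning_rate, args.fp16)  # 4 3e-05 True
```

The explicit conversion for `--fp16` matters: with `type=bool`, argparse would turn the string `"false"` into `True`, because any non-empty string is truthy.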
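The F1 scores under Evaluation results and Test results agree with the usual definition of F1 as the harmonic mean of precision and recall, F1 = 2 * P * R / (P + R). A small check using only the precision and recall values reported in the README:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the README's evaluation and test sections
val_f1 = f1_score(0.8479381443298969, 0.8825107296137339)
test_f1 = f1_score(0.7928307854507116, 0.8191721132897604)
print(round(val_f1, 6), round(test_f1, 6))  # 0.864879 0.805786
```

Applied to the front-matter precision (0.858605) and recall (0.920062), the same formula gives approximately 0.888272.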