w11wo committed on
Commit
167ffe8
1 Parent(s): 80def33

Update README.md

Files changed (1)
  1. README.md +78 -42
README.md CHANGED
@@ -1,69 +1,105 @@
1
  ---
2
- license: mit
3
  tags:
4
- - generated_from_trainer
5
- metrics:
6
- - accuracy
7
- - f1
8
- - precision
9
- - recall
10
- model-index:
11
- - name: indonesian-roberta-base-prdect-id
12
- results: []
13
  ---
14
 
15
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
16
- should probably proofread and complete it, then remove this comment. -->
17
 
18
- # indonesian-roberta-base-prdect-id
19
 
20
- This model is a fine-tuned version of [flax-community/indonesian-roberta-base](https://huggingface.co/flax-community/indonesian-roberta-base) on the None dataset.
21
- It achieves the following results on the evaluation set:
22
- - Loss: 0.8133
23
- - Accuracy: 0.6852
24
- - F1: 0.6447
25
- - Precision: 0.6464
26
- - Recall: 0.6437
27
 
28
- ## Model description
29
 
30
- More information needed
 
 
31
 
32
- ## Intended uses & limitations
33
 
34
- More information needed
35
 
36
- ## Training and evaluation data
37
-
38
- More information needed
39
 
40
  ## Training procedure
41
 
42
  ### Training hyperparameters
43
 
44
  The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 32
- - eval_batch_size: 32
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 5
 
52
 
53
  ### Training results
54
 
- | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
- |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
- | 1.0358 | 1.0 | 152 | 0.8293 | 0.6519 | 0.5814 | 0.6399 | 0.5746 |
- | 0.7012 | 2.0 | 304 | 0.7444 | 0.6741 | 0.6269 | 0.6360 | 0.6220 |
- | 0.5599 | 3.0 | 456 | 0.7635 | 0.6852 | 0.6440 | 0.6433 | 0.6453 |
- | 0.4628 | 4.0 | 608 | 0.8031 | 0.6852 | 0.6421 | 0.6471 | 0.6396 |
- | 0.4027 | 5.0 | 760 | 0.8133 | 0.6852 | 0.6447 | 0.6464 | 0.6437 |
 
62
 
 
 
 
 
 
63
 
64
- ### Framework versions
 
65
 
66
  - Transformers 4.24.0
67
  - Pytorch 1.12.1+cu113
68
  - Datasets 2.7.1
69
  - Tokenizers 0.13.2
 
1
  ---
2
+ language: id
3
  tags:
4
+ - indonesian-roberta-base-prdect-id
5
+ license: apache-2.0
6
+ datasets:
7
+ - prdect-id
8
+ widget:
9
+ - text: "Wah, kualitas produk ini sangat bagus!"
 
 
 
10
  ---
11
 
12
+ ## Indonesian RoBERTa Base PRDECT-ID
 
13
 
14
+ Indonesian RoBERTa Base PRDECT-ID is an emotion text-classification model based on the [RoBERTa](https://arxiv.org/abs/1907.11692) architecture. It starts from the pre-trained [Indonesian RoBERTa Base](https://hf.co/flax-community/indonesian-roberta-base) model, which was then fine-tuned on the [`PRDECT-ID`](https://doi.org/10.1016/j.dib.2022.108554) dataset of Indonesian product reviews (Sutoyo et al., 2022).
15
 
16
+ This model was trained with Hugging Face's Transformers library on PyTorch. All training was done on an NVIDIA T4 GPU provided by Google Colaboratory. [Training metrics](https://huggingface.co/w11wo/indonesian-roberta-base-prdect-id/tensorboard) were logged via TensorBoard.
 
 
 
 
 
 
17
 
18
+ ## Model
19
 
+ | Model | #params | Arch. | Training/Validation data (text) |
+ | ----------------------------------- | ------- | ------------ | ------------------------------- |
+ | `indonesian-roberta-base-prdect-id` | 124M | RoBERTa Base | `PRDECT-ID` |
23
 
24
+ ## Evaluation Results
25
 
26
+ The model achieves the following results on evaluation:
27
 
+ | Dataset | Accuracy | F1 | Precision | Recall |
+ | ----------- | -------- | -------- | --------- | -------- |
+ | `PRDECT-ID` | 0.685185 | 0.644750 | 0.646400 | 0.643710 |
31
 
32
  ## Training procedure
33
 
34
  ### Training hyperparameters
35
 
36
  The following hyperparameters were used during training:
37
+
+ - `learning_rate`: 2e-05
+ - `train_batch_size`: 32
+ - `eval_batch_size`: 32
+ - `seed`: 42
+ - `optimizer`: Adam with `betas=(0.9,0.999)` and `epsilon=1e-08`
+ - `lr_scheduler_type`: linear
+ - `num_epochs`: 5
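
As a rough illustration of what `lr_scheduler_type`: linear implies for this run, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (the card does not state a warmup value) and takes the total of 760 steps from the training results table. `linear_lr` is a hypothetical helper written for this note, not a Transformers function.

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 760,
              warmup_steps: int = 0) -> float:
    """Optional linear warmup, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase (unused when warmup_steps == 0).
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    # Fraction of the decay phase that is still ahead of us.
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Learning rate at the start, after epoch 1, at the midpoint, and at the end.
for step in (0, 152, 380, 760):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 2e-05, is halved by step 380, and reaches zero at step 760.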
45
 
46
  ### Training results
47
 
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
+ | :-----------: | :---: | :---: | :-------------: | :------: | :----: | :-------: | :----: |
+ | 1.0358 | 1.0 | 152 | 0.8293 | 0.6519 | 0.5814 | 0.6399 | 0.5746 |
+ | 0.7012 | 2.0 | 304 | 0.7444 | 0.6741 | 0.6269 | 0.6360 | 0.6220 |
+ | 0.5599 | 3.0 | 456 | 0.7635 | 0.6852 | 0.6440 | 0.6433 | 0.6453 |
+ | 0.4628 | 4.0 | 608 | 0.8031 | 0.6852 | 0.6421 | 0.6471 | 0.6396 |
+ | 0.4027 | 5.0 | 760 | 0.8133 | 0.6852 | 0.6447 | 0.6464 | 0.6437 |
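
The Step column can be sanity-checked against the hyperparameters. The sketch below is a back-of-the-envelope check that assumes a single device, no gradient accumulation, and that the last partial batch is kept; the card states none of these. `train_size_bounds` is a hypothetical helper name.

```python
from typing import Tuple


def train_size_bounds(steps_per_epoch: int, batch_size: int) -> Tuple[int, int]:
    """Smallest and largest training-set sizes that give this many steps per epoch."""
    smallest = (steps_per_epoch - 1) * batch_size + 1  # last batch holds one example
    largest = steps_per_epoch * batch_size             # last batch is full
    return smallest, largest


final_step, num_epochs, batch_size = 760, 5, 32
steps_per_epoch = final_step // num_epochs  # matches the Step value in the first row
print(steps_per_epoch, train_size_bounds(steps_per_epoch, batch_size))
```

Under these assumptions, 152 steps per epoch at a batch size of 32 corresponds to a training split of between 4833 and 4864 reviews.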
55
+
56
+ ## How to Use
57
+
58
+ ### As Text Classifier
59
+
+ ```python
+ from transformers import pipeline
+
+ pretrained_name = "w11wo/indonesian-roberta-base-prdect-id"

+ nlp = pipeline(
+     "sentiment-analysis",
+     model=pretrained_name,
+     tokenizer=pretrained_name
+ )

+ nlp("Wah, kualitas produk ini sangat bagus!")
+ ```
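
The README snippet above returns only the top label. To inspect the score for every emotion class, a minimal sketch follows. It assumes a Transformers release whose text-classification pipeline accepts `top_k` (older releases used `return_all_scores` instead) and needs network access to download the model. `top_emotion` and `all_scores` are hypothetical helper names, and the card does not list the class names, so none are assumed here.

```python
from typing import Dict, List


def top_emotion(scores: List[Dict]) -> str:
    """Return the label with the highest score from a list of {"label", "score"} dicts."""
    return max(scores, key=lambda item: item["score"])["label"]


def all_scores(text: str,
               model_name: str = "w11wo/indonesian-roberta-base-prdect-id") -> List[Dict]:
    """Score one text against every class of the fine-tuned model."""
    # Imported lazily so top_emotion can be used without Transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_name, top_k=None)
    result = classifier(text)
    # Some versions wrap a single-input result in an extra list; unwrap if so.
    if result and isinstance(result[0], list):
        result = result[0]
    return result


# Usage (downloads the model on first call):
#   scores = all_scores("Wah, kualitas produk ini sangat bagus!")
#   print(top_emotion(scores), scores)
```

Keeping the ranking step in a separate helper makes it easy to swap in a confidence threshold later without touching the model call.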
73
+
74
+ ## Disclaimer
75
+
76
+ Consider the biases of both the pre-trained RoBERTa model and the `PRDECT-ID` dataset, which may carry over into this model's results.
77
+
78
+ ## Author
79
+
80
+ Indonesian RoBERTa Base PRDECT-ID was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development were done on Google Colaboratory using its free GPU access.
81
+
82
+ ## Framework versions
83
 
84
  - Transformers 4.24.0
85
  - Pytorch 1.12.1+cu113
86
  - Datasets 2.7.1
87
  - Tokenizers 0.13.2
88
+
89
+ ## References
90
+
91
+ ```bib
+ @article{SUTOYO2022108554,
+ title = {PRDECT-ID: Indonesian product reviews dataset for emotions classification tasks},
+ journal = {Data in Brief},
+ volume = {44},
+ pages = {108554},
+ year = {2022},
+ issn = {2352-3409},
+ doi = {https://doi.org/10.1016/j.dib.2022.108554},
+ url = {https://www.sciencedirect.com/science/article/pii/S2352340922007612},
+ author = {Rhio Sutoyo and Said Achmad and Andry Chowanda and Esther Widhi Andangsari and Sani M. Isa},
+ keywords = {Natural language processing, Text processing, Text mining, Emotions classification, Sentiment analysis},
+ abstract = {Recognizing emotions is vital in communication. Emotions convey additional meanings to the communication process. Nowadays, people can communicate their emotions on many platforms; one is the product review. Product reviews in the online platform are an important element that affects customers’ buying decisions. Hence, it is essential to recognize emotions from the product reviews. Emotions recognition from the product reviews can be done automatically using a machine or deep learning algorithm. Dataset can be considered as the fuel to model the recognizer. However, only a limited dataset exists in recognizing emotions from the product reviews, particularly in a local language. This research contributes to the dataset collection of 5400 product reviews in Indonesian. It was carefully curated from various (29) product categories, annotated with five emotions, and verified by an expert in clinical psychology. The dataset supports an innovative process to build automatic emotion classification on product reviews.}
+ }
+ ```