w11wo committed on
Commit 4eb8f4e · 1 Parent(s): 63f658b

Update README.md

Files changed (1)
  1. README.md +86 -55
README.md CHANGED
@@ -1,39 +1,30 @@
  ---
  tags:
- - generated_from_trainer
  datasets:
- - oscar-corpus/OSCAR-2109
- model-index:
- - name: runs
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # runs

- This model was trained from scratch on the oscar-corpus/OSCAR-2109 deduplicated_lo dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.4556

- ## Model description

- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 0.0002
  - train_batch_size: 128
  - eval_batch_size: 128
@@ -50,40 +41,80 @@ The following hyperparameters were used during training:
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | No log | 1.0 | 216 | 5.8586 |
- | No log | 2.0 | 432 | 5.5095 |
- | 6.688 | 3.0 | 648 | 5.3976 |
- | 6.688 | 4.0 | 864 | 5.3562 |
- | 5.3629 | 5.0 | 1080 | 5.2912 |
- | 5.3629 | 6.0 | 1296 | 5.2385 |
- | 5.22 | 7.0 | 1512 | 5.1955 |
- | 5.22 | 8.0 | 1728 | 5.1785 |
- | 5.22 | 9.0 | 1944 | 5.1327 |
- | 5.1248 | 10.0 | 2160 | 5.1243 |
- | 5.1248 | 11.0 | 2376 | 5.0889 |
- | 5.0591 | 12.0 | 2592 | 5.0732 |
- | 5.0591 | 13.0 | 2808 | 5.0417 |
- | 5.0094 | 14.0 | 3024 | 5.0388 |
- | 5.0094 | 15.0 | 3240 | 4.9299 |
- | 5.0094 | 16.0 | 3456 | 4.2991 |
- | 4.7527 | 17.0 | 3672 | 3.6541 |
- | 4.7527 | 18.0 | 3888 | 2.7826 |
- | 3.4431 | 19.0 | 4104 | 2.2796 |
- | 3.4431 | 20.0 | 4320 | 2.0213 |
- | 2.2803 | 21.0 | 4536 | 1.8809 |
- | 2.2803 | 22.0 | 4752 | 1.7615 |
- | 2.2803 | 23.0 | 4968 | 1.6925 |
- | 1.8601 | 24.0 | 5184 | 1.6205 |
- | 1.8601 | 25.0 | 5400 | 1.5751 |
- | 1.6697 | 26.0 | 5616 | 1.5391 |
- | 1.6697 | 27.0 | 5832 | 1.5200 |
- | 1.5655 | 28.0 | 6048 | 1.4866 |
- | 1.5655 | 29.0 | 6264 | 1.4656 |
- | 1.5655 | 30.0 | 6480 | 1.4627 |
-
-
- ### Framework versions

  - Transformers 4.13.0.dev0
  - Pytorch 1.9.0+cu102
 
  ---
+ language: lo
  tags:
+ - lao-roberta-base
+ license: mit
  datasets:
+ - oscar-corpus/OSCAR-2109
  ---

+ ## Lao RoBERTa Base

+ Lao RoBERTa Base is a masked language model based on the [RoBERTa](https://arxiv.org/abs/1907.11692) architecture. It was trained from scratch on the `deduplicated_lo` subset of the [OSCAR-2109](https://huggingface.co/datasets/oscar-corpus/OSCAR-2109) dataset, achieving an evaluation loss of 1.4556 and an evaluation perplexity of 4.287.
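
The reported perplexity is simply the exponential of the evaluation loss; a quick sanity check (illustrative only, not part of the original card):

```python
import math

# Perplexity of a (masked) language model is exp(cross-entropy loss).
eval_loss = 1.4556
print(round(math.exp(eval_loss), 3))  # ≈ 4.287
```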

+ This model was trained with Hugging Face's Transformers framework (PyTorch), using the masked-language-modeling training script found [here](https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_mlm.py). All training was done on a TPUv3-8, provided by the [TPU Research Cloud](https://sites.research.google/trc/about/) program. You can view the detailed training results in the [Training metrics](https://huggingface.co/w11wo/lao-roberta-base/tensorboard) tab, logged via TensorBoard.
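
For illustration, the training subset named in the card can be loaded as follows (a sketch, not taken from the original training setup; OSCAR-2109 is a gated dataset, so Hub access and an authentication token are assumed):

```python
from datasets import load_dataset

# Load the deduplicated Lao subset of OSCAR-2109 used for training.
# Access to the gated dataset must be requested on the Hugging Face Hub first.
dataset = load_dataset(
    "oscar-corpus/OSCAR-2109",
    "deduplicated_lo",
    use_auth_token=True,
)
print(dataset)
```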
 
 
+ ## Model

+ | Model              | #params | Arch.   | Training/Validation data (text)      |
+ | ------------------ | ------- | ------- | ------------------------------------ |
+ | `lao-roberta-base` | 124M    | RoBERTa | OSCAR-2109 `deduplicated_lo` Dataset |
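
The parameter count in the table can be checked directly against the published checkpoint (a quick sketch, not from the original card; the printed figure should be roughly the 124M listed above):

```python
from transformers import RobertaModel

# Load the checkpoint and count its trainable parameters.
model = RobertaModel.from_pretrained("w11wo/lao-roberta-base")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```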
 
  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
+
  - learning_rate: 0.0002
  - train_batch_size: 128
  - eval_batch_size: 128
 
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
+ | :-----------: | :---: | :--: | :-------------: |
+ | No log | 1.0 | 216 | 5.8586 |
+ | No log | 2.0 | 432 | 5.5095 |
+ | 6.688 | 3.0 | 648 | 5.3976 |
+ | 6.688 | 4.0 | 864 | 5.3562 |
+ | 5.3629 | 5.0 | 1080 | 5.2912 |
+ | 5.3629 | 6.0 | 1296 | 5.2385 |
+ | 5.22 | 7.0 | 1512 | 5.1955 |
+ | 5.22 | 8.0 | 1728 | 5.1785 |
+ | 5.22 | 9.0 | 1944 | 5.1327 |
+ | 5.1248 | 10.0 | 2160 | 5.1243 |
+ | 5.1248 | 11.0 | 2376 | 5.0889 |
+ | 5.0591 | 12.0 | 2592 | 5.0732 |
+ | 5.0591 | 13.0 | 2808 | 5.0417 |
+ | 5.0094 | 14.0 | 3024 | 5.0388 |
+ | 5.0094 | 15.0 | 3240 | 4.9299 |
+ | 5.0094 | 16.0 | 3456 | 4.2991 |
+ | 4.7527 | 17.0 | 3672 | 3.6541 |
+ | 4.7527 | 18.0 | 3888 | 2.7826 |
+ | 3.4431 | 19.0 | 4104 | 2.2796 |
+ | 3.4431 | 20.0 | 4320 | 2.0213 |
+ | 2.2803 | 21.0 | 4536 | 1.8809 |
+ | 2.2803 | 22.0 | 4752 | 1.7615 |
+ | 2.2803 | 23.0 | 4968 | 1.6925 |
+ | 1.8601 | 24.0 | 5184 | 1.6205 |
+ | 1.8601 | 25.0 | 5400 | 1.5751 |
+ | 1.6697 | 26.0 | 5616 | 1.5391 |
+ | 1.6697 | 27.0 | 5832 | 1.5200 |
+ | 1.5655 | 28.0 | 6048 | 1.4866 |
+ | 1.5655 | 29.0 | 6264 | 1.4656 |
+ | 1.5655 | 30.0 | 6480 | 1.4627 |
+
+ ## How to Use
+
+ ### As Masked Language Model
+
+ ```python
+ from transformers import pipeline
+
+ pretrained_name = "w11wo/lao-roberta-base"
+
+ # The prompt should contain the tokenizer's mask token (`<mask>` for RoBERTa).
+ prompt = "REPLACE WITH MASKED PROMPT"
+
+ fill_mask = pipeline(
+     "fill-mask",
+     model=pretrained_name,
+     tokenizer=pretrained_name
+ )
+
+ # Returns the top predictions for the masked position.
+ fill_mask(prompt)
+ ```
+
+ ### Feature Extraction in PyTorch
+
+ ```python
+ from transformers import RobertaModel, RobertaTokenizerFast
+
+ pretrained_name = "w11wo/lao-roberta-base"
+ model = RobertaModel.from_pretrained(pretrained_name)
+ tokenizer = RobertaTokenizerFast.from_pretrained(pretrained_name)
+
+ # A sample Lao sentence ("Hello, world").
+ prompt = "ສະ​ບາຍ​ດີ​ຊາວ​ໂລກ."
+ encoded_input = tokenizer(prompt, return_tensors='pt')
+
+ # output.last_hidden_state contains one contextual embedding per token.
+ output = model(**encoded_input)
+ ```
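
If a single sentence-level vector is needed instead of per-token features, a common follow-up (a sketch building on the variables from the snippet above, not part of the original card) is to mean-pool the last hidden state over non-padding tokens:

```python
import torch

# Mean-pool token embeddings, ignoring padding positions.
mask = encoded_input["attention_mask"].unsqueeze(-1)        # (batch, seq_len, 1)
summed = (output.last_hidden_state * mask).sum(dim=1)       # (batch, hidden_size)
sentence_embedding = summed / mask.sum(dim=1).clamp(min=1)  # (batch, hidden_size)
print(sentence_embedding.shape)  # e.g. torch.Size([1, 768]) for a base-sized model
```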
+
+ ## Disclaimer
+
+ Consider the biases inherited from the pre-training dataset, which may carry over into this model's outputs.
+
+ ## Author
+
+ Lao RoBERTa Base was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development were done on TPUs provided by Google's TPU Research Cloud (TRC).
+
+ ## Framework versions

  - Transformers 4.13.0.dev0
  - Pytorch 1.9.0+cu102