mikaelsouza committed
Commit 0bec60c
Parent(s): a8c73bc
update model card README.md

README.md ADDED
@@ -0,0 +1,172 @@
---
tags:
- generated_from_trainer
datasets:
- wikitext
model-index:
- name: msft-regular-model
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# msft-regular-model

This model is a fine-tuned version of [](https://huggingface.co/) on the wikitext dataset.
It achieves the following results on the evaluation set:
- Loss: 5.3420
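For a causal language model trained with token-level cross-entropy, the evaluation loss can be read as a perplexity by exponentiating it. This conversion is not part of the generated card; it is a standard interpretation added here for convenience:

```python
import math

# Validation cross-entropy loss reported above.
eval_loss = 5.3420

# Perplexity = exp(loss) for a cross-entropy language-modeling objective.
perplexity = math.exp(eval_loss)
print(f"{perplexity:.1f}")  # about 208.9
```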

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 9.1224 | 0.17 | 200 | 8.0736 |
| 7.5229 | 0.34 | 400 | 7.1536 |
| 7.0122 | 0.51 | 600 | 6.9072 |
| 6.8296 | 0.69 | 800 | 6.7582 |
| 6.709 | 0.86 | 1000 | 6.6436 |
| 6.5882 | 1.03 | 1200 | 6.5563 |
| 6.4807 | 1.2 | 1400 | 6.4784 |
| 6.4172 | 1.37 | 1600 | 6.4165 |
| 6.3403 | 1.54 | 1800 | 6.3555 |
| 6.2969 | 1.71 | 2000 | 6.3107 |
| 6.2346 | 1.89 | 2200 | 6.2691 |
| 6.1767 | 2.06 | 2400 | 6.2299 |
| 6.1326 | 2.23 | 2600 | 6.1937 |
| 6.1035 | 2.4 | 2800 | 6.1602 |
| 6.0624 | 2.57 | 3000 | 6.1241 |
| 6.0393 | 2.74 | 3200 | 6.0971 |
| 5.9982 | 2.91 | 3400 | 6.0656 |
| 5.9526 | 3.08 | 3600 | 6.0397 |
| 5.9086 | 3.26 | 3800 | 6.0104 |
| 5.8922 | 3.43 | 4000 | 5.9888 |
| 5.8631 | 3.6 | 4200 | 5.9661 |
| 5.8396 | 3.77 | 4400 | 5.9407 |
| 5.8055 | 3.94 | 4600 | 5.9177 |
| 5.7763 | 4.11 | 4800 | 5.9007 |
| 5.7314 | 4.28 | 5000 | 5.8834 |
| 5.7302 | 4.46 | 5200 | 5.8620 |
| 5.6987 | 4.63 | 5400 | 5.8451 |
| 5.6754 | 4.8 | 5600 | 5.8242 |
| 5.6571 | 4.97 | 5800 | 5.8059 |
| 5.615 | 5.14 | 6000 | 5.7871 |
| 5.596 | 5.31 | 6200 | 5.7817 |
| 5.5738 | 5.48 | 6400 | 5.7570 |
| 5.5641 | 5.66 | 6600 | 5.7431 |
| 5.5503 | 5.83 | 6800 | 5.7271 |
| 5.5214 | 6.0 | 7000 | 5.7108 |
| 5.4712 | 6.17 | 7200 | 5.7018 |
| 5.48 | 6.34 | 7400 | 5.6936 |
| 5.4527 | 6.51 | 7600 | 5.6812 |
| 5.4514 | 6.68 | 7800 | 5.6669 |
| 5.4454 | 6.86 | 8000 | 5.6509 |
| 5.399 | 7.03 | 8200 | 5.6408 |
| 5.3747 | 7.2 | 8400 | 5.6327 |
| 5.3667 | 7.37 | 8600 | 5.6197 |
| 5.3652 | 7.54 | 8800 | 5.6084 |
| 5.3394 | 7.71 | 9000 | 5.5968 |
| 5.3349 | 7.88 | 9200 | 5.5870 |
| 5.2994 | 8.05 | 9400 | 5.5826 |
| 5.2793 | 8.23 | 9600 | 5.5710 |
| 5.2716 | 8.4 | 9800 | 5.5623 |
| 5.275 | 8.57 | 10000 | 5.5492 |
| 5.264 | 8.74 | 10200 | 5.5449 |
| 5.241 | 8.91 | 10400 | 5.5322 |
| 5.2285 | 9.08 | 10600 | 5.5267 |
| 5.2021 | 9.25 | 10800 | 5.5187 |
| 5.1934 | 9.43 | 11000 | 5.5158 |
| 5.1737 | 9.6 | 11200 | 5.5044 |
| 5.1774 | 9.77 | 11400 | 5.5008 |
| 5.1841 | 9.94 | 11600 | 5.4960 |
| 5.1414 | 10.11 | 11800 | 5.4895 |
| 5.1491 | 10.28 | 12000 | 5.4849 |
| 5.1184 | 10.45 | 12200 | 5.4738 |
| 5.1136 | 10.63 | 12400 | 5.4690 |
| 5.1199 | 10.8 | 12600 | 5.4598 |
| 5.1056 | 10.97 | 12800 | 5.4536 |
| 5.0648 | 11.14 | 13000 | 5.4496 |
| 5.0598 | 11.31 | 13200 | 5.4449 |
| 5.0656 | 11.48 | 13400 | 5.4422 |
| 5.0664 | 11.65 | 13600 | 5.4367 |
| 5.0675 | 11.83 | 13800 | 5.4286 |
| 5.0459 | 12.0 | 14000 | 5.4249 |
| 5.0073 | 12.17 | 14200 | 5.4260 |
| 5.0229 | 12.34 | 14400 | 5.4175 |
| 5.0079 | 12.51 | 14600 | 5.4119 |
| 5.0 | 12.68 | 14800 | 5.4194 |
| 5.0094 | 12.85 | 15000 | 5.4068 |
| 4.9967 | 13.02 | 15200 | 5.3995 |
| 4.9541 | 13.2 | 15400 | 5.4002 |
| 4.9753 | 13.37 | 15600 | 5.3965 |
| 4.9732 | 13.54 | 15800 | 5.3925 |
| 4.9624 | 13.71 | 16000 | 5.3888 |
| 4.9559 | 13.88 | 16200 | 5.3824 |
| 4.9559 | 14.05 | 16400 | 5.3851 |
| 4.9109 | 14.22 | 16600 | 5.3815 |
| 4.9211 | 14.4 | 16800 | 5.3784 |
| 4.9342 | 14.57 | 17000 | 5.3735 |
| 4.9271 | 14.74 | 17200 | 5.3711 |
| 4.9328 | 14.91 | 17400 | 5.3646 |
| 4.8994 | 15.08 | 17600 | 5.3664 |
| 4.8932 | 15.25 | 17800 | 5.3642 |
| 4.8886 | 15.42 | 18000 | 5.3620 |
| 4.8997 | 15.6 | 18200 | 5.3584 |
| 4.8846 | 15.77 | 18400 | 5.3551 |
| 4.8993 | 15.94 | 18600 | 5.3516 |
| 4.8648 | 16.11 | 18800 | 5.3552 |
| 4.8838 | 16.28 | 19000 | 5.3512 |
| 4.8575 | 16.45 | 19200 | 5.3478 |
| 4.8623 | 16.62 | 19400 | 5.3480 |
| 4.8631 | 16.8 | 19600 | 5.3439 |
| 4.8576 | 16.97 | 19800 | 5.3428 |
| 4.8265 | 17.14 | 20000 | 5.3420 |
| 4.8523 | 17.31 | 20200 | 5.3410 |
| 4.8477 | 17.48 | 20400 | 5.3396 |
| 4.8507 | 17.65 | 20600 | 5.3380 |
| 4.8498 | 17.82 | 20800 | 5.3333 |
| 4.8261 | 17.99 | 21000 | 5.3342 |
| 4.8201 | 18.17 | 21200 | 5.3324 |
| 4.8214 | 18.34 | 21400 | 5.3341 |
| 4.8195 | 18.51 | 21600 | 5.3315 |
| 4.8216 | 18.68 | 21800 | 5.3335 |
| 4.8243 | 18.85 | 22000 | 5.3291 |
| 4.832 | 19.02 | 22200 | 5.3295 |
| 4.8085 | 19.19 | 22400 | 5.3309 |
| 4.8094 | 19.37 | 22600 | 5.3283 |
| 4.815 | 19.54 | 22800 | 5.3280 |
| 4.8219 | 19.71 | 23000 | 5.3270 |
| 4.8117 | 19.88 | 23200 | 5.3280 |

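The Step and Epoch columns above are internally consistent: evaluation ran every 200 optimizer steps, which implies roughly 1,170 optimizer steps per epoch (the exact dataloader length is not stated in the card). A quick sanity check using the first and last table rows:

```python
# (step, epoch) pairs taken from the first and last rows of the table.
rows = [(200, 0.17), (23200, 19.88)]

for step, epoch in rows:
    steps_per_epoch = step / epoch
    # Both pairs give ~1,170 steps/epoch, up to rounding of the epoch column.
    print(f"step {step}: ~{steps_per_epoch:.0f} steps/epoch")
```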

### Framework versions

- Transformers 4.13.0.dev0
- Pytorch 1.10.0
- Datasets 1.14.0
- Tokenizers 0.10.3