---
tags:
- generated_from_trainer
datasets:
- wikitext
model-index:
- name: msft-regular-model
  results: []
---
# msft-regular-model
This model is a fine-tuned version of on the wikitext dataset. It achieves the following results on the evaluation set:
- Loss: 5.3420
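For a causal language model, the reported cross-entropy loss can be converted to perplexity via `exp(loss)`. This conversion is a standard relationship, not a figure stated in the original card:

```python
import math

# Reported evaluation cross-entropy loss from this card
eval_loss = 5.3420

# Perplexity is the exponential of the cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.1f}")  # roughly 209
```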
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
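The hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the exact script used to train this model; the `output_dir` value is an illustrative assumption:

```python
from transformers import TrainingArguments

# Sketch of the configuration listed above; "msft-regular-model" as
# output_dir is an assumption, not taken from the original training script.
training_args = TrainingArguments(
    output_dir="msft-regular-model",
    learning_rate=5e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    num_train_epochs=20,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```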
### Training results
| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
9.1224 | 0.17 | 200 | 8.0736 |
7.5229 | 0.34 | 400 | 7.1536 |
7.0122 | 0.51 | 600 | 6.9072 |
6.8296 | 0.69 | 800 | 6.7582 |
6.709 | 0.86 | 1000 | 6.6436 |
6.5882 | 1.03 | 1200 | 6.5563 |
6.4807 | 1.2 | 1400 | 6.4784 |
6.4172 | 1.37 | 1600 | 6.4165 |
6.3403 | 1.54 | 1800 | 6.3555 |
6.2969 | 1.71 | 2000 | 6.3107 |
6.2346 | 1.89 | 2200 | 6.2691 |
6.1767 | 2.06 | 2400 | 6.2299 |
6.1326 | 2.23 | 2600 | 6.1937 |
6.1035 | 2.4 | 2800 | 6.1602 |
6.0624 | 2.57 | 3000 | 6.1241 |
6.0393 | 2.74 | 3200 | 6.0971 |
5.9982 | 2.91 | 3400 | 6.0656 |
5.9526 | 3.08 | 3600 | 6.0397 |
5.9086 | 3.26 | 3800 | 6.0104 |
5.8922 | 3.43 | 4000 | 5.9888 |
5.8631 | 3.6 | 4200 | 5.9661 |
5.8396 | 3.77 | 4400 | 5.9407 |
5.8055 | 3.94 | 4600 | 5.9177 |
5.7763 | 4.11 | 4800 | 5.9007 |
5.7314 | 4.28 | 5000 | 5.8834 |
5.7302 | 4.46 | 5200 | 5.8620 |
5.6987 | 4.63 | 5400 | 5.8451 |
5.6754 | 4.8 | 5600 | 5.8242 |
5.6571 | 4.97 | 5800 | 5.8059 |
5.615 | 5.14 | 6000 | 5.7871 |
5.596 | 5.31 | 6200 | 5.7817 |
5.5738 | 5.48 | 6400 | 5.7570 |
5.5641 | 5.66 | 6600 | 5.7431 |
5.5503 | 5.83 | 6800 | 5.7271 |
5.5214 | 6.0 | 7000 | 5.7108 |
5.4712 | 6.17 | 7200 | 5.7018 |
5.48 | 6.34 | 7400 | 5.6936 |
5.4527 | 6.51 | 7600 | 5.6812 |
5.4514 | 6.68 | 7800 | 5.6669 |
5.4454 | 6.86 | 8000 | 5.6509 |
5.399 | 7.03 | 8200 | 5.6408 |
5.3747 | 7.2 | 8400 | 5.6327 |
5.3667 | 7.37 | 8600 | 5.6197 |
5.3652 | 7.54 | 8800 | 5.6084 |
5.3394 | 7.71 | 9000 | 5.5968 |
5.3349 | 7.88 | 9200 | 5.5870 |
5.2994 | 8.05 | 9400 | 5.5826 |
5.2793 | 8.23 | 9600 | 5.5710 |
5.2716 | 8.4 | 9800 | 5.5623 |
5.275 | 8.57 | 10000 | 5.5492 |
5.264 | 8.74 | 10200 | 5.5449 |
5.241 | 8.91 | 10400 | 5.5322 |
5.2285 | 9.08 | 10600 | 5.5267 |
5.2021 | 9.25 | 10800 | 5.5187 |
5.1934 | 9.43 | 11000 | 5.5158 |
5.1737 | 9.6 | 11200 | 5.5044 |
5.1774 | 9.77 | 11400 | 5.5008 |
5.1841 | 9.94 | 11600 | 5.4960 |
5.1414 | 10.11 | 11800 | 5.4895 |
5.1491 | 10.28 | 12000 | 5.4849 |
5.1184 | 10.45 | 12200 | 5.4738 |
5.1136 | 10.63 | 12400 | 5.4690 |
5.1199 | 10.8 | 12600 | 5.4598 |
5.1056 | 10.97 | 12800 | 5.4536 |
5.0648 | 11.14 | 13000 | 5.4496 |
5.0598 | 11.31 | 13200 | 5.4449 |
5.0656 | 11.48 | 13400 | 5.4422 |
5.0664 | 11.65 | 13600 | 5.4367 |
5.0675 | 11.83 | 13800 | 5.4286 |
5.0459 | 12.0 | 14000 | 5.4249 |
5.0073 | 12.17 | 14200 | 5.4260 |
5.0229 | 12.34 | 14400 | 5.4175 |
5.0079 | 12.51 | 14600 | 5.4119 |
5.0 | 12.68 | 14800 | 5.4194 |
5.0094 | 12.85 | 15000 | 5.4068 |
4.9967 | 13.02 | 15200 | 5.3995 |
4.9541 | 13.2 | 15400 | 5.4002 |
4.9753 | 13.37 | 15600 | 5.3965 |
4.9732 | 13.54 | 15800 | 5.3925 |
4.9624 | 13.71 | 16000 | 5.3888 |
4.9559 | 13.88 | 16200 | 5.3824 |
4.9559 | 14.05 | 16400 | 5.3851 |
4.9109 | 14.22 | 16600 | 5.3815 |
4.9211 | 14.4 | 16800 | 5.3784 |
4.9342 | 14.57 | 17000 | 5.3735 |
4.9271 | 14.74 | 17200 | 5.3711 |
4.9328 | 14.91 | 17400 | 5.3646 |
4.8994 | 15.08 | 17600 | 5.3664 |
4.8932 | 15.25 | 17800 | 5.3642 |
4.8886 | 15.42 | 18000 | 5.3620 |
4.8997 | 15.6 | 18200 | 5.3584 |
4.8846 | 15.77 | 18400 | 5.3551 |
4.8993 | 15.94 | 18600 | 5.3516 |
4.8648 | 16.11 | 18800 | 5.3552 |
4.8838 | 16.28 | 19000 | 5.3512 |
4.8575 | 16.45 | 19200 | 5.3478 |
4.8623 | 16.62 | 19400 | 5.3480 |
4.8631 | 16.8 | 19600 | 5.3439 |
4.8576 | 16.97 | 19800 | 5.3428 |
4.8265 | 17.14 | 20000 | 5.3420 |
4.8523 | 17.31 | 20200 | 5.3410 |
4.8477 | 17.48 | 20400 | 5.3396 |
4.8507 | 17.65 | 20600 | 5.3380 |
4.8498 | 17.82 | 20800 | 5.3333 |
4.8261 | 17.99 | 21000 | 5.3342 |
4.8201 | 18.17 | 21200 | 5.3324 |
4.8214 | 18.34 | 21400 | 5.3341 |
4.8195 | 18.51 | 21600 | 5.3315 |
4.8216 | 18.68 | 21800 | 5.3335 |
4.8243 | 18.85 | 22000 | 5.3291 |
4.832 | 19.02 | 22200 | 5.3295 |
4.8085 | 19.19 | 22400 | 5.3309 |
4.8094 | 19.37 | 22600 | 5.3283 |
4.815 | 19.54 | 22800 | 5.3280 |
4.8219 | 19.71 | 23000 | 5.3270 |
4.8117 | 19.88 | 23200 | 5.3280 |
### Framework versions
- Transformers 4.13.0.dev0
- Pytorch 1.10.0
- Datasets 1.14.0
- Tokenizers 0.10.3
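An approximate recreation of this environment (a sketch: `4.13.0.dev0` is a development build that was never published to PyPI, so Transformers would need to be installed from source at the corresponding commit rather than pinned):

```shell
# Pin the released libraries listed above
pip install torch==1.10.0 datasets==1.14.0 tokenizers==0.10.3

# 4.13.0.dev0 was a development build; install Transformers from source
pip install git+https://github.com/huggingface/transformers.git
```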