---
tags:
- generated_from_trainer
datasets:
- wikitext
model-index:
- name: msft-regular-model
  results: []
---

# msft-regular-model

This model is a fine-tuned version of [](https://huggingface.co/) on the wikitext dataset.
It achieves the following results on the evaluation set:
- Loss: 5.3420

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 9.1224        | 0.17  | 200   | 8.0736          |
| 7.5229        | 0.34  | 400   | 7.1536          |
| 7.0122        | 0.51  | 600   | 6.9072          |
| 6.8296        | 0.69  | 800   | 6.7582          |
| 6.709         | 0.86  | 1000  | 6.6436          |
| 6.5882        | 1.03  | 1200  | 6.5563          |
| 6.4807        | 1.2   | 1400  | 6.4784          |
| 6.4172        | 1.37  | 1600  | 6.4165          |
| 6.3403        | 1.54  | 1800  | 6.3555          |
| 6.2969        | 1.71  | 2000  | 6.3107          |
| 6.2346        | 1.89  | 2200  | 6.2691          |
| 6.1767        | 2.06  | 2400  | 6.2299          |
| 6.1326        | 2.23  | 2600  | 6.1937          |
| 6.1035        | 2.4   | 2800  | 6.1602          |
| 6.0624        | 2.57  | 3000  | 6.1241          |
| 6.0393        | 2.74  | 3200  | 6.0971          |
| 5.9982        | 2.91  | 3400  | 6.0656          |
| 5.9526        | 3.08  | 3600  | 6.0397          |
| 5.9086        | 3.26  | 3800  | 6.0104          |
| 5.8922        | 3.43  | 4000  | 5.9888          |
| 5.8631        | 3.6   | 4200  | 5.9661          |
| 5.8396        | 3.77  | 4400  | 5.9407          |
| 5.8055        | 3.94  | 4600  | 5.9177          |
| 5.7763        | 4.11  | 4800  | 5.9007          |
| 5.7314        | 4.28  | 5000  | 5.8834          |
| 5.7302        | 4.46  | 5200  | 5.8620          |
| 5.6987        | 4.63  | 5400  | 5.8451          |
| 5.6754        | 4.8   | 5600  | 5.8242          |
| 5.6571        | 4.97  | 5800  | 5.8059          |
| 5.615         | 5.14  | 6000  | 5.7871          |
| 5.596         | 5.31  | 6200  | 5.7817          |
| 5.5738        | 5.48  | 6400  | 5.7570          |
| 5.5641        | 5.66  | 6600  | 5.7431          |
| 5.5503        | 5.83  | 6800  | 5.7271          |
| 5.5214        | 6.0   | 7000  | 5.7108          |
| 5.4712        | 6.17  | 7200  | 5.7018          |
| 5.48          | 6.34  | 7400  | 5.6936          |
| 5.4527        | 6.51  | 7600  | 5.6812          |
| 5.4514        | 6.68  | 7800  | 5.6669          |
| 5.4454        | 6.86  | 8000  | 5.6509          |
| 5.399         | 7.03  | 8200  | 5.6408          |
| 5.3747        | 7.2   | 8400  | 5.6327          |
| 5.3667        | 7.37  | 8600  | 5.6197          |
| 5.3652        | 7.54  | 8800  | 5.6084          |
| 5.3394        | 7.71  | 9000  | 5.5968          |
| 5.3349        | 7.88  | 9200  | 5.5870          |
| 5.2994        | 8.05  | 9400  | 5.5826          |
| 5.2793        | 8.23  | 9600  | 5.5710          |
| 5.2716        | 8.4   | 9800  | 5.5623          |
| 5.275         | 8.57  | 10000 | 5.5492          |
| 5.264         | 8.74  | 10200 | 5.5449          |
| 5.241         | 8.91  | 10400 | 5.5322          |
| 5.2285        | 9.08  | 10600 | 5.5267          |
| 5.2021        | 9.25  | 10800 | 5.5187          |
| 5.1934        | 9.43  | 11000 | 5.5158          |
| 5.1737        | 9.6   | 11200 | 5.5044          |
| 5.1774        | 9.77  | 11400 | 5.5008          |
| 5.1841        | 9.94  | 11600 | 5.4960          |
| 5.1414        | 10.11 | 11800 | 5.4895          |
| 5.1491        | 10.28 | 12000 | 5.4849          |
| 5.1184        | 10.45 | 12200 | 5.4738          |
| 5.1136        | 10.63 | 12400 | 5.4690          |
| 5.1199        | 10.8  | 12600 | 5.4598          |
| 5.1056        | 10.97 | 12800 | 5.4536          |
| 5.0648        | 11.14 | 13000 | 5.4496          |
| 5.0598        | 11.31 | 13200 | 5.4449          |
| 5.0656        | 11.48 | 13400 | 5.4422          |
| 5.0664        | 11.65 | 13600 | 5.4367          |
| 5.0675        | 11.83 | 13800 | 5.4286          |
| 5.0459        | 12.0  | 14000 | 5.4249          |
| 5.0073        | 12.17 | 14200 | 5.4260          |
| 5.0229        | 12.34 | 14400 | 5.4175          |
| 5.0079        | 12.51 | 14600 | 5.4119          |
| 5.0           | 12.68 | 14800 | 5.4194          |
| 5.0094        | 12.85 | 15000 | 5.4068          |
| 4.9967        | 13.02 | 15200 | 5.3995          |
| 4.9541        | 13.2  | 15400 | 5.4002          |
| 4.9753        | 13.37 | 15600 | 5.3965          |
| 4.9732        | 13.54 | 15800 | 5.3925          |
| 4.9624        | 13.71 | 16000 | 5.3888          |
| 4.9559        | 13.88 | 16200 | 5.3824          |
| 4.9559        | 14.05 | 16400 | 5.3851          |
| 4.9109        | 14.22 | 16600 | 5.3815          |
| 4.9211        | 14.4  | 16800 | 5.3784          |
| 4.9342        | 14.57 | 17000 | 5.3735          |
| 4.9271        | 14.74 | 17200 | 5.3711          |
| 4.9328        | 14.91 | 17400 | 5.3646          |
| 4.8994        | 15.08 | 17600 | 5.3664          |
| 4.8932        | 15.25 | 17800 | 5.3642          |
| 4.8886        | 15.42 | 18000 | 5.3620          |
| 4.8997        | 15.6  | 18200 | 5.3584          |
| 4.8846        | 15.77 | 18400 | 5.3551          |
| 4.8993        | 15.94 | 18600 | 5.3516          |
| 4.8648        | 16.11 | 18800 | 5.3552          |
| 4.8838        | 16.28 | 19000 | 5.3512          |
| 4.8575        | 16.45 | 19200 | 5.3478          |
| 4.8623        | 16.62 | 19400 | 5.3480          |
| 4.8631        | 16.8  | 19600 | 5.3439          |
| 4.8576        | 16.97 | 19800 | 5.3428          |
| 4.8265        | 17.14 | 20000 | 5.3420          |
| 4.8523        | 17.31 | 20200 | 5.3410          |
| 4.8477        | 17.48 | 20400 | 5.3396          |
| 4.8507        | 17.65 | 20600 | 5.3380          |
| 4.8498        | 17.82 | 20800 | 5.3333          |
| 4.8261        | 17.99 | 21000 | 5.3342          |
| 4.8201        | 18.17 | 21200 | 5.3324          |
| 4.8214        | 18.34 | 21400 | 5.3341          |
| 4.8195        | 18.51 | 21600 | 5.3315          |
| 4.8216        | 18.68 | 21800 | 5.3335          |
| 4.8243        | 18.85 | 22000 | 5.3291          |
| 4.832         | 19.02 | 22200 | 5.3295          |
| 4.8085        | 19.19 | 22400 | 5.3309          |
| 4.8094        | 19.37 | 22600 | 5.3283          |
| 4.815         | 19.54 | 22800 | 5.3280          |
| 4.8219        | 19.71 | 23000 | 5.3270          |
| 4.8117        | 19.88 | 23200 | 5.3280          |

### Framework versions

- Transformers 4.13.0.dev0
- Pytorch 1.10.0
- Datasets 1.14.0
- Tokenizers 0.10.3
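The card reports only the raw evaluation loss. Assuming this is the usual per-token cross-entropy from causal language modeling (the Trainer's default for this setup), the corresponding evaluation perplexity is just its exponential — a minimal sketch:

```python
import math

# Final evaluation loss from the card (assumed per-token cross-entropy)
eval_loss = 5.3420

# For a causal language model, perplexity = exp(cross-entropy loss)
perplexity = math.exp(eval_loss)
print(f"{perplexity:.1f}")  # roughly 209
```

A perplexity near 209 on wikitext suggests a model trained from scratch rather than fine-tuned from a strong pretrained checkpoint, consistent with the empty base-model link above.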