msft-regular-model

This model is a fine-tuned version of an unnamed base model on the wikitext dataset. It achieves the following result on the evaluation set:

  • Loss: 5.3420
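To make the loss easier to interpret, it can be converted to perplexity. The sketch below assumes the reported value is a mean per-token cross-entropy in nats, which is the usual convention for language-modeling losses but is not stated in this card; the function name `perplexity` is illustrative.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Evaluation loss reported in this card (assumed to be in nats per token).
eval_loss = 5.3420
eval_perplexity = perplexity(eval_loss)
print(f"evaluation perplexity: {eval_perplexity:.1f}")
```

Under that assumption the evaluation perplexity comes out at roughly 209.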

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 9.1224 | 0.17 | 200 | 8.0736 |
| 7.5229 | 0.34 | 400 | 7.1536 |
| 7.0122 | 0.51 | 600 | 6.9072 |
| 6.8296 | 0.69 | 800 | 6.7582 |
| 6.709 | 0.86 | 1000 | 6.6436 |
| 6.5882 | 1.03 | 1200 | 6.5563 |
| 6.4807 | 1.2 | 1400 | 6.4784 |
| 6.4172 | 1.37 | 1600 | 6.4165 |
| 6.3403 | 1.54 | 1800 | 6.3555 |
| 6.2969 | 1.71 | 2000 | 6.3107 |
| 6.2346 | 1.89 | 2200 | 6.2691 |
| 6.1767 | 2.06 | 2400 | 6.2299 |
| 6.1326 | 2.23 | 2600 | 6.1937 |
| 6.1035 | 2.4 | 2800 | 6.1602 |
| 6.0624 | 2.57 | 3000 | 6.1241 |
| 6.0393 | 2.74 | 3200 | 6.0971 |
| 5.9982 | 2.91 | 3400 | 6.0656 |
| 5.9526 | 3.08 | 3600 | 6.0397 |
| 5.9086 | 3.26 | 3800 | 6.0104 |
| 5.8922 | 3.43 | 4000 | 5.9888 |
| 5.8631 | 3.6 | 4200 | 5.9661 |
| 5.8396 | 3.77 | 4400 | 5.9407 |
| 5.8055 | 3.94 | 4600 | 5.9177 |
| 5.7763 | 4.11 | 4800 | 5.9007 |
| 5.7314 | 4.28 | 5000 | 5.8834 |
| 5.7302 | 4.46 | 5200 | 5.8620 |
| 5.6987 | 4.63 | 5400 | 5.8451 |
| 5.6754 | 4.8 | 5600 | 5.8242 |
| 5.6571 | 4.97 | 5800 | 5.8059 |
| 5.615 | 5.14 | 6000 | 5.7871 |
| 5.596 | 5.31 | 6200 | 5.7817 |
| 5.5738 | 5.48 | 6400 | 5.7570 |
| 5.5641 | 5.66 | 6600 | 5.7431 |
| 5.5503 | 5.83 | 6800 | 5.7271 |
| 5.5214 | 6.0 | 7000 | 5.7108 |
| 5.4712 | 6.17 | 7200 | 5.7018 |
| 5.48 | 6.34 | 7400 | 5.6936 |
| 5.4527 | 6.51 | 7600 | 5.6812 |
| 5.4514 | 6.68 | 7800 | 5.6669 |
| 5.4454 | 6.86 | 8000 | 5.6509 |
| 5.399 | 7.03 | 8200 | 5.6408 |
| 5.3747 | 7.2 | 8400 | 5.6327 |
| 5.3667 | 7.37 | 8600 | 5.6197 |
| 5.3652 | 7.54 | 8800 | 5.6084 |
| 5.3394 | 7.71 | 9000 | 5.5968 |
| 5.3349 | 7.88 | 9200 | 5.5870 |
| 5.2994 | 8.05 | 9400 | 5.5826 |
| 5.2793 | 8.23 | 9600 | 5.5710 |
| 5.2716 | 8.4 | 9800 | 5.5623 |
| 5.275 | 8.57 | 10000 | 5.5492 |
| 5.264 | 8.74 | 10200 | 5.5449 |
| 5.241 | 8.91 | 10400 | 5.5322 |
| 5.2285 | 9.08 | 10600 | 5.5267 |
| 5.2021 | 9.25 | 10800 | 5.5187 |
| 5.1934 | 9.43 | 11000 | 5.5158 |
| 5.1737 | 9.6 | 11200 | 5.5044 |
| 5.1774 | 9.77 | 11400 | 5.5008 |
| 5.1841 | 9.94 | 11600 | 5.4960 |
| 5.1414 | 10.11 | 11800 | 5.4895 |
| 5.1491 | 10.28 | 12000 | 5.4849 |
| 5.1184 | 10.45 | 12200 | 5.4738 |
| 5.1136 | 10.63 | 12400 | 5.4690 |
| 5.1199 | 10.8 | 12600 | 5.4598 |
| 5.1056 | 10.97 | 12800 | 5.4536 |
| 5.0648 | 11.14 | 13000 | 5.4496 |
| 5.0598 | 11.31 | 13200 | 5.4449 |
| 5.0656 | 11.48 | 13400 | 5.4422 |
| 5.0664 | 11.65 | 13600 | 5.4367 |
| 5.0675 | 11.83 | 13800 | 5.4286 |
| 5.0459 | 12.0 | 14000 | 5.4249 |
| 5.0073 | 12.17 | 14200 | 5.4260 |
| 5.0229 | 12.34 | 14400 | 5.4175 |
| 5.0079 | 12.51 | 14600 | 5.4119 |
| 5.0 | 12.68 | 14800 | 5.4194 |
| 5.0094 | 12.85 | 15000 | 5.4068 |
| 4.9967 | 13.02 | 15200 | 5.3995 |
| 4.9541 | 13.2 | 15400 | 5.4002 |
| 4.9753 | 13.37 | 15600 | 5.3965 |
| 4.9732 | 13.54 | 15800 | 5.3925 |
| 4.9624 | 13.71 | 16000 | 5.3888 |
| 4.9559 | 13.88 | 16200 | 5.3824 |
| 4.9559 | 14.05 | 16400 | 5.3851 |
| 4.9109 | 14.22 | 16600 | 5.3815 |
| 4.9211 | 14.4 | 16800 | 5.3784 |
| 4.9342 | 14.57 | 17000 | 5.3735 |
| 4.9271 | 14.74 | 17200 | 5.3711 |
| 4.9328 | 14.91 | 17400 | 5.3646 |
| 4.8994 | 15.08 | 17600 | 5.3664 |
| 4.8932 | 15.25 | 17800 | 5.3642 |
| 4.8886 | 15.42 | 18000 | 5.3620 |
| 4.8997 | 15.6 | 18200 | 5.3584 |
| 4.8846 | 15.77 | 18400 | 5.3551 |
| 4.8993 | 15.94 | 18600 | 5.3516 |
| 4.8648 | 16.11 | 18800 | 5.3552 |
| 4.8838 | 16.28 | 19000 | 5.3512 |
| 4.8575 | 16.45 | 19200 | 5.3478 |
| 4.8623 | 16.62 | 19400 | 5.3480 |
| 4.8631 | 16.8 | 19600 | 5.3439 |
| 4.8576 | 16.97 | 19800 | 5.3428 |
| 4.8265 | 17.14 | 20000 | 5.3420 |
| 4.8523 | 17.31 | 20200 | 5.3410 |
| 4.8477 | 17.48 | 20400 | 5.3396 |
| 4.8507 | 17.65 | 20600 | 5.3380 |
| 4.8498 | 17.82 | 20800 | 5.3333 |
| 4.8261 | 17.99 | 21000 | 5.3342 |
| 4.8201 | 18.17 | 21200 | 5.3324 |
| 4.8214 | 18.34 | 21400 | 5.3341 |
| 4.8195 | 18.51 | 21600 | 5.3315 |
| 4.8216 | 18.68 | 21800 | 5.3335 |
| 4.8243 | 18.85 | 22000 | 5.3291 |
| 4.832 | 19.02 | 22200 | 5.3295 |
| 4.8085 | 19.19 | 22400 | 5.3309 |
| 4.8094 | 19.37 | 22600 | 5.3283 |
| 4.815 | 19.54 | 22800 | 5.3280 |
| 4.8219 | 19.71 | 23000 | 5.3270 |
| 4.8117 | 19.88 | 23200 | 5.3280 |
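One way to read the training log is to track the gap between validation and training loss over time. The sketch below parses a handful of rows copied from the results above (training loss, epoch, step, validation loss) and prints that gap; the helper `parse_rows` and the choice of sample rows are illustrative, not part of the original training code.

```python
# A few rows copied from the training results:
# training loss, epoch, step, validation loss.
ROWS = """\
9.1224 0.17 200 8.0736
6.709 0.86 1000 6.6436
5.5214 6.0 7000 5.7108
5.0459 12.0 14000 5.4249
4.8117 19.88 23200 5.3280
"""


def parse_rows(text: str) -> list[dict]:
    """Parse whitespace-separated log rows into dictionaries."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        records.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return records


records = parse_rows(ROWS)
for r in records:
    # Positive gap: validation loss is higher than training loss.
    gap = r["val_loss"] - r["train_loss"]
    print(f"step {r['step']:>6}  epoch {r['epoch']:>5.2f}  gap {gap:+.4f}")
```

On these rows the gap starts negative and widens to about 0.52 by the last logged step, so training loss kept falling faster than validation loss even while validation loss was still improving slowly.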

Framework versions

  • Transformers 4.13.0.dev0
  • Pytorch 1.10.0
  • Datasets 1.14.0
  • Tokenizers 0.10.3
