msft-regular-model

This model was fine-tuned on the wikitext dataset; the card does not name the base model. It achieves the following results on the evaluation set:

Loss: 5.3420
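
If the reported loss is the mean per-token cross-entropy in nats (an assumption; the card states neither the training objective nor the unit), it can be converted to perplexity by exponentiation. A minimal sketch, where `loss_to_perplexity` is a hypothetical helper name:

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss)


eval_loss = 5.3420  # evaluation loss reported in this card
perplexity = loss_to_perplexity(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")
```

Under that assumption, the reported loss corresponds to a perplexity of roughly 209.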

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 20
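
For reference, a linear schedule decays the learning rate from its initial value to zero over the run. The sketch below is a plain-Python approximation, not the original training code. It assumes no warmup (none is listed above) and about 1,167 optimizer steps per epoch, a figure inferred from the step and epoch columns of the training results table. `linear_lr`, `STEPS_PER_EPOCH`, and `TOTAL_STEPS` are hypothetical names.

```python
# Hyperparameters as listed in this card.
HPARAMS = {
    "learning_rate": 5e-05,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "seed": 42,
    "adam_betas": (0.9, 0.999),
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_epochs": 20,
}

# Assumption: ~1,167 steps per epoch (the log shows step 7000 at epoch 6.0).
STEPS_PER_EPOCH = 1167
TOTAL_STEPS = STEPS_PER_EPOCH * HPARAMS["num_epochs"]


def linear_lr(step: int,
              total_steps: int = TOTAL_STEPS,
              base_lr: float = HPARAMS["learning_rate"],
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup (if any), then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 7000, 14000, 23200):
    print(step, f"{linear_lr(step):.2e}")
```

With these assumptions the learning rate falls to half its initial value midway through the run and reaches zero at the final step.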

Training results

Training Loss	Epoch	Step	Validation Loss
9.1224	0.17	200	8.0736
7.5229	0.34	400	7.1536
7.0122	0.51	600	6.9072
6.8296	0.69	800	6.7582
6.709	0.86	1000	6.6436
6.5882	1.03	1200	6.5563
6.4807	1.2	1400	6.4784
6.4172	1.37	1600	6.4165
6.3403	1.54	1800	6.3555
6.2969	1.71	2000	6.3107
6.2346	1.89	2200	6.2691
6.1767	2.06	2400	6.2299
6.1326	2.23	2600	6.1937
6.1035	2.4	2800	6.1602
6.0624	2.57	3000	6.1241
6.0393	2.74	3200	6.0971
5.9982	2.91	3400	6.0656
5.9526	3.08	3600	6.0397
5.9086	3.26	3800	6.0104
5.8922	3.43	4000	5.9888
5.8631	3.6	4200	5.9661
5.8396	3.77	4400	5.9407
5.8055	3.94	4600	5.9177
5.7763	4.11	4800	5.9007
5.7314	4.28	5000	5.8834
5.7302	4.46	5200	5.8620
5.6987	4.63	5400	5.8451
5.6754	4.8	5600	5.8242
5.6571	4.97	5800	5.8059
5.615	5.14	6000	5.7871
5.596	5.31	6200	5.7817
5.5738	5.48	6400	5.7570
5.5641	5.66	6600	5.7431
5.5503	5.83	6800	5.7271
5.5214	6.0	7000	5.7108
5.4712	6.17	7200	5.7018
5.48	6.34	7400	5.6936
5.4527	6.51	7600	5.6812
5.4514	6.68	7800	5.6669
5.4454	6.86	8000	5.6509
5.399	7.03	8200	5.6408
5.3747	7.2	8400	5.6327
5.3667	7.37	8600	5.6197
5.3652	7.54	8800	5.6084
5.3394	7.71	9000	5.5968
5.3349	7.88	9200	5.5870
5.2994	8.05	9400	5.5826
5.2793	8.23	9600	5.5710
5.2716	8.4	9800	5.5623
5.275	8.57	10000	5.5492
5.264	8.74	10200	5.5449
5.241	8.91	10400	5.5322
5.2285	9.08	10600	5.5267
5.2021	9.25	10800	5.5187
5.1934	9.43	11000	5.5158
5.1737	9.6	11200	5.5044
5.1774	9.77	11400	5.5008
5.1841	9.94	11600	5.4960
5.1414	10.11	11800	5.4895
5.1491	10.28	12000	5.4849
5.1184	10.45	12200	5.4738
5.1136	10.63	12400	5.4690
5.1199	10.8	12600	5.4598
5.1056	10.97	12800	5.4536
5.0648	11.14	13000	5.4496
5.0598	11.31	13200	5.4449
5.0656	11.48	13400	5.4422
5.0664	11.65	13600	5.4367
5.0675	11.83	13800	5.4286
5.0459	12.0	14000	5.4249
5.0073	12.17	14200	5.4260
5.0229	12.34	14400	5.4175
5.0079	12.51	14600	5.4119
5.0	12.68	14800	5.4194
5.0094	12.85	15000	5.4068
4.9967	13.02	15200	5.3995
4.9541	13.2	15400	5.4002
4.9753	13.37	15600	5.3965
4.9732	13.54	15800	5.3925
4.9624	13.71	16000	5.3888
4.9559	13.88	16200	5.3824
4.9559	14.05	16400	5.3851
4.9109	14.22	16600	5.3815
4.9211	14.4	16800	5.3784
4.9342	14.57	17000	5.3735
4.9271	14.74	17200	5.3711
4.9328	14.91	17400	5.3646
4.8994	15.08	17600	5.3664
4.8932	15.25	17800	5.3642
4.8886	15.42	18000	5.3620
4.8997	15.6	18200	5.3584
4.8846	15.77	18400	5.3551
4.8993	15.94	18600	5.3516
4.8648	16.11	18800	5.3552
4.8838	16.28	19000	5.3512
4.8575	16.45	19200	5.3478
4.8623	16.62	19400	5.3480
4.8631	16.8	19600	5.3439
4.8576	16.97	19800	5.3428
4.8265	17.14	20000	5.3420
4.8523	17.31	20200	5.3410
4.8477	17.48	20400	5.3396
4.8507	17.65	20600	5.3380
4.8498	17.82	20800	5.3333
4.8261	17.99	21000	5.3342
4.8201	18.17	21200	5.3324
4.8214	18.34	21400	5.3341
4.8195	18.51	21600	5.3315
4.8216	18.68	21800	5.3335
4.8243	18.85	22000	5.3291
4.832	19.02	22200	5.3295
4.8085	19.19	22400	5.3309
4.8094	19.37	22600	5.3283
4.815	19.54	22800	5.3280
4.8219	19.71	23000	5.3270
4.8117	19.88	23200	5.3280
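
The log can also be inspected programmatically, for example to find the logged step with the lowest validation loss. The sketch below parses a handful of rows copied from the table above, not the full log. `LogRow` and `parse_rows` are hypothetical names, and the whitespace-separated layout is assumed to match the table as shown.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# Rows copied from the table: training loss, epoch, step, validation loss.
SAMPLE = """
9.1224 0.17 200 8.0736
5.5214 6.0 7000 5.7108
4.8265 17.14 20000 5.3420
4.8219 19.71 23000 5.3270
4.8117 19.88 23200 5.3280
"""


def parse_rows(text: str) -> List[LogRow]:
    """Parse whitespace-separated log rows, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        train_loss, epoch, step, val_loss = fields
        rows.append(LogRow(float(train_loss), float(epoch), int(step), float(val_loss)))
    return rows


rows = parse_rows(SAMPLE)
best = min(rows, key=lambda r: r.val_loss)
print(f"lowest validation loss {best.val_loss:.4f} at step {best.step}")
print(f"train/validation gap at that step: {best.val_loss - best.train_loss:.4f}")
```

Among the sampled rows, the lowest validation loss is 5.3270 at step 23000, while the headline loss of 5.3420 matches the row logged at step 20000.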

Framework versions

Transformers 4.13.0.dev0
PyTorch 1.10.0
Datasets 1.14.0
Tokenizers 0.10.3
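
When trying to reproduce the run, it may help to compare the local environment against these versions. The following is a minimal sketch using only the standard library. It assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`; `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.13.0.dev0",
    "torch": "1.10.0",
    "datasets": "1.14.0",
    "tokenizers": "0.10.3",
}


def compare_versions(expected):
    """Map each package to (version in card, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(CARD_VERSIONS).items():
    status = "match" if installed == wanted else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```

A plain string comparison is deliberately strict: a local build suffix on the installed version (for example on PyTorch) will be reported as a difference.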

Model repository: mikaelsouza/msft-regular-model
