# pretraining7
This model is a fine-tuned version of gpt2 on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 2.9516
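Assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model evaluation), this corresponds to a perplexity of roughly exp(2.9516) ≈ 19.1. A minimal sketch of the conversion:

```python
import math

# Sketch: convert the reported evaluation loss to perplexity,
# assuming the loss is mean cross-entropy per token in nats.
eval_loss = 2.9516
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 19.1
```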
## Model description
More information needed
## Intended uses & limitations
More information needed
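The card does not document intended uses. As a GPT-2 fine-tune, plain causal text generation is the most plausible use; below is a minimal sketch, assuming the checkpoint is published under the hypothetical id `pretraining7` (substitute the actual hub path or local directory):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical id for this checkpoint; adjust to wherever it is actually stored.
model_id = "pretraining7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```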
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 0.0006
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 10
- total_train_batch_size: 320
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 250
- training_steps: 1500
- mixed_precision_training: Native AMP
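For reference, a minimal sketch of how these values map onto `transformers.TrainingArguments`. The output directory, the evaluation/logging cadence, and the single-GPU assumption behind the total batch size of 320 are assumptions, not taken from the original training script; only the listed hyperparameter values come from this card.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above; everything else is assumed.
training_args = TrainingArguments(
    output_dir="pretraining7",           # assumed output path
    learning_rate=6e-4,
    per_device_train_batch_size=32,      # 32 x 10 accumulation steps x 1 GPU = 320 total
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=10,
    seed=42,
    adam_beta1=0.9,                      # Adam betas=(0.9, 0.999), epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=250,
    max_steps=1500,
    fp16=True,                           # "Native AMP" mixed precision
    evaluation_strategy="steps",         # assumed: validation loss is reported every 50 steps
    eval_steps=50,
    logging_steps=50,
)
```

With the `inverse_sqrt` scheduler, the learning rate ramps up over the 250 warmup steps and then decays proportionally to the inverse square root of the step count.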
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 9.1748        | 0.1830 | 50   | 7.4163          |
| 6.7378        | 0.3660 | 100  | 6.2275          |
| 5.9659        | 0.5490 | 150  | 5.6559          |
| 5.4898        | 0.7321 | 200  | 5.2576          |
| 5.1248        | 0.9151 | 250  | 4.9392          |
| 4.7951        | 1.0981 | 300  | 4.6297          |
| 4.4843        | 1.2811 | 350  | 4.3591          |
| 4.2178        | 1.4641 | 400  | 4.0591          |
| 3.9729        | 1.6471 | 450  | 3.7843          |
| 3.766         | 1.8302 | 500  | 3.6225          |
| 3.6046        | 2.0132 | 550  | 3.5064          |
| 3.41          | 2.1962 | 600  | 3.4262          |
| 3.3702        | 2.3792 | 650  | 3.3577          |
| 3.309         | 2.5622 | 700  | 3.3027          |
| 3.2562        | 2.7452 | 750  | 3.2583          |
| 3.2027        | 2.9283 | 800  | 3.2192          |
| 3.1139        | 3.1113 | 850  | 3.1779          |
| 3.0442        | 3.2943 | 900  | 3.1549          |
| 3.0144        | 3.4773 | 950  | 3.1266          |
| 3.0016        | 3.6603 | 1000 | 3.0997          |
| 3.0001        | 3.8433 | 1050 | 3.0770          |
| 2.9655        | 4.0264 | 1100 | 3.0554          |
| 2.8328        | 4.2094 | 1150 | 3.0422          |
| 2.8343        | 4.3924 | 1200 | 3.0261          |
| 2.8266        | 4.5754 | 1250 | 3.0105          |
| 2.8236        | 4.7584 | 1300 | 2.9962          |
| 2.8194        | 4.9414 | 1350 | 2.9807          |
| 2.7161        | 5.1245 | 1400 | 2.9717          |
| 2.6842        | 5.3075 | 1450 | 2.9632          |
| 2.6898        | 5.4905 | 1500 | 2.9516          |
### Framework versions
- Transformers 4.40.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1