
pretraining6

This model is a fine-tuned version of gpt2 (124M parameters) trained on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9244
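
The checkpoint can be loaded like any other GPT-2 causal language model through the Transformers API. A minimal usage sketch, assuming the model is published on the Hugging Face Hub (the repository identifier below is a placeholder for the actual path):

```python
# Minimal usage sketch; "your-namespace/pretraining6" is a placeholder
# for the actual Hub repository id of this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/pretraining6"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```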

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they could map onto the Transformers Trainer API):

  • learning_rate: 0.0006
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 10
  • total_train_batch_size: 320
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_steps: 250
  • training_steps: 2500
  • mixed_precision_training: Native AMP
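
The configuration above corresponds roughly to the following Trainer setup. This is a reconstruction for illustration only, not the original training script; the dataset, data collator, and output directory are placeholders, since the card does not document them:

```python
# Hypothetical reconstruction of the training configuration above using the
# Hugging Face Trainer API; output_dir and the datasets are placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

args = TrainingArguments(
    output_dir="pretraining6",
    learning_rate=6e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=10,   # effective batch size: 32 * 10 = 320
    max_steps=2500,
    warmup_steps=250,
    lr_scheduler_type="constant",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    fp16=True,                        # "Native AMP" mixed precision
    evaluation_strategy="steps",
    eval_steps=50,
    logging_steps=50,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset,  # not documented in this card
#                   eval_dataset=eval_dataset)
# trainer.train()
```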

Training results

Training Loss | Epoch | Step | Validation Loss
7.5942 | 0.1830 | 50 | 6.9837
6.7139 | 0.3660 | 100 | 6.4890
6.3586 | 0.5490 | 150 | 6.1468
6.0068 | 0.7321 | 200 | 5.8162
5.7156 | 0.9151 | 250 | 5.5687
5.4902 | 1.0981 | 300 | 5.3574
5.2778 | 1.2811 | 350 | 5.1752
5.078 | 1.4641 | 400 | 4.9877
4.905 | 1.6471 | 450 | 4.8093
4.7396 | 1.8302 | 500 | 4.6300
4.5488 | 2.0132 | 550 | 4.4533
4.2909 | 2.1962 | 600 | 4.2386
4.1235 | 2.3792 | 650 | 3.9890
3.9081 | 2.5622 | 700 | 3.7933
3.7373 | 2.7452 | 750 | 3.6421
3.6011 | 2.9283 | 800 | 3.5265
3.4526 | 3.1113 | 850 | 3.4465
3.3523 | 3.2943 | 900 | 3.3867
3.2917 | 3.4773 | 950 | 3.3297
3.2536 | 3.6603 | 1000 | 3.2808
3.2277 | 3.8433 | 1050 | 3.2435
3.1699 | 4.0264 | 1100 | 3.1971
3.0158 | 4.2094 | 1150 | 3.1710
3.0104 | 4.3924 | 1200 | 3.1499
2.9946 | 4.5754 | 1250 | 3.1194
2.9814 | 4.7584 | 1300 | 3.0988
2.9686 | 4.9414 | 1350 | 3.0700
2.8425 | 5.1245 | 1400 | 3.0559
2.8039 | 5.3075 | 1450 | 3.0437
2.8121 | 5.4905 | 1500 | 3.0285
2.8078 | 5.6735 | 1550 | 3.0128
2.7996 | 5.8565 | 1600 | 2.9962
2.7607 | 6.0395 | 1650 | 2.9871
2.6212 | 6.2225 | 1700 | 2.9845
2.6638 | 6.4056 | 1750 | 2.9746
2.6603 | 6.5886 | 1800 | 2.9660
2.6674 | 6.7716 | 1850 | 2.9510
2.6741 | 6.9546 | 1900 | 2.9379
2.5313 | 7.1376 | 1950 | 2.9474
2.5107 | 7.3206 | 2000 | 2.9465
2.5358 | 7.5037 | 2050 | 2.9403
2.5552 | 7.6867 | 2100 | 2.9303
2.5691 | 7.8697 | 2150 | 2.9200
2.5008 | 8.0527 | 2200 | 2.9241
2.3855 | 8.2357 | 2250 | 2.9314
2.4215 | 8.4187 | 2300 | 2.9285
2.4488 | 8.6018 | 2350 | 2.9217
2.46 | 8.7848 | 2400 | 2.9110
2.468 | 8.9678 | 2450 | 2.9044
2.3004 | 9.1508 | 2500 | 2.9244
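
Assuming the reported loss is the usual per-token cross-entropy in nats, the final validation loss of 2.9244 corresponds to a perplexity of about exp(2.9244) ≈ 18.6.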

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1