gpt-imdb-sigmoid-beta_0.1

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. Its evaluation-set results at each logged checkpoint are listed in the results table at the end of this card.

Model description

More information needed

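Training results

The hyperparameters used during training are not listed in this card. The table below records the training and evaluation metrics logged every 500 steps.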

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.2741	0.21	500	0.3546	-0.7644	-2.6310	0.8604	1.8666	-289.9951	-242.9089	-34.2705	-35.4568
0.3403	0.42	1000	0.2963	-1.6755	-4.3008	0.8687	2.6253	-306.6930	-252.0203	-40.9205	-42.3105
0.1939	0.63	1500	0.2596	-3.1297	-6.7295	0.8771	3.5998	-330.9802	-266.5624	-37.6829	-39.1821
0.2094	0.83	2000	0.1941	-2.9414	-6.9143	0.9292	3.9728	-332.8280	-264.6796	-38.0792	-39.7464
0.1481	1.04	2500	0.1744	-3.7473	-8.3469	0.9333	4.5996	-347.1542	-272.7383	-40.9252	-42.5164
0.2862	1.25	3000	0.1750	-4.5825	-9.7147	0.9292	5.1322	-360.8324	-281.0905	-41.9790	-44.0717
0.304	1.46	3500	0.1652	-4.3291	-9.8200	0.9333	5.4909	-361.8853	-278.5559	-44.1786	-46.1418
0.2167	1.67	4000	0.1580	-4.6175	-10.0305	0.9354	5.4130	-363.9903	-281.4398	-43.6324	-45.4854
0.1396	1.88	4500	0.1518	-4.5940	-10.1635	0.9396	5.5696	-365.3205	-281.2049	-41.9461	-43.8060
0.1575	2.08	5000	0.1525	-5.3119	-11.3685	0.9292	6.0566	-377.3703	-288.3840	-43.4045	-45.2127
0.0338	2.29	5500	0.1472	-5.2545	-11.3863	0.9333	6.1319	-377.5485	-287.8099	-43.2283	-45.1626
0.1631	2.5	6000	0.1496	-5.6862	-11.9852	0.9333	6.2991	-383.5375	-292.1269	-43.6007	-45.5693
0.1177	2.71	6500	0.1473	-5.6329	-11.9588	0.9417	6.3259	-383.2729	-291.5939	-44.3503	-46.3168
0.2342	2.92	7000	0.1445	-5.6156	-11.9139	0.9354	6.2982	-382.8238	-291.4216	-44.3728	-46.3321
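
The columns in the table above can be sanity-checked with a few lines of Python. The sketch below is illustrative only: it assumes that Rewards/margins is the chosen reward minus the rejected reward, which the card does not state but which matches the usual convention for preference-optimization metrics. The names `EvalRow` and `margin_gap` are hypothetical helpers, and the values are copied from the Step, Validation Loss, Rewards/chosen, Rewards/rejected, Rewards/accuracies, and Rewards/margins columns.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    """One evaluation checkpoint, copied from the results table."""
    step: int
    val_loss: float
    reward_chosen: float
    reward_rejected: float
    accuracy: float
    margin: float


ROWS = [
    EvalRow(500, 0.3546, -0.7644, -2.6310, 0.8604, 1.8666),
    EvalRow(1000, 0.2963, -1.6755, -4.3008, 0.8687, 2.6253),
    EvalRow(1500, 0.2596, -3.1297, -6.7295, 0.8771, 3.5998),
    EvalRow(2000, 0.1941, -2.9414, -6.9143, 0.9292, 3.9728),
    EvalRow(2500, 0.1744, -3.7473, -8.3469, 0.9333, 4.5996),
    EvalRow(3000, 0.1750, -4.5825, -9.7147, 0.9292, 5.1322),
    EvalRow(3500, 0.1652, -4.3291, -9.8200, 0.9333, 5.4909),
    EvalRow(4000, 0.1580, -4.6175, -10.0305, 0.9354, 5.4130),
    EvalRow(4500, 0.1518, -4.5940, -10.1635, 0.9396, 5.5696),
    EvalRow(5000, 0.1525, -5.3119, -11.3685, 0.9292, 6.0566),
    EvalRow(5500, 0.1472, -5.2545, -11.3863, 0.9333, 6.1319),
    EvalRow(6000, 0.1496, -5.6862, -11.9852, 0.9333, 6.2991),
    EvalRow(6500, 0.1473, -5.6329, -11.9588, 0.9417, 6.3259),
    EvalRow(7000, 0.1445, -5.6156, -11.9139, 0.9354, 6.2982),
]


def margin_gap(row: EvalRow) -> float:
    """Absolute difference between the reported margin and chosen - rejected."""
    return abs((row.reward_chosen - row.reward_rejected) - row.margin)


# Largest disagreement across all checkpoints (expected to be rounding noise
# if the assumed definition of the margin holds).
max_gap = max(margin_gap(r) for r in ROWS)

# Checkpoints with the lowest validation loss and the highest reward accuracy.
best_loss = min(ROWS, key=lambda r: r.val_loss)
best_acc = max(ROWS, key=lambda r: r.accuracy)

print(f"max |margin - (chosen - rejected)| = {max_gap:.4f}")
print(f"lowest validation loss: {best_loss.val_loss} at step {best_loss.step}")
print(f"highest reward accuracy: {best_acc.accuracy} at step {best_acc.step}")
```

Under that assumption the reported margins agree with the difference of the two reward columns to within rounding. The lowest validation loss and the highest reward accuracy fall on different checkpoints, so which metric to prioritize when selecting a checkpoint is a judgment call.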