gpt-imdb-kto-beta_0.1

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2062
  • Rewards/chosen: 2.5179
  • Rewards/rejected: -0.0433
  • Rewards/accuracies: 0.8250
  • Rewards/margins: 2.5611
  • Logps/rejected: -264.1180
  • Logps/chosen: -210.0866
  • Logits/rejected: -30.4371
  • Logits/chosen: -31.3849
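
The card leaves usage details unspecified; as a minimal sketch, the checkpoint loads like any GPT-2 causal language model with transformers. The repo id below is assumed from this card's title and may need the owning namespace prefixed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id taken from the card title -- substitute the actual Hub path.
model_id = "gpt-imdb-kto-beta_0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The base model (lvwerra/gpt2-imdb) is a GPT-2 tuned on IMDB reviews,
# so a review-style prompt is a natural fit.
prompt = "This movie was"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```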

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • training_steps: 7197
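
The training script itself is not included in this card; the sketch below shows one way the listed optimizer and scheduler settings map onto standard PyTorch/transformers objects. The KTO training loop itself (e.g. via TRL) is omitted, and the base model is assumed from the card's description:

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")

# Adam with the betas and epsilon listed above.
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-5, betas=(0.9, 0.99), eps=1e-8
)

# Cosine schedule with 150 warmup steps over the 7197 total training steps.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=150,
    num_training_steps=7197,
)
```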

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2522 | 0.21 | 500 | 0.2884 | 1.2801 | -0.2634 | 0.7875 | 1.5434 | -266.3188 | -222.4644 | -37.5496 | -38.4713 |
| 0.3335 | 0.42 | 1000 | 0.2696 | 1.5869 | -0.1616 | 0.7917 | 1.7485 | -265.3008 | -219.3961 | -37.8624 | -38.7817 |
| 0.2435 | 0.63 | 1500 | 0.2472 | 1.8228 | -0.2033 | 0.7896 | 2.0260 | -265.7180 | -217.0376 | -33.3680 | -34.1467 |
| 0.3162 | 0.83 | 2000 | 0.2497 | 2.2013 | 0.3606 | 0.7729 | 1.8407 | -260.0789 | -213.2520 | -33.0705 | -33.7146 |
| 0.1409 | 1.04 | 2500 | 0.2301 | 2.0789 | -0.0950 | 0.8042 | 2.1738 | -264.6351 | -214.4766 | -34.2110 | -35.1256 |
| 0.2415 | 1.25 | 3000 | 0.2221 | 2.1406 | -0.2423 | 0.8042 | 2.3829 | -266.1087 | -213.8594 | -35.0880 | -35.8295 |
| 0.1549 | 1.46 | 3500 | 0.2173 | 2.2945 | -0.0445 | 0.7979 | 2.3390 | -264.1307 | -212.3203 | -31.2025 | -32.0702 |
| 0.1764 | 1.67 | 4000 | 0.2117 | 2.3347 | -0.2551 | 0.8250 | 2.5898 | -266.2365 | -211.9187 | -31.0530 | -31.9754 |
| 0.131 | 1.88 | 4500 | 0.2101 | 2.3080 | -0.3171 | 0.8062 | 2.6251 | -266.8560 | -212.1852 | -30.9535 | -31.9058 |
| 0.2463 | 2.08 | 5000 | 0.2131 | 2.5808 | 0.2215 | 0.8167 | 2.3593 | -261.4699 | -209.4572 | -31.7099 | -32.5262 |
| 0.1536 | 2.29 | 5500 | 0.2084 | 2.5201 | -0.0034 | 0.8125 | 2.5236 | -263.7196 | -210.0640 | -30.3275 | -31.2806 |
| 0.2473 | 2.5 | 6000 | 0.2057 | 2.4813 | -0.1087 | 0.8188 | 2.5899 | -264.7721 | -210.4527 | -30.2259 | -31.1935 |
| 0.2168 | 2.71 | 6500 | 0.2060 | 2.5255 | -0.0304 | 0.8146 | 2.5559 | -263.9893 | -210.0102 | -30.4678 | -31.4146 |
| 0.1669 | 2.92 | 7000 | 0.2062 | 2.5179 | -0.0433 | 0.8250 | 2.5611 | -264.1180 | -210.0866 | -30.4371 | -31.3849 |
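
As a quick consistency check, Rewards/margins tracks the difference between the chosen and rejected rewards; the final evaluation row reproduces this up to rounding:

```python
# Rewards/margins is (approximately) Rewards/chosen - Rewards/rejected.
# Checking the final eval row (step 7000) from the table above:
chosen, rejected = 2.5179, -0.0433
print(chosen - rejected)  # 2.5612, matching the reported margin of 2.5611 up to rounding
```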

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0