
gpt-imdb-hinge-beta_0.1

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unspecified preference dataset. It achieves the following results on the evaluation set:

  • Step: 5500
  • Loss: 0.1682
  • Rewards/chosen: -2.5613
  • Rewards/rejected: -6.0913
  • Rewards/accuracies: 0.9312
  • Rewards/margins: 3.5300
  • Logps/rejected: -324.5987
  • Logps/chosen: -260.8782
  • Logits/rejected: -45.3410
  • Logits/chosen: -46.5522
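
These metrics follow TRL's DPO conventions: Rewards/chosen and Rewards/rejected are β-scaled log-probability ratios between the trained policy and the reference model, and Rewards/margins is their difference. The model name suggests the hinge variant of the DPO loss with β = 0.1 (an inference from the name, not confirmed elsewhere in this card), which per preference pair (x, y_w, y_l) would be:

$$
\mathcal{L}_{\text{hinge}} = \max\left(0,\; 1 - \beta \left[\log\frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right]\right)
$$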

Model description

More information needed

Intended uses & limitations

More information needed
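
As a minimal usage sketch, the snippet below loads the checkpoint for text generation with transformers. The Hub repository id is a placeholder, since this card does not state the full path.

```python
# Minimal generation sketch. The model id below is a placeholder -- replace it
# with the actual Hub path of this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/gpt-imdb-hinge-beta_0.1"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "This movie was"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```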

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of a matching TRL DPOTrainer call follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • num_epochs: 3
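
The sketch below shows one way these hyperparameters could map onto a TRL DPOTrainer run (TRL ~0.7.x, contemporary with the listed Transformers 4.35.2). The preference dataset path is a placeholder, and beta and loss_type are inferred from the model name, not stated in the card.

```python
# Hedged reconstruction of a training setup consistent with this card's
# hyperparameters; not the author's confirmed script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "lvwerra/gpt2-imdb"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: any dataset with "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("path/to/preference-dataset", split="train")

args = TrainingArguments(
    output_dir="gpt-imdb-hinge-beta_0.1",
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    seed=42,
    remove_unused_columns=False,  # required by DPOTrainer's data collator
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,           # inferred from the "beta_0.1" suffix in the model name
    loss_type="hinge",  # inferred from the "hinge" tag in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```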

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3746 | 0.21 | 500  | 0.3940 | -0.4768 | -1.9553 | 0.8562 | 1.4785 | -283.2387 | -240.0334 | -33.1236 | -34.2065 |
| 0.3627 | 0.42 | 1000 | 0.3395 | -1.0759 | -2.9896 | 0.8646 | 1.9137 | -293.5812 | -246.0238 | -41.8545 | -42.9940 |
| 0.2687 | 0.63 | 1500 | 0.3229 | -1.7235 | -4.1025 | 0.8729 | 2.3790 | -304.7103 | -252.5004 | -39.8423 | -41.2043 |
| 0.1878 | 0.83 | 2000 | 0.2360 | -1.6708 | -4.3940 | 0.9104 | 2.7231 | -307.6249 | -251.9736 | -41.4970 | -42.6933 |
| 0.1936 | 1.04 | 2500 | 0.2124 | -1.9623 | -4.8688 | 0.9250 | 2.9066 | -312.3736 | -254.8880 | -42.8807 | -43.9675 |
| 0.2302 | 1.25 | 3000 | 0.2062 | -2.1959 | -5.2559 | 0.9021 | 3.0600 | -316.2442 | -257.2241 | -45.2090 | -46.3997 |
| 0.2137 | 1.46 | 3500 | 0.2235 | -2.1054 | -5.4204 | 0.9208 | 3.3150 | -317.8889 | -256.3190 | -46.5366 | -47.7024 |
| 0.2231 | 1.67 | 4000 | 0.1884 | -2.3281 | -5.6096 | 0.9208 | 3.2815 | -319.7815 | -258.5467 | -45.7720 | -46.8600 |
| 0.2269 | 1.88 | 4500 | 0.1785 | -2.5145 | -6.0015 | 0.9292 | 3.4871 | -323.7006 | -260.4101 | -45.7220 | -46.8746 |
| 0.1831 | 2.08 | 5000 | 0.1727 | -2.6850 | -6.2801 | 0.9312 | 3.5951 | -326.4862 | -262.1152 | -45.0514 | -46.1610 |
| 0.0112 | 2.29 | 5500 | 0.1682 | -2.5613 | -6.0913 | 0.9312 | 3.5300 | -324.5987 | -260.8782 | -45.3410 | -46.5522 |
| 0.1894 | 2.5  | 6000 | 0.1706 | -2.7334 | -6.3632 | 0.9271 | 3.6298 | -327.3174 | -262.5995 | -45.2020 | -46.4449 |
| 0.13   | 2.71 | 6500 | 0.1685 | -2.7681 | -6.4203 | 0.9250 | 3.6522 | -327.8886 | -262.9462 | -45.5580 | -46.8017 |
| 0.2717 | 2.92 | 7000 | 0.1683 | -2.7548 | -6.4029 | 0.9271 | 3.6481 | -327.7139 | -262.8134 | -45.7026 | -46.9404 |

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0