gpt-imdb-ipo_annealing

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 125.6974
  • Rewards/chosen: -0.0343
  • Rewards/rejected: -0.1277
  • Rewards/accuracies: 0.875
  • Rewards/margins: 0.0934
  • Logps/rejected: -267.1282
  • Logps/chosen: -236.1897
  • Logits/rejected: -31.3501
  • Logits/chosen: -31.5916
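
The model is a standard GPT-2 causal language model and can be loaded with transformers. Below is a minimal usage sketch; the full Hub repo id is not stated on this card, so the id in the example is a placeholder.

```python
# Minimal usage sketch; "your-username/gpt-imdb-ipo_annealing" is a
# placeholder repo id: the card does not state the full Hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/gpt-imdb-ipo_annealing"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short IMDB-style continuation.
inputs = tokenizer("This movie was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```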

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • training_steps: 7197
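
The card does not state the training code, but the logged metric names (Rewards/chosen, Logps/rejected, etc.) match trl's DPOTrainer, and the model name suggests the IPO loss variant. The sketch below shows how the listed hyperparameters might map onto that setup; the beta value and the preference dataset are assumptions, not taken from this card.

```python
# Hedged reconstruction sketch, not the author's actual training script.
# Assumptions: trl's DPOTrainer with loss_type="ipo" (inferred from the
# metric names and the model name), beta=0.1, and a toy preference dataset
# standing in for the unknown training data.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
ref_model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
tokenizer = AutoTokenizer.from_pretrained("lvwerra/gpt2-imdb")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token

# Toy stand-in: DPOTrainer expects prompt/chosen/rejected columns.
train_dataset = Dataset.from_dict({
    "prompt": ["This movie was"],
    "chosen": [" wonderful, with a moving final act."],
    "rejected": [" terrible and a waste of time."],
})

# Hyperparameters exactly as listed above.
args = TrainingArguments(
    output_dir="gpt-imdb-ipo_annealing",
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    max_steps=7197,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,          # assumption: not stated on the card
    loss_type="ipo",   # inferred from the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```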

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 16.3187 | 0.21 | 500 | 34.0876 | 0.1161 | -0.1126 | 0.5292 | 0.2287 | -263.8062 | -235.1407 | -33.1877 | -33.4371 |
| 5.5155 | 0.42 | 1000 | 13.0423 | -0.1485 | -0.3812 | 0.5042 | 0.2327 | -264.1273 | -235.4375 | -35.2608 | -35.4541 |
| 10.2532 | 0.63 | 1500 | 18.5157 | -0.4407 | -0.5471 | 0.5458 | 0.1064 | -264.3746 | -235.8205 | -34.2230 | -34.4246 |
| 6.755 | 0.83 | 2000 | 28.1593 | -0.7791 | -0.8052 | 0.5917 | 0.0261 | -264.7961 | -236.3400 | -33.6119 | -33.8069 |
| 9.4126 | 1.04 | 2500 | 9.2406 | -0.8733 | -1.2564 | 0.6229 | 0.3831 | -265.6003 | -236.5962 | -31.9471 | -32.0700 |
| 8.5908 | 1.25 | 3000 | 12.4967 | -0.6700 | -1.0163 | 0.6167 | 0.3462 | -265.4156 | -236.4061 | -31.6914 | -31.8443 |
| 19.5217 | 1.46 | 3500 | 6.8889 | -0.0720 | -0.4689 | 0.6854 | 0.3969 | -264.5895 | -235.4041 | -32.1300 | -32.2692 |
| 6.9195 | 1.67 | 4000 | 4.2435 | -0.5324 | -0.9335 | 0.7021 | 0.4012 | -265.7609 | -236.4489 | -31.8342 | -31.9606 |
| 4.6993 | 1.88 | 4500 | 5.0987 | -0.2002 | -0.6179 | 0.7521 | 0.4177 | -265.3070 | -235.7907 | -31.6301 | -31.7617 |
| 2.7896 | 2.08 | 5000 | 2.7344 | -0.2390 | -0.5589 | 0.7500 | 0.3199 | -265.4754 | -236.0307 | -31.9650 | -32.1009 |
| 3.2262 | 2.29 | 5500 | 3.0584 | -0.1936 | -0.5168 | 0.8083 | 0.3231 | -265.8080 | -236.0606 | -31.6585 | -31.8243 |
| 4.1965 | 2.5 | 6000 | 4.2350 | -0.1555 | -0.4440 | 0.8417 | 0.2884 | -266.2272 | -236.1557 | -31.6484 | -31.8344 |
| 15.1482 | 2.71 | 6500 | 10.8174 | -0.0932 | -0.3244 | 0.8667 | 0.2312 | -266.7491 | -236.1454 | -31.4600 | -31.6800 |
| 145.9251 | 2.92 | 7000 | 125.6974 | -0.0343 | -0.1277 | 0.875 | 0.0934 | -267.1282 | -236.1897 | -31.3501 | -31.5916 |
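
For reading the table: Rewards/margins is the gap between the chosen and rejected rewards, and Rewards/accuracies is the fraction of evaluation pairs where the chosen response received the higher reward. A small check using the final row:

```python
# Illustrative only: the relationship between the reward columns,
# using the final evaluation row (step 7000) of the table above.
rewards_chosen = -0.0343
rewards_rejected = -0.1277

margin = rewards_chosen - rewards_rejected
print(f"Rewards/margins = {margin:.4f}")  # 0.0934, matching the table
```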

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0