
gpt-imdb-ipo-beta_0.3

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset; judging by the model name, it was trained with the IPO (identity preference optimization) objective at β = 0.3. It achieves the following results on the evaluation set:

  • Loss: 1.8601
  • Rewards/chosen: -0.2473
  • Rewards/rejected: -0.6141
  • Rewards/accuracies: 0.8271
  • Rewards/margins: 0.3668
  • Logps/rejected: -265.7321
  • Logps/chosen: -236.0896
  • Logits/rejected: -31.6527
  • Logits/chosen: -31.7977
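
As a minimal usage sketch: the checkpoint is a standard GPT-2 causal LM, so it loads with the usual Transformers API. The repo id below is a placeholder assumption; substitute the actual Hub id of this model.

```python
# Minimal loading/generation sketch. The repo id is a placeholder
# assumption, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "gpt-imdb-ipo-beta_0.3"  # hypothetical; replace with the real Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("This movie was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```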

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • training_steps: 7197
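
A hedged reconstruction of how these hyperparameters might map onto code: the listed values fit `transformers.TrainingArguments`, and the model name suggests trl's `DPOTrainer` with `loss_type="ipo"` and `beta=0.3`. Only the hyperparameter values come from this card; the trl version (~0.7.x, contemporary with Transformers 4.35.2, where `beta` and `loss_type` are `DPOTrainer` arguments rather than `DPOConfig` fields), the reference model, and the dataset are assumptions.

```python
# Sketch of the training setup under the assumptions above,
# NOT the authors' actual script. The dataset is unknown, so a
# tiny dummy preference dataset stands in as a placeholder.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
ref_model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
tokenizer = AutoTokenizer.from_pretrained("lvwerra/gpt2-imdb")
tokenizer.pad_token = tokenizer.eos_token

# Placeholder preference data; the real training data is not documented.
preference_dataset = Dataset.from_dict({
    "prompt": ["The movie was"],
    "chosen": [" wonderful and moving."],
    "rejected": [" terrible and dull."],
})

# Values below are exactly the ones listed in this card.
args = TrainingArguments(
    output_dir="gpt-imdb-ipo-beta_0.3",
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    max_steps=7197,
    remove_unused_columns=False,  # DPOTrainer handles its own columns
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.3,         # inferred from the model name
    loss_type="ipo",  # inferred from the model name
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```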

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5.822 | 0.21 | 500 | 19.5830 | -0.0268 | -0.3320 | 0.6708 | 0.3052 | -264.7920 | -235.3544 | -33.5002 | -33.8198 |
| 6.8677 | 0.42 | 1000 | 18.7557 | -0.0552 | -0.3293 | 0.5917 | 0.2741 | -264.7829 | -235.4492 | -35.5852 | -35.8178 |
| 12.3698 | 0.63 | 1500 | 36.0453 | -0.1426 | -0.5467 | 0.6771 | 0.4041 | -265.5075 | -235.7406 | -34.3816 | -34.5936 |
| 7.8347 | 0.83 | 2000 | 38.2624 | -0.0799 | -0.3485 | 0.6500 | 0.2687 | -264.8470 | -235.5314 | -33.2874 | -33.4310 |
| 9.184 | 1.04 | 2500 | 14.9546 | -0.3389 | -0.7127 | 0.6875 | 0.3739 | -266.0610 | -236.3948 | -32.7912 | -32.9463 |
| 11.1603 | 1.25 | 3000 | 15.5236 | -0.0513 | -0.3736 | 0.7000 | 0.3223 | -264.9306 | -235.4362 | -33.3399 | -33.5624 |
| 16.5516 | 1.46 | 3500 | 8.6118 | -0.1177 | -0.5526 | 0.7438 | 0.4349 | -265.5274 | -235.6576 | -31.9816 | -32.1630 |
| 5.2761 | 1.67 | 4000 | 5.2168 | -0.1495 | -0.5364 | 0.7417 | 0.3869 | -265.4733 | -235.7637 | -32.2719 | -32.3991 |
| 2.9326 | 1.88 | 4500 | 4.2332 | -0.2284 | -0.6043 | 0.7646 | 0.3759 | -265.6996 | -236.0266 | -32.0240 | -32.1547 |
| 2.9814 | 2.08 | 5000 | 3.3498 | -0.2188 | -0.6063 | 0.7792 | 0.3874 | -265.7062 | -235.9947 | -31.8376 | -31.9728 |
| 1.8651 | 2.29 | 5500 | 2.8900 | -0.2624 | -0.6313 | 0.7896 | 0.3688 | -265.7895 | -236.1400 | -31.4502 | -31.5973 |
| 4.5849 | 2.5 | 6000 | 2.2055 | -0.2771 | -0.6338 | 0.7833 | 0.3567 | -265.7979 | -236.1888 | -31.5011 | -31.6468 |
| 1.7322 | 2.71 | 6500 | 1.9194 | -0.2534 | -0.6145 | 0.8208 | 0.3611 | -265.7336 | -236.1099 | -31.6632 | -31.8054 |
| 1.1697 | 2.92 | 7000 | 1.8601 | -0.2473 | -0.6141 | 0.8271 | 0.3668 | -265.7321 | -236.0896 | -31.6527 | -31.7977 |
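
For reading the reward columns: in trl's DPO/IPO trainer, the logged rewards are the β-scaled log-probability ratios between the policy and the reference model, the margin is chosen minus rejected, and the accuracy is the fraction of pairs where the chosen reward exceeds the rejected one. A sketch of those definitions, assuming this checkpoint follows trl's standard metric computation:

```python
import torch

beta = 0.3  # inferred from the model name

def dpo_reward_stats(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps):
    """Recompute the Rewards/* columns from per-sequence log-probs.

    Each argument is a 1-D tensor of summed token log-probs, one entry
    per preference pair; mirrors how trl derives its logged metrics.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    accuracy = (chosen_rewards > rejected_rewards).float().mean()
    return (chosen_rewards.mean(), rejected_rewards.mean(),
            margins.mean(), accuracy)
```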

Framework versions

  • Transformers 4.35.2
  • PyTorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0