
gpt-imdb-alpha_0.3-beta_0.1

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 25.4567
  • Rewards/chosen: -0.2859
  • Rewards/rejected: -1.2893
  • Rewards/accuracies: 0.8458
  • Rewards/margins: 1.0034
  • Logps/rejected: -276.5780
  • Logps/chosen: -238.1245
  • Logits/rejected: -31.6823
  • Logits/chosen: -32.1973
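
These metric names match those logged by DPO-style preference training, in which the implicit reward of a completion y for a prompt x is r(x, y) = β · log(π_θ(y|x) / π_ref(y|x)) and Rewards/margins is the mean gap between the chosen and rejected rewards. The final figures above are internally consistent with that definition: -0.2859 - (-1.2893) = 1.0034.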

Model description

More information needed

Intended uses & limitations

More information needed
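
No intended-use statement was provided, but since the base model is GPT-2 (124M parameters), the checkpoint can be loaded with the standard transformers text-generation API. A minimal sketch, assuming a hypothetical Hub repo id for this checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual Hub path of this checkpoint.
model_id = "<user>/gpt-imdb-alpha_0.3-beta_0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The base model (lvwerra/gpt2-imdb) was trained on IMDB movie reviews,
# so review-style prompts are the natural input.
inputs = tokenizer("This movie was", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```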

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding training setup follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • num_epochs: 3
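
The reward-based metrics and the `beta_0.1` suffix in the model name suggest a DPO-style preference-tuning run. A minimal sketch of how these hyperparameters could map onto TRL's `DPOTrainer` (API as in TRL 0.7.x); the dataset, the `beta` value, and the meaning of `alpha` are assumptions, as none of them are recorded in this card:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "lvwerra/gpt2-imdb"
model = AutoModelForCausalLM.from_pretrained(base)       # policy to be tuned
ref_model = AutoModelForCausalLM.from_pretrained(base)   # frozen reference
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token

# Dummy preference pairs just to make the sketch self-contained;
# the real training dataset is not recorded in this card.
train_dataset = Dataset.from_dict({
    "prompt": ["This movie was"],
    "chosen": [" a moving, beautifully acted drama."],
    "rejected": [" bad."],
})

# Hyperparameters taken from the list above.
args = TrainingArguments(
    output_dir="gpt-imdb-alpha_0.3-beta_0.1",
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,  # assumed from the "beta_0.1" suffix in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Newer TRL releases replace `TrainingArguments` with a dedicated `DPOConfig`, so the exact keyword arguments depend on the TRL version actually used.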

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3872        | 0.21  | 500  | 0.9032          | -0.0063        | -0.4921          | 0.7833             | 0.4858          | -268.6066      | -235.3286    | -32.2910        | -32.9554      |
| 0.937         | 0.42  | 1000 | 0.5782          | 0.3739         | -0.2273          | 0.7667             | 0.6012          | -265.9586      | -231.5264    | -33.2571        | -33.9060      |
| 1.6799        | 0.63  | 1500 | 3.1537          | 0.2527         | -0.4167          | 0.7729             | 0.6694          | -267.8524      | -232.7385    | -33.1089        | -33.5974      |
| 0.8141        | 0.83  | 2000 | 1.8978          | 0.1800         | -0.6646          | 0.7917             | 0.8446          | -270.3312      | -233.4657    | -32.3310        | -32.9275      |
| 0.4758        | 1.04  | 2500 | 7.5225          | 0.0635         | -0.8693          | 0.8188             | 0.9329          | -272.3785      | -234.6298    | -32.0571        | -32.5700      |
| 0.5184        | 1.25  | 3000 | 2.2710          | 0.3736         | -0.5136          | 0.8021             | 0.8872          | -268.8213      | -231.5289    | -33.9791        | -34.4883      |
| 0.3571        | 1.46  | 3500 | 12.0724         | 0.0389         | -0.9119          | 0.8125             | 0.9507          | -272.8040      | -234.8766    | -32.0986        | -32.6149      |
| 1.8478        | 1.67  | 4000 | 14.8072         | 0.0021         | -0.9754          | 0.8229             | 0.9775          | -273.4396      | -235.2442    | -32.4363        | -32.9745      |
| 0.6874        | 1.88  | 4500 | 5.9952          | 0.0487         | -0.9284          | 0.8167             | 0.9771          | -272.9694      | -234.7781    | -32.9101        | -33.4694      |
| 0.2233        | 2.08  | 5000 | 11.0797         | -0.2853        | -1.2611          | 0.8479             | 0.9758          | -276.2962      | -238.1182    | -31.8450        | -32.3602      |
| 0.1784        | 2.29  | 5500 | 7.9899          | -0.1567        | -1.1325          | 0.8375             | 0.9757          | -275.0099      | -236.8327    | -32.0292        | -32.5741      |
| 0.2919        | 2.5   | 6000 | 29.0523         | -0.3295        | -1.3283          | 0.8500             | 0.9988          | -276.9686      | -238.5604    | -31.4315        | -31.9371      |
| 2.011         | 2.71  | 6500 | 28.3221         | -0.2974        | -1.3018          | 0.8458             | 1.0044          | -276.7031      | -238.2393    | -31.6565        | -32.1763      |
| 1.7899        | 2.92  | 7000 | 25.4567         | -0.2859        | -1.2893          | 0.8458             | 1.0034          | -276.5780      | -238.1245    | -31.6823        | -32.1973      |

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0