
gpt-imdb-alpha_0.5-beta_0.1

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5886.0698
  • Rewards/chosen: -0.0051
  • Rewards/rejected: -0.7543
  • Rewards/accuracies: 0.8125
  • Rewards/margins: 0.7492
  • Logps/rejected: -271.2288
  • Logps/chosen: -235.3164
  • Logits/rejected: -35.8752
  • Logits/chosen: -36.3770
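As a quick consistency check on the numbers above (a sketch, not part of the original card): the margins metric is expected to equal the chosen reward minus the rejected reward, a convention used by DPO-style trainers such as the one in TRL, which these metric names resemble.

```python
# Sanity check on the reported metrics: rewards/margins should equal
# rewards/chosen minus rewards/rejected, up to rounding in the card.
chosen = -0.0051
rejected = -0.7543
margin = chosen - rejected
print(round(margin, 4))  # 0.7492, matching the reported rewards/margins
```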

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • num_epochs: 3
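A minimal sketch of the learning-rate curve these settings imply, assuming the Hugging Face convention of linear warmup followed by cosine decay. The total step count is not stated in the card; roughly 7200 is assumed here from the training log (step 7000 at epoch 2.92, so about 2400 steps per epoch over 3 epochs).

```python
import math

# Sketch of cosine decay with linear warmup (lr=1e-5, 150 warmup steps).
# total_steps is an assumption, not stated in the card.
def lr_at(step, base_lr=1e-5, warmup_steps=150, total_steps=7200):
    if step < warmup_steps:
        return base_lr * (step / warmup_steps)  # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

print(lr_at(75))    # halfway through warmup: 5e-06
print(lr_at(150))   # peak learning rate: 1e-05
print(lr_at(7200))  # end of training: 0.0
```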

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.4827 | 0.21 | 500 | 1.0040 | 0.1562 | -0.2070 | 0.7042 | 0.3632 | -265.7552 | -233.7028 | -33.0609 | -33.6065 |
| 14.1335 | 0.42 | 1000 | 3.0758 | 0.3762 | -0.1852 | 0.7438 | 0.5615 | -265.5375 | -231.5030 | -34.4930 | -35.0582 |
| 0.5469 | 0.63 | 1500 | 8.0814 | 0.4345 | -0.1070 | 0.7271 | 0.5415 | -264.7556 | -230.9207 | -32.8344 | -33.3794 |
| 1.032 | 0.83 | 2000 | 4.5711 | 0.5518 | -0.0259 | 0.7104 | 0.5777 | -263.9442 | -229.7469 | -33.4042 | -33.9772 |
| 0.3719 | 1.04 | 2500 | 459.9075 | 0.2914 | -0.4286 | 0.7813 | 0.7200 | -267.9716 | -232.3516 | -33.0798 | -33.6079 |
| 0.4085 | 1.25 | 3000 | 526.3080 | 0.4340 | -0.2325 | 0.7479 | 0.6666 | -266.0108 | -230.9248 | -35.2424 | -35.7675 |
| 2.1291 | 1.46 | 3500 | 630.5800 | 0.4165 | -0.3073 | 0.7604 | 0.7238 | -266.7581 | -231.0998 | -37.0077 | -37.6012 |
| 4.7118 | 1.67 | 4000 | 96.2745 | 0.3115 | -0.4479 | 0.7625 | 0.7593 | -268.1639 | -232.1506 | -37.1158 | -37.6120 |
| 0.5195 | 1.88 | 4500 | 342.8383 | 0.3188 | -0.4079 | 0.7688 | 0.7267 | -267.7646 | -232.0775 | -37.3006 | -37.8729 |
| 0.8474 | 2.08 | 5000 | 4552.9634 | -0.0725 | -0.8330 | 0.7896 | 0.7605 | -272.0149 | -235.9899 | -36.5234 | -37.0376 |
| 0.2874 | 2.29 | 5500 | 3540.6086 | 0.0246 | -0.7477 | 0.8083 | 0.7723 | -271.1625 | -235.0193 | -36.0173 | -36.5541 |
| 2.4701 | 2.5 | 6000 | 4522.3066 | -0.0217 | -0.7825 | 0.8042 | 0.7608 | -271.5105 | -235.4827 | -35.7649 | -36.2731 |
| 0.59 | 2.71 | 6500 | 4948.8481 | 0.0070 | -0.7472 | 0.8104 | 0.7542 | -271.1574 | -235.1950 | -35.8213 | -36.3258 |
| 0.3244 | 2.92 | 7000 | 5886.0698 | -0.0051 | -0.7543 | 0.8125 | 0.7492 | -271.2288 | -235.3164 | -35.8752 | -36.3770 |
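A hedged sketch (not from the card) of how DPO-style reward columns like those above are typically derived: each completion's reward is beta times the policy-vs-reference log-probability ratio, and the preference loss is the negative log-sigmoid of the margin. Here beta = 0.1 is assumed from the "beta_0.1" in the model name, and the log-probabilities are toy numbers of the same magnitude as the Logps columns.

```python
import math

# Toy illustration of DPO reward accounting. beta=0.1 is an assumption
# (taken from the model name), and the log-probs are illustrative only.
def dpo_rewards(policy_logp_c, ref_logp_c, policy_logp_r, ref_logp_r, beta=0.1):
    reward_chosen = beta * (policy_logp_c - ref_logp_c)
    reward_rejected = beta * (policy_logp_r - ref_logp_r)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return reward_chosen, reward_rejected, margin, loss

rc, rr, margin, loss = dpo_rewards(-235.3, -235.2, -271.2, -263.7)
print(rc, rr, margin)  # chosen reward near zero, rejected clearly negative
```

A positive margin with loss below log 2 means the policy prefers the chosen completion more strongly than the reference does, which is the trend the accuracy column (0.8125 by the end) reflects.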

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0
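To reproduce this environment, the versions above can be pinned in a `requirements.txt` (a sketch; note the PyPI package name is `torch`, not `Pytorch`):

```
transformers==4.35.2
torch==2.1.1
datasets==2.15.0
tokenizers==0.15.0
```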
Model size

  • 124M parameters (F32 tensors, Safetensors format)