
gpt-imdb-alpha_0.7-beta_0.1

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set:

  • Step: 7000
  • Loss: 11466.6748
  • Rewards/chosen: 0.1662
  • Rewards/rejected: -0.5317
  • Rewards/accuracies: 0.7937
  • Rewards/margins: 0.6979
  • Logps/rejected: -269.0021
  • Logps/chosen: -233.6036
  • Logits/rejected: -31.0907
  • Logits/chosen: -31.5102
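
The reward metrics above are consistent with the usual preference-tuning bookkeeping, where the margin is the gap between the mean reward of chosen and rejected completions (as, e.g., TRL's `DPOTrainer` logs it). A quick check against the reported final values:

```python
# Final evaluation values reported above.
rewards_chosen = 0.1662
rewards_rejected = -0.5317

# Margin = mean chosen reward minus mean rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.6979, matching Rewards/margins above
```

Similarly, Rewards/accuracies (0.7937) is the fraction of evaluation pairs where the chosen completion received a higher reward than the rejected one.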

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • num_epochs: 3
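
The learning-rate schedule implied by these hyperparameters (linear warmup over 150 steps, then cosine decay) can be sketched in plain Python. This is an illustrative approximation, not the exact Transformers implementation, and the 7,000 total steps is taken from the final logged step above (the true total depends on the dataset size):

```python
import math

def lr_at_step(step, base_lr=1e-05, warmup_steps=150, total_steps=7000):
    """Sketch of linear warmup followed by cosine decay, matching the
    hyperparameters above; the exact Transformers scheduler may differ
    in small details."""
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr over warmup_steps.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(75))    # mid-warmup: 5e-06
print(lr_at_step(150))   # end of warmup: 1e-05
```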

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 1.1713 | 0.21 | 500 | 2.5800 | 0.3038 | -0.0770 | 0.7188 | 0.3808 | -264.4555 | -232.2277 | -33.8095 | -34.2861 |
| 0.887 | 0.42 | 1000 | 21.3065 | 0.5505 | 0.1747 | 0.6917 | 0.3758 | -261.9387 | -229.7607 | -32.8001 | -33.3563 |
| 0.798 | 0.63 | 1500 | 61.4252 | 0.5093 | -0.0100 | 0.7333 | 0.5193 | -263.7849 | -230.1718 | -30.8724 | -31.2678 |
| 1.1771 | 0.83 | 2000 | 14.1653 | 0.6467 | 0.1330 | 0.6854 | 0.5138 | -262.3556 | -228.7979 | -33.4203 | -33.7502 |
| 0.5587 | 1.04 | 2500 | 528756.25 | 0.5517 | -0.0428 | 0.7396 | 0.5944 | -264.1129 | -229.7487 | -32.9646 | -33.4291 |
| 0.4833 | 1.25 | 3000 | 1178.0547 | 0.5836 | 0.0507 | 0.6958 | 0.5329 | -263.1786 | -229.4295 | -32.7156 | -33.0784 |
| 0.6214 | 1.46 | 3500 | 4177.1973 | 0.2927 | -0.3473 | 0.7562 | 0.6400 | -267.1580 | -232.3383 | -29.8543 | -30.1578 |
| 18.5015 | 1.67 | 4000 | 513.4760 | 0.4129 | -0.2026 | 0.7479 | 0.6155 | -265.7109 | -231.1364 | -30.7645 | -31.1263 |
| 0.3705 | 1.88 | 4500 | 135.9144 | 0.4609 | -0.1462 | 0.75 | 0.6071 | -265.1470 | -230.6563 | -30.2459 | -30.6495 |
| 0.4778 | 2.08 | 5000 | 1561.6661 | 0.2544 | -0.4144 | 0.7792 | 0.6687 | -267.8289 | -232.7216 | -30.5732 | -30.9863 |
| 0.3125 | 2.29 | 5500 | 8448.3389 | 0.2045 | -0.4842 | 0.7937 | 0.6887 | -268.5275 | -233.2203 | -31.2362 | -31.6616 |
| 6.2284 | 2.5 | 6000 | 13438.1006 | 0.1295 | -0.5751 | 0.7937 | 0.7045 | -269.4362 | -233.9707 | -31.0171 | -31.4348 |
| 2.1427 | 2.71 | 6500 | 13021.2812 | 0.1590 | -0.5409 | 0.7958 | 0.6999 | -269.0947 | -233.6758 | -31.1241 | -31.5456 |
| 24.2387 | 2.92 | 7000 | 11466.6748 | 0.1662 | -0.5317 | 0.7937 | 0.6979 | -269.0021 | -233.6036 | -31.0907 | -31.5102 |

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0
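
To reproduce this environment, the pinned versions above can be installed with pip. This is a sketch: the PyPI package for PyTorch is `torch`, and the exact wheel you need may depend on your platform and CUDA setup.

```shell
pip install transformers==4.35.2 torch==2.1.1 datasets==2.15.0 tokenizers==0.15.0
```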
Model size: 124M parameters (F32, Safetensors)