model

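This model is a fine-tuned version of EleutherAI/gpt-neo-125M; the training dataset is not specified. At the last logged evaluation (step 1000) it reaches a validation loss of 0.6955 and a Rewards/accuracies value of 0.4813. The full set of evaluation results is given in the table at the end of this card.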

Model description

More information needed

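The hyperparameters used during training are not listed in this card. Training and validation metrics were logged every 100 steps over roughly three epochs: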

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6955	0.2992	100	0.6958	-0.0017	-0.0008	0.4701	-0.0008	-478.7900	-494.2336	-18.3637	-18.4824
0.6906	0.5984	200	0.6962	-0.0028	-0.0016	0.4744	-0.0013	-478.7974	-494.2453	-18.3625	-18.4806
0.6985	0.8975	300	0.6959	-0.0222	-0.0214	0.4738	-0.0008	-478.9952	-494.4388	-18.3624	-18.4809
0.6946	1.1967	400	0.6955	0.0015	0.0015	0.4753	0.0000	-478.7664	-494.2018	-18.3628	-18.4811
0.6946	1.4959	500	0.6960	-0.0046	-0.0040	0.4791	-0.0006	-478.8223	-494.2634	-18.3631	-18.4816
0.6952	1.7951	600	0.6951	-0.0047	-0.0057	0.4882	0.0011	-478.8391	-494.2639	-18.3636	-18.4821
0.6947	2.0942	700	0.6955	-0.0053	-0.0056	0.4822	0.0003	-478.8379	-494.2701	-18.3634	-18.4820
0.6995	2.3934	800	0.6948	-0.0060	-0.0076	0.4918	0.0015	-478.8574	-494.2774	-18.3632	-18.4818
0.6932	2.6926	900	0.6952	-0.0080	-0.0087	0.4837	0.0008	-478.8692	-494.2970	-18.3633	-18.4817
0.6964	2.9918	1000	0.6955	-0.0079	-0.0080	0.4813	0.0001	-478.8612	-494.2958	-18.3633	-18.4819
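
The card does not say how the model was trained. The column names (Rewards/chosen, Rewards/rejected, Rewards/margins, Rewards/accuracies) resemble those logged by preference-optimization trainers such as DPO, so the following is a minimal sketch under that assumption, not a description of the actual training code. It recomputes the margin as chosen minus rejected for three rows copied from the table and compares the validation loss with ln 2, the value a sigmoid preference loss takes when the margin is zero. The helper name `sigmoid_preference_loss` is hypothetical.

```python
import math

# (step, validation_loss, reward_chosen, reward_rejected, reported_margin)
# Three rows copied from the table above.
rows = [
    (100, 0.6958, -0.0017, -0.0008, -0.0008),
    (600, 0.6951, -0.0047, -0.0057, 0.0011),
    (1000, 0.6955, -0.0079, -0.0080, 0.0001),
]


def sigmoid_preference_loss(margin: float) -> float:
    """Return -log(sigmoid(margin)) for one chosen-minus-rejected reward margin.

    Assumption: the training loss has this sigmoid (DPO-style) form.
    """
    return math.log1p(math.exp(-margin))


# Loss of a model that cannot separate chosen from rejected (margin 0): ln 2.
chance_loss = sigmoid_preference_loss(0.0)

for step, val_loss, chosen, rejected, reported_margin in rows:
    recomputed = chosen - rejected
    print(
        f"step {step}: margin recomputed {recomputed:+.4f} "
        f"(reported {reported_margin:+.4f}), "
        f"validation loss - ln 2 = {val_loss - chance_loss:+.4f}"
    )
```

On these rows the recomputed margins agree with the reported ones to within rounding, and the validation loss stays within about 0.003 of ln 2. Under the stated assumption, that is consistent with the Rewards/accuracies values, which remain below 0.5 at every logged step.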