
tulu-2-7b-full-UF-5e-7

This model is a fine-tuned version of allenai/tulu-2-7b; the fine-tuning dataset is not recorded in this card. It achieves the following results on the evaluation set:

  • Loss: 0.9017
  • Rewards/chosen: -4.8659
  • Rewards/rejected: -5.8048
  • Rewards/accuracies: 0.6230
  • Rewards/margins: 0.9389
  • Rewards/margins Max: 5.6516
  • Rewards/margins Min: -2.8163
  • Rewards/margins Std: 2.7854
  • Logps/rejected: -916.6636
  • Logps/chosen: -832.4283
  • Logits/rejected: 0.4957
  • Logits/chosen: 0.2899
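
These are the metrics emitted by DPO-style preference training, where per-example rewards are (typically β-scaled) log-probability ratios of the policy against a reference model. As a hedged sketch with toy values (the reward definition and the numbers below are assumptions, not data from this card), the margin and accuracy figures derive from the chosen/rejected rewards like this:

```python
import torch

# Toy per-example rewards; under the usual DPO convention these would be
# beta * (logp_policy - logp_reference) for the chosen and rejected completions.
rewards_chosen = torch.tensor([-4.2, -5.0, -4.9])
rewards_rejected = torch.tensor([-5.1, -5.8, -6.2])

margins = rewards_chosen - rewards_rejected                   # per-example margins
print("margin mean:", margins.mean().item())                  # Rewards/margins
print("margin max :", margins.max().item())                   # Rewards/margins Max
print("margin min :", margins.min().item())                   # Rewards/margins Min
print("margin std :", margins.std().item())                   # Rewards/margins Std
print("accuracy   :", (margins > 0).float().mean().item())    # Rewards/accuracies
```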

Model description

More information needed

Intended uses & limitations

More information needed
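
While usage is not documented here, the following is a minimal inference sketch, assuming the checkpoint is available under the repo id used as this card's title (a hypothetical path) and that it keeps the `<|user|>`/`<|assistant|>` chat format of the allenai/tulu-2-7b base model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model path; substitute the actual location of this checkpoint.
model_id = "tulu-2-7b-full-UF-5e-7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumption: the fine-tune keeps the Tulu-2 chat template.
prompt = "<|user|>\nSummarize direct preference optimization in one sentence.\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```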

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
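
As a hedged reconstruction, these values map onto a transformers.TrainingArguments roughly as follows; the output path, bf16 flag, and AdamW variant are assumptions, and the 8-GPU layout comes from the launcher (e.g. accelerate or torchrun), not from the arguments themselves:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="tulu-2-7b-full-UF-5e-7",  # hypothetical output path
    learning_rate=5e-7,
    per_device_train_batch_size=2,        # x 8 GPUs = total train batch size 16
    per_device_eval_batch_size=8,         # x 8 GPUs = total eval batch size 64
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,                            # assumption, consistent with the bf16 checkpoint
    optim="adamw_torch",                  # betas=(0.9, 0.999) and epsilon=1e-08 are its defaults
)
```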

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6816 | 0.07 | 100 | 0.6919 | 0.0000 | -0.0020 | 0.5417 | 0.0021 | 0.0277 | -0.0245 | 0.0175 | -336.3843 | -345.8331 | -1.1956 | -1.1695 |
| 0.5468 | 0.15 | 200 | 0.6793 | -0.1136 | -0.1432 | 0.5794 | 0.0296 | 0.2495 | -0.1965 | 0.1511 | -350.5013 | -357.1989 | -1.1509 | -1.1466 |
| 0.3597 | 0.22 | 300 | 0.6788 | -0.9347 | -1.0641 | 0.5714 | 0.1294 | 1.0084 | -0.7320 | 0.5779 | -442.5906 | -439.3020 | -1.0512 | -1.0629 |
| 0.2059 | 0.29 | 400 | 0.7172 | -1.9680 | -2.3061 | 0.5972 | 0.3381 | 2.3443 | -1.3886 | 1.2205 | -566.7862 | -542.6320 | -0.8695 | -0.8807 |
| 0.1354 | 0.37 | 500 | 0.8082 | -3.1553 | -3.7843 | 0.6190 | 0.6290 | 4.0818 | -2.2017 | 2.0321 | -714.6080 | -661.3674 | -0.1617 | -0.2554 |
| 0.1327 | 0.44 | 600 | 0.8436 | -3.8517 | -4.6192 | 0.6190 | 0.7675 | 4.8313 | -2.4317 | 2.3526 | -798.1056 | -731.0093 | 0.1600 | 0.0173 |
| 0.0777 | 0.52 | 700 | 0.9893 | -4.9432 | -5.9282 | 0.6190 | 0.9850 | 6.3532 | -3.2959 | 3.1250 | -929.0052 | -840.1605 | 0.6301 | 0.4163 |
| 0.0638 | 0.59 | 800 | 0.8086 | -3.8655 | -4.6357 | 0.6190 | 0.7702 | 4.5021 | -2.2919 | 2.2427 | -799.7516 | -732.3853 | 0.2889 | 0.1244 |
| 0.0997 | 0.66 | 900 | 0.8639 | -4.4406 | -5.3058 | 0.6270 | 0.8652 | 5.1592 | -2.6378 | 2.5658 | -866.7603 | -789.8954 | 0.3918 | 0.2055 |
| 0.0708 | 0.74 | 1000 | 0.8618 | -4.4546 | -5.2895 | 0.6230 | 0.8349 | 5.0604 | -2.6224 | 2.5213 | -865.1302 | -791.2946 | 0.4063 | 0.2199 |
| 0.141 | 0.81 | 1100 | 0.9049 | -4.8648 | -5.7977 | 0.6190 | 0.9330 | 5.6327 | -2.8439 | 2.7856 | -915.9548 | -832.3105 | 0.5083 | 0.3017 |
| 0.0775 | 0.88 | 1200 | 0.9049 | -4.9040 | -5.8585 | 0.6210 | 0.9546 | 5.7130 | -2.8316 | 2.8132 | -922.0319 | -836.2313 | 0.5172 | 0.3074 |
| 0.0464 | 0.96 | 1300 | 0.9017 | -4.8659 | -5.8048 | 0.6230 | 0.9389 | 5.6516 | -2.8163 | 2.7854 | -916.6636 | -832.4283 | 0.4957 | 0.2899 |

Framework versions

  • Transformers 4.39.0.dev0
  • Pytorch 2.1.0+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
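
To sanity-check a local environment against these versions (a convenience sketch, not part of the original card):

```python
import datasets
import tokenizers
import torch
import transformers

# Expected versions are listed above; mismatches are usually fine for inference
# but can matter when reproducing training runs.
print("transformers:", transformers.__version__)  # 4.39.0.dev0
print("torch       :", torch.__version__)         # 2.1.0+cu121
print("datasets    :", datasets.__version__)      # 2.14.6
print("tokenizers  :", tokenizers.__version__)    # 0.15.2
```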