# mtg-phi-1_5-dpo-qlora
This model is a fine-tuned version of [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.0001
- Rewards/chosen: -7.5874
- Rewards/rejected: -24.0497
- Rewards/accuracies: 1.0
- Rewards/margins: 16.4623
- Logps/rejected: -274.3435
- Logps/chosen: -143.2090
- Logits/rejected: -1.8100
- Logits/chosen: -1.4786
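As a quick consistency check on the metrics above (these names follow the DPO-style reporting used in the table below), `Rewards/margins` is simply the chosen reward minus the rejected reward:

```python
# Final evaluation rewards reported above.
reward_chosen = -7.5874
reward_rejected = -24.0497

# Rewards/margins = Rewards/chosen - Rewards/rejected.
margin = reward_chosen - reward_rejected
print(round(margin, 4))  # 16.4623, matching the reported Rewards/margins
```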
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500
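The learning-rate schedule implied by these hyperparameters can be sketched in plain Python. This is an illustrative reimplementation (it mirrors the multiplier computed by `transformers.get_cosine_schedule_with_warmup`, not the exact training code): the rate ramps linearly over the first 100 warmup steps, then decays along a cosine curve to zero at step 1500.

```python
import math

def cosine_lr(step, base_lr=5e-4, warmup_steps=100, total_steps=1500):
    """Cosine schedule with linear warmup, using the hyperparameters above.

    An illustrative sketch of the schedule, not the trainer's actual code.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr over the first warmup_steps steps.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0))     # 0.0 at the start of warmup
print(cosine_lr(100))   # peak learning rate 5e-4 at the end of warmup
print(cosine_lr(1500))  # decays to ~0 at the final training step
```

Note also that `total_train_batch_size: 16` is the product of `train_batch_size: 4` and `gradient_accumulation_steps: 4`.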
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.0417 | 0.07 | 100 | 0.0418 | -0.3892 | -8.0118 | 0.9792 | 7.6226 | -113.9640 | -71.2264 | 1.8258 | 1.7898 |
| 0.0221 | 0.15 | 200 | 0.0303 | -2.5657 | -10.9212 | 0.9896 | 8.3555 | -143.0585 | -92.9920 | 1.9704 | 2.1047 |
| 0.0107 | 0.22 | 300 | 0.0131 | -1.7388 | -11.6047 | 0.9965 | 9.8659 | -149.8935 | -84.7232 | 1.0731 | 0.9750 |
| 0.0204 | 0.29 | 400 | 0.0108 | -2.0131 | -11.9647 | 0.9965 | 9.9516 | -153.4932 | -87.4658 | 1.3610 | 1.6740 |
| 0.0067 | 0.36 | 500 | 0.0080 | -5.9488 | -19.6561 | 0.9974 | 13.7073 | -230.4076 | -126.8228 | -0.4464 | -0.2114 |
| 0.0 | 0.44 | 600 | 0.0047 | -5.6456 | -20.2381 | 0.9983 | 14.5924 | -236.2268 | -123.7909 | -0.4142 | -0.0244 |
| 0.0003 | 0.51 | 700 | 0.0018 | -7.2250 | -21.3351 | 0.9991 | 14.1101 | -247.1974 | -139.5853 | -0.3510 | -0.0203 |
| 0.0005 | 0.58 | 800 | 0.0008 | -7.2263 | -21.2475 | 0.9991 | 14.0211 | -246.3209 | -139.5981 | -0.8673 | -0.7010 |
| 0.0 | 0.66 | 900 | 0.0009 | -10.2371 | -26.0402 | 0.9991 | 15.8031 | -294.2486 | -169.7062 | -1.9784 | -1.7799 |
| 0.0 | 0.73 | 1000 | 0.0008 | -5.9544 | -22.0767 | 0.9991 | 16.1223 | -254.6137 | -126.8789 | -1.0623 | -0.6039 |
| 0.0 | 0.8 | 1100 | 0.0007 | -7.3374 | -23.8700 | 0.9991 | 16.5327 | -272.5467 | -140.7083 | -1.5517 | -1.1710 |
| 0.0 | 0.87 | 1200 | 0.0007 | -7.6398 | -24.1605 | 0.9991 | 16.5207 | -275.4509 | -143.7327 | -1.8124 | -1.4901 |
| 0.0 | 0.95 | 1300 | 0.0001 | -7.5920 | -24.0476 | 1.0 | 16.4556 | -274.3220 | -143.2550 | -1.8115 | -1.4816 |
| 0.0001 | 1.02 | 1400 | 0.0001 | -7.5872 | -24.0480 | 1.0 | 16.4608 | -274.3262 | -143.2065 | -1.8102 | -1.4791 |
| 0.0 | 1.09 | 1500 | 0.0001 | -7.5874 | -24.0497 | 1.0 | 16.4623 | -274.3435 | -143.2090 | -1.8100 | -1.4786 |
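The near-zero final loss is consistent with the standard DPO objective, which for a single preference pair is the negative log-sigmoid of the reward margin (a sketch of the textbook formula, not the trainer's exact code; the reported rewards already fold in the DPO beta):

```python
import math

def dpo_loss(margin):
    """Per-pair DPO loss: -log(sigmoid(margin)), where
    margin = reward_chosen - reward_rejected.

    Computed stably as log1p(exp(-margin)) to avoid overflow
    for large negative margins.
    """
    return math.log1p(math.exp(-margin))

# With the final reported margin of 16.4623 the per-pair loss is ~7e-8,
# consistent with the reported validation loss of 0.0001 (an average
# over many pairs, so individual pairs can contribute more).
print(dpo_loss(16.4623))
```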
### Framework versions
- Transformers 4.33.2
- PyTorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.13.3
## Model tree for TrevorJS/mtg-phi-1_5-dpo-qlora

Base model: [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5)