Llama-2-7b-hf-DPO-PartialEval_ET0.1_MT1.2_1-5_V.1.0_Filtered0.1_V3.0

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset. Its evaluation-set results at each logged step are given in the table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

The following hyperparameters were used during training:

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.7679	0.2993	60	0.6593	0.0108	-0.0521	0.6000	0.0629	-66.5412	-75.7445	-0.4729	-0.4564
0.5441	0.5985	120	0.6291	-0.5862	-0.7160	0.7000	0.1298	-73.1809	-81.7145	-0.4661	-0.4518
0.4072	0.8978	180	0.6867	-0.9478	-1.0597	0.6000	0.1119	-76.6181	-85.3308	-0.4052	-0.3928
0.2754	1.1970	240	0.6937	-1.6591	-1.7694	0.6000	0.1103	-83.7147	-92.4433	-0.6080	-0.5961
0.3506	1.4963	300	0.7085	-1.2433	-1.5083	0.5000	0.2650	-81.1036	-88.2852	-0.7011	-0.6903
0.2660	1.7955	360	0.8548	-1.8431	-2.1010	0.5000	0.2579	-87.0309	-94.2836	-0.9269	-0.9164
0.5629	2.0948	420	0.7761	-2.2110	-2.6203	0.5000	0.4093	-92.2235	-97.9622	-1.0014	-0.9908
0.0832	2.3940	480	1.0148	-3.6008	-4.0627	0.5000	0.4618	-106.6473	-111.8609	-1.2254	-1.2163
0.1597	2.6933	540	0.9907	-3.5220	-4.0847	0.5000	0.5627	-106.8678	-111.0727	-1.2661	-1.2566
0.1285	2.9925	600	0.9702	-3.4296	-3.9944	0.6000	0.5648	-105.9646	-110.1487	-1.2385	-1.2295
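The Rewards/* columns come from DPO training. The card does not say which trainer produced them, but the column names match the metrics logged by Hugging Face TRL's `DPOTrainer`. Assuming those definitions hold, each pair's implicit reward is beta times the difference between the policy and reference log-probabilities. The margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs with a positive margin. The minimal sketch below restates these definitions in plain Python. The `beta` value and the example log-probabilities are hypothetical, and `dpo_pair_metrics` and `best_eval_step` are illustrative helpers, not part of any library. The sketch also checks the logged margins against chosen minus rejected (they agree up to 4-decimal rounding; step 480 is off by 0.0001) and finds the step with the lowest validation loss.

```python
import math
from statistics import mean

# Evaluation rows copied from the training results table above:
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins)
EVAL_LOG = [
    (60, 0.6593, 0.0108, -0.0521, 0.0629),
    (120, 0.6291, -0.5862, -0.7160, 0.1298),
    (180, 0.6867, -0.9478, -1.0597, 0.1119),
    (240, 0.6937, -1.6591, -1.7694, 0.1103),
    (300, 0.7085, -1.2433, -1.5083, 0.2650),
    (360, 0.8548, -1.8431, -2.1010, 0.2579),
    (420, 0.7761, -2.2110, -2.6203, 0.4093),
    (480, 1.0148, -3.6008, -4.0627, 0.4618),
    (540, 0.9907, -3.5220, -4.0847, 0.5627),
    (600, 0.9702, -3.4296, -3.9944, 0.5648),
]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_pair_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """Batch-averaged DPO metrics from summed sequence log-probabilities.

    Assumes TRL-style definitions (hypothetical helper, not a library API):
    implicit reward = beta * (policy logp - reference logp),
    per-pair loss = -log sigmoid(chosen reward - rejected reward).
    """
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - j for c, j in zip(chosen, rejected)]
    return {
        "loss": mean(softplus(-m) for m in margins),  # -log sigmoid(m) == softplus(-m)
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        "rewards/margins": mean(margins),
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
    }


def best_eval_step(log):
    """Return the logged step with the lowest validation loss (hypothetical helper)."""
    return min(log, key=lambda row: row[1])[0]


if __name__ == "__main__":
    # The logged margin should equal chosen minus rejected, up to rounding.
    for step, _, chosen, rejected, margin in EVAL_LOG:
        assert abs((chosen - rejected) - margin) < 1e-3, step
    print("lowest validation loss at step", best_eval_step(EVAL_LOG))
    # Toy single-pair example with a hypothetical beta of 0.1.
    print(dpo_pair_metrics([-10.0], [-12.0], [-11.0], [-11.0], beta=0.1))
```

On the logged rows, the lowest validation loss (0.6291) occurs at step 120. Validation loss does not return to that level at any later step, even though the reward margins keep growing.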