Llama-2-7b-hf-DPO-PartialEval_ET0.1_MT1.2_V.1.0

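This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained with DPO (as the model name indicates) on a dataset that is not specified. Evaluation-set results for each checkpoint are listed in the training results table below. At the last logged evaluation (step 612, epoch 2.708), it reaches a validation loss of 0.7063, a reward accuracy of 0.7000, and a reward margin of 0.6240.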

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

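Training procedure

The training hyperparameters are not listed (more information needed). Evaluation was run every 68 steps (about 0.3 epoch), and the last logged evaluation is at epoch 2.708.

Training results

Validation loss is lowest at step 340 (epoch 1.50, 0.5559) and then rises to 0.7063 by step 612, while Rewards/margins keeps growing until step 544. Rewards/accuracies peaks at 0.8000 at step 204 and stays at 0.7000 from step 272 onward.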

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6461	0.3009	68	0.6839	0.0330	0.0082	0.6000	0.0248	-77.3366	-91.3130	-0.2945	-0.2863
0.9723	0.6018	136	0.6779	0.0339	-0.0486	0.7000	0.0824	-77.9042	-91.3046	-0.3345	-0.3256
0.6461	0.9027	204	0.6352	-0.0081	-0.2128	0.8000	0.2047	-79.5466	-91.7240	-0.3939	-0.3854
0.2832	1.2035	272	0.5825	-0.8764	-1.2440	0.7000	0.3676	-89.8586	-100.4076	-0.6262	-0.6198
0.1923	1.5044	340	0.5559	-1.1573	-1.6161	0.7000	0.4587	-93.5792	-103.2166	-0.6844	-0.6797
0.3898	1.8053	408	0.6173	-1.3556	-1.8473	0.7000	0.4918	-95.8919	-105.1990	-0.8939	-0.8905
0.3404	2.1062	476	0.6381	-1.3063	-1.8875	0.7000	0.5812	-96.2932	-104.7061	-0.9068	-0.9042
0.4954	2.4071	544	0.6915	-1.7445	-2.3721	0.7000	0.6276	-101.1399	-109.0883	-1.0304	-1.0288
0.3914	2.7080	612	0.7063	-1.8087	-2.4327	0.7000	0.6240	-101.7455	-109.7301	-1.0472	-1.0455
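The columns in the table above are standard DPO evaluation metrics. Under the usual DPO definitions, a completion's implicit reward is beta * (log pi_theta(y|x) - log pi_ref(y|x)), Rewards/margins is the chosen reward minus the rejected reward, Rewards/accuracies is the fraction of pairs whose margin is positive, and the per-pair loss is -log sigmoid(margin). The sketch below computes these quantities from per-pair summed log-probabilities and then runs two checks on the logged rows. It is not the training code used for this model. The card does not state beta, so beta=0.1 is an assumption; it is consistent with the table, where Rewards/chosen falls by 1.8417 between steps 68 and 612 while Logps/chosen falls by 18.4171. The names `dpo_metrics`, `softplus`, and `EVAL_ROWS` are hypothetical.

```python
import math

# Logged evaluation rows copied from the results table above:
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
EVAL_ROWS = [
    (68, 0.6839, 0.0330, 0.0082, 0.0248),
    (136, 0.6779, 0.0339, -0.0486, 0.0824),
    (204, 0.6352, -0.0081, -0.2128, 0.2047),
    (272, 0.5825, -0.8764, -1.2440, 0.3676),
    (340, 0.5559, -1.1573, -1.6161, 0.4587),
    (408, 0.6173, -1.3556, -1.8473, 0.4918),
    (476, 0.6381, -1.3063, -1.8875, 0.5812),
    (544, 0.6915, -1.7445, -2.3721, 0.6276),
    (612, 0.7063, -1.8087, -2.4327, 0.6240),
]


def softplus(x):
    """Numerically stable log(1 + exp(x)); -log sigmoid(m) == softplus(-m)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Batch-averaged DPO metrics from summed per-sequence log-probabilities.

    Each argument holds one value per preference pair. beta=0.1 is an
    assumption: this model card does not state the value used in training.
    """
    n = len(policy_chosen)
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    rewards_chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rewards_rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]
    # Sigmoid DPO loss per pair: -log sigmoid(margin).
    losses = [softplus(-m) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(rewards_chosen) / n,
        "rewards/rejected": sum(rewards_rejected) / n,
        "rewards/margins": sum(margins) / n,
        # A pair counts as correct when the chosen reward strictly exceeds the rejected one.
        "rewards/accuracies": sum(1.0 for m in margins if m > 0) / n,
    }


# Check 1: the logged margins equal chosen minus rejected rewards,
# up to the 4-decimal rounding of the averaged values.
for step, _, chosen, rejected, margin in EVAL_ROWS:
    assert abs((chosen - rejected) - margin) < 1e-3, step

# Check 2: pick the checkpoint with the lowest validation loss.
best = min(EVAL_ROWS, key=lambda row: row[1])
print(f"lowest validation loss: step {best[0]}, loss {best[1]}")
```

Running this prints `lowest validation loss: step 340, loss 0.5559`, which matches the observation that validation loss rises after epoch 1.5. The first check holds because both reward columns and the margin column are means over the same evaluation batch, so the mean margin equals the difference of the means.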