
results

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset (the card generator recorded it as `None`). It achieves the following results on the evaluation set:

  • Loss: 0.0483
  • Rewards/chosen: 0.8443
  • Rewards/rejected: -4.9894
  • Rewards/accuracies: 0.9864
  • Rewards/margins: 5.8337
  • Logps/rejected: -163.0178
  • Logps/chosen: -85.8088
  • Logits/rejected: -1.0144
  • Logits/chosen: -0.8703
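
Given the PEFT version listed under Framework versions, this repository most likely ships a PEFT adapter on top of the base model rather than full weights. Below is a minimal loading sketch under that assumption; the adapter repository id `your-username/results` is a hypothetical placeholder for this repo's actual id:

```python
# Sketch: load the base model, then attach this repo's PEFT adapter.
# "your-username/results" is a hypothetical placeholder repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

model = PeftModel.from_pretrained(base, "your-username/results")
model.eval()

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```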

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
  • mixed_precision_training: Native AMP
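
The rewards/chosen, rewards/rejected, and logps metrics reported above are the ones logged by DPO-style preference training (e.g. TRL's `DPOTrainer`); that setup is an inference from the metric names, not something this card records. A hedged sketch mapping the listed hyperparameters onto `transformers.TrainingArguments` (the DPO beta and dataset remain unknown):

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# The DPO training setup is an assumption based on the logged metrics;
# beta and the preference dataset are not recorded in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="results",            # assumed from the card title
    learning_rate=5e-4,              # learning_rate: 0.0005
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    seed=42,
    gradient_accumulation_steps=64,  # total train batch size: 1 * 64 = 64
    lr_scheduler_type="cosine",
    warmup_steps=100,                # lr_scheduler_warmup_steps: 100
    max_steps=1000,                  # training_steps: 1000
    fp16=True,                       # mixed_precision_training: Native AMP
)
```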

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5635        | 0.24  | 100  | 0.5460          | 0.2168         | -0.4663          | 0.7367             | 0.6831          | -117.7869      | -92.0844     | -1.3150         | -1.2411       |
| 0.3836        | 0.47  | 200  | 0.3582          | 0.1507         | -1.4599          | 0.8494             | 1.6106          | -127.7231      | -92.7453     | -0.6842         | -0.5917       |
| 0.2525        | 0.71  | 300  | 0.2509          | 0.6325         | -1.7217          | 0.9095             | 2.3542          | -130.3404      | -87.9269     | -0.7855         | -0.6860       |
| 0.1625        | 0.94  | 400  | 0.1711          | 0.6613         | -2.8054          | 0.9357             | 3.4667          | -141.1781      | -87.6390     | -0.7853         | -0.6836       |
| 0.0695        | 1.18  | 500  | 0.1215          | 0.6443         | -3.7903          | 0.9589             | 4.4347          | -151.0267      | -87.8085     | -0.8915         | -0.7635       |
| 0.0448        | 1.42  | 600  | 0.0905          | 1.0284         | -4.1415          | 0.9698             | 5.1699          | -154.5387      | -83.9677     | -0.9632         | -0.8182       |
| 0.0515        | 1.65  | 700  | 0.0760          | 1.1233         | -3.6423          | 0.9758             | 4.7656          | -149.5469      | -83.0189     | -0.9748         | -0.8504       |
| 0.0396        | 1.89  | 800  | 0.0542          | 0.7363         | -4.9101          | 0.9864             | 5.6464          | -162.2247      | -86.8886     | -1.0377         | -0.8963       |
| 0.0099        | 2.13  | 900  | 0.0486          | 0.8344         | -4.9605          | 0.9864             | 5.7949          | -162.7287      | -85.9078     | -1.0199         | -0.8760       |
| 0.0107        | 2.36  | 1000 | 0.0483          | 0.8443         | -4.9894          | 0.9864             | 5.8337          | -163.0178      | -85.8088     | -1.0144         | -0.8703       |
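
As a consistency check on the table, the Rewards/margins column is simply the chosen reward minus the rejected reward; verifying the final checkpoint row:

```python
# Rewards/margins = Rewards/chosen - Rewards/rejected.
# Values taken from the step-1000 evaluation row above.
chosen, rejected = 0.8443, -4.9894
margin = chosen - rejected
print(round(margin, 4))  # 5.8337, matching the Rewards/margins column
```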

Framework versions

  • PEFT 0.7.1
  • Transformers 4.37.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.15.0
  • Tokenizers 0.15.0