Llama-2-7b-hf-eval_threapist-ORPO-filtered-version-1

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset (recorded as None in the training metadata). Its evaluation-set metrics at each logged step are listed in the table below.

Model description

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen	Nll Loss	Log Odds Ratio	Log Odds Chosen
0.7757	0.4037	44	1.0979	-0.2148	-0.2646	0.7000	0.0498	-2.6460	-2.1485	-0.9189	-0.9063	1.0343	-0.5790	0.5551
0.7944	0.8073	88	1.0113	-0.2113	-0.2607	0.7000	0.0494	-2.6073	-2.1130	-0.9711	-0.9585	0.9516	-0.5747	0.5532
0.6896	1.2110	132	0.9476	-0.2082	-0.2579	0.7000	0.0497	-2.5794	-2.0824	-0.9990	-0.9867	0.8912	-0.5690	0.5574
0.8565	1.6147	176	0.9066	-0.2061	-0.2562	0.7000	0.0500	-2.5615	-2.0611	-1.0229	-1.0111	0.8524	-0.5638	0.5622
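
The reward columns in the table can be sanity-checked against each other. The sketch below is a minimal illustration, not part of the original training code: it copies a subset of the columns from the table, checks that Rewards/margins equals Rewards/chosen minus Rewards/rejected within rounding, and computes the ratio of Rewards/chosen to Logps/chosen. The names `ROWS`, `TOL`, and `check_rows` are hypothetical. A ratio close to 0.1 would be consistent with an ORPO beta of 0.1, but that is an inference from the numbers, since this card does not list the hyperparameters.

```python
# Evaluation metrics copied from the table above, one tuple per logged step.
# Columns: step, rewards_chosen, rewards_rejected, rewards_margin,
#          logps_chosen, logps_rejected
ROWS = [
    (44, -0.2148, -0.2646, 0.0498, -2.1485, -2.6460),
    (88, -0.2113, -0.2607, 0.0494, -2.1130, -2.6073),
    (132, -0.2082, -0.2579, 0.0497, -2.0824, -2.5794),
    (176, -0.2061, -0.2562, 0.0500, -2.0611, -2.5615),
]

# The table is rounded to four decimals, so allow a small tolerance.
TOL = 2e-4


def check_rows(rows, tol=TOL):
    """Return (step, margin_ok, implied_scale) for each row.

    margin_ok:     True if margin == chosen - rejected within `tol`.
    implied_scale: rewards_chosen / logps_chosen, i.e. the factor that
                   would map the log-probability onto the reward.
    """
    results = []
    for step, r_c, r_r, margin, lp_c, _lp_r in rows:
        margin_ok = abs((r_c - r_r) - margin) <= tol
        implied_scale = r_c / lp_c
        results.append((step, margin_ok, implied_scale))
    return results


if __name__ == "__main__":
    for step, margin_ok, scale in check_rows(ROWS):
        print(f"step {step}: margin consistent={margin_ok}, "
              f"rewards/logps ratio={scale:.3f}")
```

Across the four logged steps, the margins stay near 0.05 and Rewards/accuracies stays at 0.7000 while the validation loss and Nll Loss fall, so most of the measured improvement in this table comes from the likelihood term rather than from a wider gap between chosen and rejected responses.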