checking_generation

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf. The training dataset was not recorded (the card lists it as None). Evaluation-set results for each checkpoint are given in the results table below.

Model description

More information needed

The training hyperparameters were not recorded. Training and evaluation metrics, logged every 727 steps (0.1 epoch), were as follows:

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6556 | 0.1 | 727 | 0.6514 | 0.1988 | 0.1093 | 0.7772 | 0.0895 | -59.0237 | -75.4071 | -0.8991 | -0.8898 |
| 0.5692 | 0.2 | 1454 | 0.5961 | 0.4045 | 0.1788 | 0.8163 | 0.2258 | -58.3291 | -73.3496 | -0.8767 | -0.8668 |
| 0.556 | 0.3 | 2181 | 0.5789 | 0.4668 | 0.1938 | 0.8146 | 0.2729 | -58.1782 | -72.7267 | -0.8724 | -0.8620 |
| 0.6199 | 0.4 | 2908 | 0.5738 | 0.4829 | 0.1970 | 0.8299 | 0.2858 | -58.1464 | -72.5661 | -0.8726 | -0.8622 |
| 0.6932 | 0.5 | 3635 | 0.5719 | 0.4845 | 0.1933 | 0.8214 | 0.2912 | -58.1835 | -72.5492 | -0.8681 | -0.8577 |
| 0.5872 | 0.6 | 4362 | 0.5734 | 0.4822 | 0.1948 | 0.8112 | 0.2874 | -58.1684 | -72.5727 | -0.8705 | -0.8601 |
| 0.6009 | 0.7 | 5089 | 0.5735 | 0.4805 | 0.1936 | 0.8112 | 0.2869 | -58.1805 | -72.5891 | -0.8666 | -0.8561 |
| 0.4821 | 0.8 | 5816 | 0.5727 | 0.4826 | 0.1940 | 0.8231 | 0.2886 | -58.1766 | -72.5685 | -0.8683 | -0.8578 |
| 0.5741 | 0.9 | 6543 | 0.5714 | 0.4829 | 0.1913 | 0.8231 | 0.2917 | -58.2040 | -72.5652 | -0.8693 | -0.8589 |
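
The table above can be sanity-checked with a short script. This is only a sketch: it assumes that Rewards/margins is the difference Rewards/chosen minus Rewards/rejected, which the card does not state but which matches the logged values to within rounding. The helper names `margin_mismatches` and `best_checkpoint` are hypothetical and chosen for illustration.

```python
# Rows copied from the results table above:
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
ROWS = [
    (727, 0.6514, 0.1988, 0.1093, 0.0895),
    (1454, 0.5961, 0.4045, 0.1788, 0.2258),
    (2181, 0.5789, 0.4668, 0.1938, 0.2729),
    (2908, 0.5738, 0.4829, 0.1970, 0.2858),
    (3635, 0.5719, 0.4845, 0.1933, 0.2912),
    (4362, 0.5734, 0.4822, 0.1948, 0.2874),
    (5089, 0.5735, 0.4805, 0.1936, 0.2869),
    (5816, 0.5727, 0.4826, 0.1940, 0.2886),
    (6543, 0.5714, 0.4829, 0.1913, 0.2917),
]


def margin_mismatches(rows, tol=1.5e-4):
    """Return the steps where margins differs from chosen - rejected by more than tol.

    Assumption (not stated in the card): margins = chosen - rejected.
    The tolerance allows for the table's four-decimal rounding.
    """
    return [
        step
        for step, _, chosen, rejected, margin in rows
        if abs((chosen - rejected) - margin) > tol
    ]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    print("steps with inconsistent margins:", margin_mismatches(ROWS))
    step, val_loss, *_ = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss} at step {step}")
```

Under that assumption every row is consistent, and the lowest validation loss in the table (0.5714) occurs at the last logged step, 6543. Validation loss changes little after step 3635 (0.5719).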