llama7b_sigmoid_lr2e-05_b0.1

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. Evaluation-set metrics are given per checkpoint in the table below.

Model description

More information needed

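The hyperparameters used during training are not listed. The model name suggests a sigmoid loss, a learning rate of 2e-05, and likely beta = 0.1. Training and evaluation metrics were logged every 625 steps (0.1 epoch):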

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.3459	0.1	625	0.2556	-0.6712	-4.2274	0.8794	3.5562	-101.5834	-79.9183	-1.2179	-1.2096
0.2719	0.2	1250	0.2085	0.0287	-4.3321	0.9061	4.3608	-102.6303	-72.9196	-1.1053	-1.0893
0.2001	0.3	1875	0.1929	-1.2718	-6.9070	0.9190	5.6352	-128.3787	-85.9238	-1.1863	-1.1810
0.1957	0.4	2500	0.1669	-1.1545	-7.1547	0.9279	6.0002	-130.8558	-84.7511	-1.1910	-1.1850
0.1518	0.5	3125	0.1557	-2.0458	-8.1764	0.9377	6.1307	-141.0733	-93.6639	-1.1920	-1.1925
0.1211	0.6	3750	0.1462	-1.4428	-7.8814	0.9397	6.4386	-138.1226	-87.6339	-1.1811	-1.1836
0.0392	0.7	4375	0.1432	-2.1517	-8.7282	0.9407	6.5765	-146.5913	-94.7233	-1.1758	-1.1786
0.0814	0.8	5000	0.1400	-2.2256	-8.8415	0.9476	6.6159	-147.7243	-95.4621	-1.1777	-1.1809
0.324	0.9	5625	0.1389	-2.3072	-8.9460	0.9457	6.6388	-148.7690	-96.2786	-1.1836	-1.1870
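
The reward columns appear to follow the usual preference-optimization convention, in which Rewards/margins is Rewards/chosen minus Rewards/rejected. The card does not define them, so this is an assumption. A minimal Python sketch that checks the relationship against the logged values and picks out the best checkpoints is shown here; the names `ROWS`, `margin_errors`, and `best_step` are illustrative and not part of any training library.

```python
# Values copied from the table above:
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/accuracies, rewards/margins)
ROWS = [
    (625,  0.2556, -0.6712, -4.2274, 0.8794, 3.5562),
    (1250, 0.2085,  0.0287, -4.3321, 0.9061, 4.3608),
    (1875, 0.1929, -1.2718, -6.9070, 0.9190, 5.6352),
    (2500, 0.1669, -1.1545, -7.1547, 0.9279, 6.0002),
    (3125, 0.1557, -2.0458, -8.1764, 0.9377, 6.1307),
    (3750, 0.1462, -1.4428, -7.8814, 0.9397, 6.4386),
    (4375, 0.1432, -2.1517, -8.7282, 0.9407, 6.5765),
    (5000, 0.1400, -2.2256, -8.8415, 0.9476, 6.6159),
    (5625, 0.1389, -2.3072, -8.9460, 0.9457, 6.6388),
]


def margin_errors(rows):
    """Map each step to |margin - (chosen - rejected)|."""
    return {
        step: abs(margin - (chosen - rejected))
        for step, _loss, chosen, rejected, _acc, margin in rows
    }


def best_step(rows, column, lowest=True):
    """Return the step whose value in the given column index is lowest (or highest)."""
    pick = min if lowest else max
    return pick(rows, key=lambda row: row[column])[0]


if __name__ == "__main__":
    # The table is rounded to four decimals, so allow a small tolerance.
    worst = max(margin_errors(ROWS).values())
    print(f"largest margin discrepancy: {worst:.4f}")
    print("lowest validation loss at step", best_step(ROWS, column=1))
    print("highest reward accuracy at step", best_step(ROWS, column=4, lowest=False))
```

Under that assumption the margins agree with the other two columns to within rounding. The lowest validation loss is at the final logged step (5625), while the highest reward accuracy is one checkpoint earlier (step 5000).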