model_hh_usp1_200

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. Its results on the evaluation set, logged at each checkpoint, are reported in the training results table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training results

Evaluation metrics logged every 100 training steps:

Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
8.0	100	1.3976	1.5497	-0.4049	0.6200	1.9546	-116.4568	-107.9588	-0.0493	-0.0483
16.0	200	1.4110	1.4142	-0.5150	0.6500	1.9292	-116.5792	-108.1093	-0.0538	-0.0530
24.0	300	1.4038	1.3748	-0.6170	0.6400	1.9918	-116.6925	-108.1531	-0.0554	-0.0545
32.0	400	1.4365	1.2965	-0.6346	0.6600	1.9311	-116.7121	-108.2401	-0.0555	-0.0551
40.0	500	1.4139	1.2582	-0.6992	0.6400	1.9574	-116.7839	-108.2827	-0.0582	-0.0580
48.0	600	1.4155	1.2234	-0.7385	0.6400	1.9619	-116.8275	-108.3214	-0.0572	-0.0570
56.0	700	1.4050	1.2174	-0.7564	0.6600	1.9738	-116.8474	-108.3280	-0.0586	-0.0582
64.0	800	1.4250	1.1984	-0.7478	0.6500	1.9462	-116.8379	-108.3491	-0.0589	-0.0586
72.0	900	1.4309	1.1891	-0.7289	0.6400	1.9180	-116.8168	-108.3594	-0.0588	-0.0585
80.0	1000	1.4318	1.1781	-0.7550	0.6400	1.9331	-116.8458	-108.3717	-0.0587	-0.0585
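The Rewards/* and Logps/* column names match the preference metrics that TRL's DPOTrainer logs. In that logging, Rewards/margins is the mean of chosen minus rejected reward, so it should equal Rewards/chosen minus Rewards/rejected. The sketch below checks that relation on the rows above and picks the checkpoint with the lowest validation loss. It assumes these columns follow the DPO convention. The helper names `check_margins` and `best_checkpoint` are hypothetical and are not part of any library.

```python
# Rows copied from the table above:
# (Step, Validation Loss, Rewards/chosen, Rewards/rejected, Rewards/margins)
ROWS = [
    (100, 1.3976, 1.5497, -0.4049, 1.9546),
    (200, 1.4110, 1.4142, -0.5150, 1.9292),
    (300, 1.4038, 1.3748, -0.6170, 1.9918),
    (400, 1.4365, 1.2965, -0.6346, 1.9311),
    (500, 1.4139, 1.2582, -0.6992, 1.9574),
    (600, 1.4155, 1.2234, -0.7385, 1.9619),
    (700, 1.4050, 1.2174, -0.7564, 1.9738),
    (800, 1.4250, 1.1984, -0.7478, 1.9462),
    (900, 1.4309, 1.1891, -0.7289, 1.9180),
    (1000, 1.4318, 1.1781, -0.7550, 1.9331),
]


def check_margins(rows, tol=1e-4):
    """Return the steps where margin != chosen - rejected (within tol).

    Assumes the DPO convention margin = chosen reward - rejected reward.
    """
    return [step for step, _, chosen, rejected, margin in rows
            if abs((chosen - rejected) - margin) > tol]


def best_checkpoint(rows):
    """Return the step whose validation loss is lowest."""
    return min(rows, key=lambda row: row[1])[0]


if __name__ == "__main__":
    print("margin mismatches:", check_margins(ROWS))
    print("lowest validation loss at step:", best_checkpoint(ROWS))
```

Running it reports no margin mismatches, and step 100 (validation loss 1.3976) as the lowest-loss checkpoint. The table also shows Rewards/chosen drifting down from 1.5497 to 1.1781 while the margin stays between roughly 1.92 and 1.99.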