Hyponatremia_M2_650steps_1e8rate_01beta_DPO

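This model is a fine-tuned version of tsavage68/Hyponatremia_M2_150steps_1e6rate_SFT on an unknown dataset. At the final checkpoint (step 650), it achieves the following results on the evaluation set:

Loss: 0.6565
Rewards/chosen: 0.0168
Rewards/rejected: -0.0582
Rewards/accuracies: 1.0
Rewards/margins: 0.0750
Logps/rejected: -71.4632
Logps/chosen: -36.5592
Logits/rejected: -2.2463
Logits/chosen: -2.2413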

Model description

More information needed

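The training hyperparameters are not listed. The model name suggests 650 training steps of DPO, with "1e8rate" and "01beta" presumably denoting a learning rate of 1e-8 and a DPO beta of 0.1.

Training results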

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6935	0.2667	50	0.6952	0.0009	0.0048	0.4200	-0.0039	-70.8331	-36.7181	-2.2478	-2.2426
0.6916	0.5333	100	0.6886	0.0029	-0.0064	0.6900	0.0093	-70.9455	-36.6984	-2.2467	-2.2415
0.6785	0.8	150	0.6747	0.0099	-0.0276	0.9200	0.0375	-71.1572	-36.6282	-2.2469	-2.2418
0.6638	1.0667	200	0.6645	0.0139	-0.0445	1.0	0.0584	-71.3267	-36.5883	-2.2467	-2.2417
0.6573	1.3333	250	0.6566	0.0169	-0.0578	1.0	0.0746	-71.4591	-36.5584	-2.2463	-2.2414
0.6553	1.6	300	0.6546	0.0186	-0.0602	1.0	0.0789	-71.4839	-36.5409	-2.2464	-2.2414
0.6472	1.8667	350	0.6549	0.0172	-0.0610	0.9900	0.0782	-71.4915	-36.5548	-2.2463	-2.2415
0.658	2.1333	400	0.6556	0.0176	-0.0591	1.0	0.0767	-71.4727	-36.5508	-2.2457	-2.2408
0.6556	2.4	450	0.6556	0.0167	-0.0601	1.0	0.0768	-71.4827	-36.5600	-2.2462	-2.2414
0.6557	2.6667	500	0.6562	0.0171	-0.0584	1.0	0.0755	-71.4658	-36.5562	-2.2463	-2.2414
0.6504	2.9333	550	0.6565	0.0168	-0.0582	1.0	0.0750	-71.4632	-36.5592	-2.2463	-2.2413
0.6513	3.2	600	0.6565	0.0168	-0.0582	1.0	0.0750	-71.4632	-36.5592	-2.2463	-2.2413
0.6543	3.4667	650	0.6565	0.0168	-0.0582	1.0	0.0750	-71.4632	-36.5592	-2.2463	-2.2413
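
The columns in the table above are related to one another, and a few rows can be checked by hand. The sketch below is a minimal sanity check, not part of the original training code. It assumes the usual DPO logging convention, in which each reward is beta times the difference between policy and reference log-probabilities, the margin is the chosen reward minus the rejected reward, and the loss is the negative log-sigmoid of the margin. The names ROWS, loss_from_margin, and implied_beta are illustrative. Because the table reports batch averages rounded to four decimals, the comparisons are approximate.

```python
import math

# A few rows copied from the results table:
# (step, validation loss, reward chosen, reward rejected, margin,
#  logp rejected, logp chosen)
ROWS = [
    (50, 0.6952, 0.0009, 0.0048, -0.0039, -70.8331, -36.7181),
    (300, 0.6546, 0.0186, -0.0602, 0.0789, -71.4839, -36.5409),
    (650, 0.6565, 0.0168, -0.0582, 0.0750, -71.4632, -36.5592),
]


def loss_from_margin(margin: float) -> float:
    """Sigmoid DPO loss for one margin: -log(sigmoid(m)) = log(1 + exp(-m))."""
    return math.log1p(math.exp(-margin))


def implied_beta(reward_a: float, reward_b: float,
                 logp_a: float, logp_b: float) -> float:
    """Estimate beta from two checkpoints evaluated on the same data.

    Assumes reward = beta * (policy_logp - reference_logp). The reference
    term is the same at both checkpoints, so it cancels in the difference.
    """
    return (reward_b - reward_a) / (logp_b - logp_a)


for step, val_loss, r_chosen, r_rejected, margin, _, _ in ROWS:
    recomputed_margin = r_chosen - r_rejected
    approx_loss = loss_from_margin(margin)
    print(f"step {step}: margin {recomputed_margin:.4f} (table {margin:.4f}), "
          f"loss from margin {approx_loss:.4f} (table {val_loss:.4f})")

first, last = ROWS[0], ROWS[-1]
beta_chosen = implied_beta(first[2], last[2], first[6], last[6])
beta_rejected = implied_beta(first[3], last[3], first[5], last[5])
print(f"implied beta: chosen {beta_chosen:.3f}, rejected {beta_rejected:.3f}")
```

Under these assumptions, the recomputed margins and losses agree with the table to within rounding, and both beta estimates come out close to 0.1, which is consistent with the "01beta" fragment of the model name.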