simpo-lora-2

This model is a fine-tuned version of hatakeyama-llm-team/with_halcination_little_codes_ck5200 on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.7791	0.1600	100	0.9134	-4.7310	-6.5490	0.7037	1.8179	-2.6196	-1.8924	-0.1767	0.0399
0.4822	0.3200	200	0.5821	-5.4149	-11.9630	0.7349	6.5482	-4.7852	-2.1659	0.0142	0.0556
0.5591	0.4800	300	0.5573	-5.5227	-12.0727	0.7417	6.5500	-4.8291	-2.2091	-0.6007	-0.5349
0.6293	0.6400	400	0.5447	-5.8900	-14.0923	0.7378	8.2023	-5.6369	-2.3560	-0.1891	-0.2711
0.5645	0.8000	500	0.5426	-5.7599	-13.0285	0.7368	7.2686	-5.2114	-2.3040	-0.5393	-0.5567
0.4763	0.9600	600	0.5415	-5.7237	-12.8559	0.7368	7.1321	-5.1424	-2.2895	-0.5526	-0.5626