gemma-7b-lora-distilabel-intel-orca-dpo-pairs

This model is a fine-tuned version of google/gemma-2b on an unknown dataset. Its results on the evaluation set, logged every 250 training steps, are listed in the results table below.

Model description

More information needed

The following hyperparameters were used during training:

Training results

Training Loss  Epoch  Step  Validation Loss  Rewards/chosen  Rewards/rejected  Rewards/accuracies  Rewards/margins  Logps/rejected  Logps/chosen  Logits/rejected  Logits/chosen
0.6333         0.19   250   0.5221           -0.4649         -1.1235           0.8196              0.6586           -285.2397       -247.9492     -29.5102         -27.3832
0.4697         0.39   500   0.4819           -0.5572         -2.0261           0.8394              1.4689           -294.2652       -248.8721     -29.5979         -27.4182
0.4471         0.58   750   0.4814           -0.5104         -2.3183           0.8418              1.8079           -297.1878       -248.4040     -29.6888         -27.5182
0.4477         0.78   1000  0.4744           -0.3874         -2.2429           0.8418              1.8555           -296.4334       -247.1736     -29.7387         -27.5680
0.4580         0.97   1250  0.4641           -0.2842         -2.0677           0.8414              1.7835           -294.6812       -246.1420     -29.7875         -27.6122
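The Rewards/* columns follow the naming of the metrics logged by TRL's `DPOTrainer`. This suggests, though the card does not state it, that this run used Direct Preference Optimization with the default sigmoid loss. The sketch below assumes that setup. Each response's implicit reward is β times the difference between its log-probability under the trained policy and under the frozen reference model, and the loss is -log σ(margin). The helper names `softplus` and `dpo_batch_metrics` are hypothetical, and so is the β value in the example, because β is not listed here. The final loop checks that, in every row of the table, Rewards/margins equals Rewards/chosen minus Rewards/rejected up to rounding.

```python
import math
from statistics import mean


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_batch_metrics(pairs, beta):
    """Hypothetical helper: sigmoid-DPO loss and reward metrics for a batch.

    Assumes TRL-style sigmoid DPO (not confirmed by this card). Each pair is
    (policy_logp_chosen, policy_logp_rejected, ref_logp_chosen, ref_logp_rejected),
    i.e. log-probabilities of the chosen/rejected response under the trained
    policy and under the frozen reference model.
    """
    chosen, rejected, losses = [], [], []
    for pc, pr, rc, rr in pairs:
        r_c = beta * (pc - rc)  # implicit reward of the chosen response
        r_r = beta * (pr - rr)  # implicit reward of the rejected response
        chosen.append(r_c)
        rejected.append(r_r)
        losses.append(softplus(-(r_c - r_r)))  # -log(sigmoid(margin))
    return {
        "loss": mean(losses),
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        "rewards/margins": mean(c - r for c, r in zip(chosen, rejected)),
        # fraction of pairs where the chosen response gets the higher reward
        "rewards/accuracies": mean(1.0 if c > r else 0.0 for c, r in zip(chosen, rejected)),
    }


# Evaluation rows copied from the table: (step, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (250, -0.4649, -1.1235, 0.6586),
    (500, -0.5572, -2.0261, 1.4689),
    (750, -0.5104, -2.3183, 1.8079),
    (1000, -0.3874, -2.2429, 1.8555),
    (1250, -0.2842, -2.0677, 1.7835),
]
for step, chosen_mean, rejected_mean, margin_mean in ROWS:
    # The mean of per-pair differences equals the difference of the means,
    # so the logged margin should match chosen - rejected up to rounding.
    assert abs((chosen_mean - rejected_mean) - margin_mean) < 1e-3, step
```

For example, `dpo_batch_metrics([(-10.0, -12.0, -11.0, -11.0)], beta=0.1)` gives a margin of 0.2 and a loss of about 0.598. While the policy still matches the reference model, every margin is 0 and the loss is log 2 ≈ 0.693, which is a useful baseline when reading the Training Loss column.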