---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-lora
    results: []
---

zephyr-7b-dpo-lora

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on an unspecified dataset (the dataset name was recorded as None). Per-epoch results on the evaluation set are listed in the table at the end of this card.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
-0.2007	1.0	969	-0.0988	-1.1416	-2.4993	0.6746	1.3577	-259.9262	-301.2188	-1.9876	-2.0976
-2.3739	2.0	1938	-3.0140	-12.9185	-17.8885	0.6587	4.9699	-413.8172	-418.9880	-1.4397	-1.5909
-5.7169	3.0	2907	-7.5416	-29.9194	-39.8539	0.6151	9.9345	-633.4722	-588.9970	-1.0751	-1.2630
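The Rewards/* and Logps/* columns match the metrics logged by TRL's DPOTrainer. The card does not name the trainer, so treat that as an assumption suggested by the model name and column names. The sketch below, in plain Python, shows how such metrics are typically derived from per-example summed log-probabilities under the policy and a frozen reference model, and checks that the three table rows are internally consistent. The functions `dpo_reward_metrics` and `check_table`, the example log-probabilities, and `beta=0.1` are all hypothetical; the card does not record the DPO β used for this run.

```python
# A minimal sketch, assuming the run used TRL's DPOTrainer (not stated in this card).
# Log-probabilities are summed over the tokens of each completion.


def dpo_reward_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Batch-mean DPO reward metrics from per-example summed log-probs.

    beta=0.1 is an assumed DPO temperature; the card does not record the value used.
    """
    # Implicit reward of a completion: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(margins)
    return {
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        # Fraction of pairs where the chosen completion earns the higher reward.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "rewards/margins": sum(margins) / n,
        # Mean policy log-probability of each completion type.
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }


# (epoch, step, Rewards/chosen, Rewards/rejected, Rewards/margins) copied from the table.
TABLE_ROWS = [
    (1.0, 969, -1.1416, -2.4993, 1.3577),
    (2.0, 1938, -12.9185, -17.8885, 4.9699),
    (3.0, 2907, -29.9194, -39.8539, 9.9345),
]


def check_table(rows, steps_per_epoch=969, tol=1e-3):
    """Raise AssertionError if a row is internally inconsistent."""
    for epoch, step, chosen, rejected, margin in rows:
        # A mean of differences equals the difference of means (up to 4-decimal rounding).
        assert abs((chosen - rejected) - margin) <= tol, (epoch, chosen - rejected, margin)
        # Evaluation ran at the end of each epoch, a fixed number of steps apart.
        assert step == round(epoch * steps_per_epoch), (epoch, step)


if __name__ == "__main__":
    # Made-up log-probs for two preference pairs; NOT data from this run.
    print(dpo_reward_metrics(
        policy_chosen=[-120.0, -95.5],
        policy_rejected=[-130.0, -90.0],
        ref_chosen=[-125.0, -100.0],
        ref_rejected=[-128.0, -92.0],
    ))
    check_table(TABLE_ROWS)
    print("table rows are internally consistent")
```

Because Rewards/margins is a mean of per-pair differences, it equals Rewards/chosen minus Rewards/rejected up to rounding, and all three rows satisfy this (for example, -1.1416 - (-2.4993) = 1.3577 at epoch 1). Rewards/accuracies, by contrast, cannot be recovered from the other columns, since it counts pairs rather than averaging reward values.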