
zephyr-7b-dpo-qlora

This model is a QLoRA (PEFT) adapter for mistralai/Mistral-7B-v0.1, fine-tuned with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4877
  • Rewards/chosen: -2.1504
  • Rewards/rejected: -3.2930
  • Rewards/accuracies: 0.7485
  • Rewards/margins: 1.1426
  • Logps/rejected: -593.1238
  • Logps/chosen: -500.2867
  • Logits/rejected: -1.4918
  • Logits/chosen: -1.5786
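
The card itself does not define these reward metrics; they follow the standard implicit-reward convention of DPO training (presumably as logged by TRL's DPOTrainer), where π_θ is the trained policy, π_ref the frozen reference model, and β the DPO temperature (β is not listed among the hyperparameters below):

```latex
% Implicit DPO reward of a completion y for a prompt x:
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

% Metrics derived over (chosen, rejected) preference pairs:
\text{rewards/margins}    = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})
\text{rewards/accuracies} = \Pr\bigl[\, r_\theta(x, y_{\mathrm{chosen}}) > r_\theta(x, y_{\mathrm{rejected}}) \,\bigr]
```

Rewards/chosen and Rewards/rejected report this implicit reward averaged over the chosen and rejected completions, while Logps/* and Logits/* report the policy's log-probabilities and average logits on those same completions.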

Model description

More information needed

Intended uses & limitations

More information needed
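
Since the repository contains a PEFT (QLoRA) adapter rather than full model weights, it has to be loaded on top of the base model. The snippet below is a minimal, non-official sketch: it assumes the adapter is published as SF-Foundation/zephyr-7b-dpo-qlora and that inference is done in 4-bit via bitsandbytes (the NF4/bfloat16 settings are illustrative and not taken from the card).

```python
# Minimal sketch (not an official example): load the QLoRA adapter on top of
# the 4-bit quantized base model. Requires bitsandbytes in addition to the
# framework versions listed at the end of this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "SF-Foundation/zephyr-7b-dpo-qlora"

# Illustrative quantization settings; the card does not specify them.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Note: the prompt/chat format used during DPO training is not documented here.
prompt = "Explain DPO in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```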

Training and evaluation data

The model was trained and evaluated on the HuggingFaceH4/ultrafeedback_binarized preference dataset noted above; no further details were provided.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
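
These map roughly onto the transformers TrainingArguments sketched below. The actual training script (likely a TRL DPOTrainer recipe) is not part of this card, so treat this as illustrative rather than authoritative; output_dir and bf16 in particular are assumptions.

```python
# Rough, non-authoritative mapping of the listed hyperparameters onto
# transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,      # effective train batch size of 16, as listed above
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                # Adam with betas=(0.9, 0.999), eps=1e-8
    seed=42,
    bf16=True,                          # assumption; precision is not stated in the card
)
```

The multi-GPU launch configuration and the LoRA/DPO-specific settings (rank, alpha, β, etc.) are not documented in this card and are therefore omitted from the sketch.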

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6887 | 0.0262 | 100 | 0.6878 | 0.0320 | 0.0209 | 0.6165 | 0.0111 | -261.7386 | -282.0497 | -2.2161 | -2.2860 |
| 0.6673 | 0.0523 | 200 | 0.6705 | 0.0307 | -0.0223 | 0.6210 | 0.0530 | -266.0498 | -282.1794 | -2.2309 | -2.2977 |
| 0.622 | 0.0785 | 300 | 0.6308 | -0.4454 | -0.6416 | 0.6530 | 0.1962 | -327.9841 | -329.7844 | -2.2079 | -2.2648 |
| 0.6231 | 0.1047 | 400 | 0.6110 | -1.1130 | -1.4436 | 0.6600 | 0.3306 | -408.1872 | -396.5498 | -2.1373 | -2.1896 |
| 0.5801 | 0.1309 | 500 | 0.5821 | -1.0977 | -1.5765 | 0.6770 | 0.4787 | -421.4711 | -395.0235 | -1.9857 | -2.0520 |
| 0.5774 | 0.1570 | 600 | 0.5737 | -0.8337 | -1.3533 | 0.6960 | 0.5197 | -399.1586 | -368.6156 | -2.0070 | -2.0804 |
| 0.5622 | 0.1832 | 700 | 0.5650 | -1.6075 | -2.3332 | 0.7010 | 0.7257 | -497.1474 | -445.9985 | -1.7875 | -1.8697 |
| 0.519 | 0.2094 | 800 | 0.5425 | -1.1058 | -1.7696 | 0.7155 | 0.6638 | -440.7842 | -395.8254 | -1.7912 | -1.8752 |
| 0.4857 | 0.2355 | 900 | 0.5474 | -1.6987 | -2.4665 | 0.7225 | 0.7678 | -510.4745 | -455.1209 | -1.6205 | -1.6997 |
| 0.5378 | 0.2617 | 1000 | 0.5421 | -1.2297 | -2.0123 | 0.7090 | 0.7826 | -465.0541 | -408.2222 | -1.5946 | -1.6771 |
| 0.5569 | 0.2879 | 1100 | 0.5356 | -1.1147 | -1.7889 | 0.7175 | 0.6742 | -442.7119 | -396.7189 | -1.6536 | -1.7402 |
| 0.5875 | 0.3141 | 1200 | 0.5264 | -1.4433 | -2.1309 | 0.7355 | 0.6876 | -476.9160 | -429.5823 | -1.5100 | -1.6017 |
| 0.5681 | 0.3402 | 1300 | 0.5347 | -2.5579 | -3.4361 | 0.7165 | 0.8782 | -607.4370 | -541.0386 | -1.4877 | -1.5713 |
| 0.5395 | 0.3664 | 1400 | 0.5213 | -1.9355 | -2.8808 | 0.7300 | 0.9452 | -551.8996 | -478.8040 | -1.3998 | -1.4881 |
| 0.4408 | 0.3926 | 1500 | 0.5228 | -2.2961 | -3.4521 | 0.7355 | 1.1560 | -609.0350 | -514.8552 | -1.5441 | -1.6317 |
| 0.5416 | 0.4187 | 1600 | 0.5173 | -2.2653 | -3.2986 | 0.7285 | 1.0333 | -593.6861 | -511.7793 | -1.4138 | -1.5014 |
| 0.5261 | 0.4449 | 1700 | 0.5051 | -2.4008 | -3.4047 | 0.7385 | 1.0038 | -604.2916 | -525.3339 | -1.5638 | -1.6434 |
| 0.4685 | 0.4711 | 1800 | 0.5065 | -1.7470 | -2.7320 | 0.7380 | 0.9850 | -537.0220 | -459.9487 | -1.5145 | -1.6005 |
| 0.4293 | 0.4973 | 1900 | 0.5047 | -2.6133 | -3.7102 | 0.7390 | 1.0968 | -634.8395 | -546.5821 | -1.3755 | -1.4651 |
| 0.4753 | 0.5234 | 2000 | 0.5000 | -2.5931 | -3.6748 | 0.7455 | 1.0817 | -631.2996 | -544.5588 | -1.3866 | -1.4735 |
| 0.498 | 0.5496 | 2100 | 0.4965 | -1.8299 | -2.8777 | 0.7465 | 1.0478 | -551.5919 | -468.2369 | -1.4616 | -1.5507 |
| 0.506 | 0.5758 | 2200 | 0.4934 | -1.8271 | -2.7912 | 0.7455 | 0.9641 | -542.9438 | -467.9619 | -1.4831 | -1.5724 |
| 0.4813 | 0.6019 | 2300 | 0.4948 | -2.4682 | -3.6441 | 0.7485 | 1.1759 | -628.2384 | -532.0719 | -1.4335 | -1.5210 |
| 0.4851 | 0.6281 | 2400 | 0.4903 | -2.1415 | -3.2549 | 0.7450 | 1.1134 | -589.3144 | -499.4011 | -1.4529 | -1.5388 |
| 0.5116 | 0.6543 | 2500 | 0.4890 | -1.7892 | -2.9367 | 0.7445 | 1.1475 | -557.4963 | -464.1678 | -1.5214 | -1.6087 |
| 0.4451 | 0.6805 | 2600 | 0.4929 | -2.1993 | -3.4514 | 0.7505 | 1.2521 | -608.9644 | -505.1790 | -1.4632 | -1.5511 |
| 0.5207 | 0.7066 | 2700 | 0.4900 | -2.1993 | -3.3656 | 0.7490 | 1.1663 | -600.3847 | -505.1818 | -1.4903 | -1.5765 |
| 0.4458 | 0.7328 | 2800 | 0.4899 | -2.1260 | -3.2789 | 0.7475 | 1.1529 | -591.7167 | -497.8499 | -1.5008 | -1.5876 |
| 0.5134 | 0.7590 | 2900 | 0.4878 | -2.1729 | -3.2932 | 0.7475 | 1.1204 | -593.1492 | -502.5367 | -1.4986 | -1.5853 |
| 0.4722 | 0.7851 | 3000 | 0.4881 | -2.1656 | -3.2446 | 0.7505 | 1.0791 | -588.2886 | -501.8063 | -1.5024 | -1.5888 |
| 0.4805 | 0.8113 | 3100 | 0.4881 | -2.1831 | -3.3081 | 0.7490 | 1.1250 | -594.6381 | -503.5581 | -1.4902 | -1.5774 |
| 0.4891 | 0.8375 | 3200 | 0.4879 | -2.1565 | -3.2929 | 0.7490 | 1.1363 | -593.1110 | -500.9025 | -1.4972 | -1.5837 |
| 0.5083 | 0.8636 | 3300 | 0.4877 | -2.1423 | -3.2770 | 0.7490 | 1.1347 | -591.5213 | -499.4756 | -1.4993 | -1.5855 |
| 0.446 | 0.8898 | 3400 | 0.4876 | -2.1602 | -3.3022 | 0.7480 | 1.1420 | -594.0439 | -501.2723 | -1.4916 | -1.5785 |
| 0.5346 | 0.9160 | 3500 | 0.4877 | -2.1484 | -3.2901 | 0.7480 | 1.1418 | -592.8391 | -500.0872 | -1.4929 | -1.5797 |
| 0.4646 | 0.9422 | 3600 | 0.4876 | -2.1484 | -3.2908 | 0.7490 | 1.1425 | -592.9084 | -500.0869 | -1.4908 | -1.5778 |
| 0.4696 | 0.9683 | 3700 | 0.4876 | -2.1494 | -3.2919 | 0.7490 | 1.1426 | -593.0177 | -500.1866 | -1.4908 | -1.5778 |
| 0.5038 | 0.9945 | 3800 | 0.4875 | -2.1504 | -3.2931 | 0.7485 | 1.1428 | -593.1368 | -500.2856 | -1.4918 | -1.5786 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.40.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1