zephyr-7b-dpo-qlora

This model is a DPO fine-tuned QLoRA adapter for mistralai/Mistral-7B-v0.1; the training dataset is not specified in the card metadata. It achieves the following results on the evaluation set (a short sketch of how these DPO metrics are computed follows the list):

  • Loss: 0.4889
  • Rewards/chosen: -3.4919
  • Rewards/rejected: -4.6148
  • Rewards/accuracies: 0.7435
  • Rewards/margins: 1.1229
  • Logps/rejected: -710.3250
  • Logps/chosen: -617.8050
  • Logits/rejected: 2.4382
  • Logits/chosen: 1.8324
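
The reward metrics above are the implicit DPO rewards: β-scaled log-probability ratios between the fine-tuned policy and the frozen reference model, with the margin being the chosen-minus-rejected difference and the accuracy being the fraction of pairs where the chosen response receives the higher reward. The sketch below shows approximately how such quantities are derived; the β value used for this run is not stated in the card, so the default shown is only a placeholder.

```python
import torch
import torch.nn.functional as F

def dpo_loss_and_metrics(policy_chosen_logps, policy_rejected_logps,
                         ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative DPO loss and implicit-reward metrics.

    Each argument is a (batch,) tensor of summed per-sequence log-probs.
    `beta` is the DPO temperature; 0.1 is a placeholder, not this run's value.
    """
    # Implicit rewards: beta * (policy log-prob - reference log-prob)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards

    # DPO objective: -log sigmoid(margin), averaged over the batch
    loss = -F.logsigmoid(margins).mean()

    metrics = {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }
    return loss, metrics
```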

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
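
As an illustration only, these settings map roughly onto transformers.TrainingArguments as shown below; the actual training script is not included in the card, and the output_dir is a placeholder.

```python
from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters; not the original script.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,      # 4 x 4 = total train batch size of 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                     # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```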

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6864 | 0.03 | 100 | 0.6863 | 0.0256 | 0.0116 | 0.6655 | 0.0140 | -247.6842 | -266.0519 | -2.2636 | -2.2907 |
| 0.6536 | 0.05 | 200 | 0.6562 | -0.0020 | -0.0870 | 0.6795 | 0.0850 | -257.5441 | -268.8117 | -2.1925 | -2.2227 |
| 0.6091 | 0.08 | 300 | 0.6253 | -0.1307 | -0.3097 | 0.6765 | 0.1790 | -279.8125 | -281.6791 | -2.0835 | -2.1274 |
| 0.621 | 0.1 | 400 | 0.6015 | -0.3933 | -0.6797 | 0.6870 | 0.2863 | -316.8083 | -307.9449 | -1.5794 | -1.6497 |
| 0.5642 | 0.13 | 500 | 0.5675 | -0.9095 | -1.4269 | 0.7005 | 0.5173 | -391.5297 | -359.5655 | 0.0829 | -0.0611 |
| 0.5571 | 0.16 | 600 | 0.5609 | -0.6613 | -1.1813 | 0.7030 | 0.5199 | -366.9699 | -334.7451 | 0.3065 | 0.1542 |
| 0.5522 | 0.18 | 700 | 0.5529 | -1.3200 | -2.0684 | 0.7125 | 0.7484 | -455.6828 | -400.6138 | 0.6566 | 0.4951 |
| 0.5173 | 0.21 | 800 | 0.5424 | -1.4995 | -2.1557 | 0.7210 | 0.6562 | -464.4126 | -418.5581 | 0.9340 | 0.7766 |
| 0.5131 | 0.24 | 900 | 0.5350 | -1.2707 | -1.9441 | 0.7225 | 0.6734 | -443.2551 | -395.6827 | 0.7198 | 0.4982 |
| 0.5516 | 0.26 | 1000 | 0.5308 | -1.4721 | -2.2769 | 0.7275 | 0.8048 | -476.5374 | -415.8254 | 1.5922 | 1.2895 |
| 0.595 | 0.29 | 1100 | 0.5234 | -1.6641 | -2.3890 | 0.7275 | 0.7250 | -487.7470 | -435.0183 | 1.5276 | 1.2773 |
| 0.5624 | 0.31 | 1200 | 0.5128 | -1.0539 | -1.8241 | 0.7340 | 0.7702 | -431.2521 | -373.9983 | 1.5739 | 1.2928 |
| 0.5463 | 0.34 | 1300 | 0.5081 | -1.8181 | -2.6160 | 0.7385 | 0.7979 | -510.4464 | -450.4248 | 1.5928 | 1.2965 |
| 0.5488 | 0.37 | 1400 | 0.5137 | -1.3146 | -2.1568 | 0.7310 | 0.8422 | -464.5221 | -400.0710 | 1.7262 | 1.2885 |
| 0.4586 | 0.39 | 1500 | 0.5155 | -3.2016 | -4.3562 | 0.7350 | 1.1546 | -684.4664 | -588.7742 | 3.2844 | 2.7375 |
| 0.5471 | 0.42 | 1600 | 0.5012 | -2.4217 | -3.3641 | 0.7365 | 0.9424 | -585.2510 | -510.7790 | 2.1009 | 1.5372 |
| 0.5099 | 0.44 | 1700 | 0.5288 | -3.3569 | -4.4734 | 0.7235 | 1.1164 | -696.1783 | -604.3042 | 3.4023 | 3.0072 |
| 0.4978 | 0.47 | 1800 | 0.5075 | -2.7705 | -3.8299 | 0.7365 | 1.0594 | -631.8281 | -545.6577 | 2.0002 | 1.4546 |
| 0.4677 | 0.5 | 1900 | 0.4988 | -2.9036 | -3.9903 | 0.7395 | 1.0867 | -647.8719 | -558.9749 | 2.5698 | 2.0061 |
| 0.4925 | 0.52 | 2000 | 0.5035 | -4.3444 | -5.3858 | 0.7460 | 1.0414 | -787.4236 | -703.0505 | 3.6952 | 3.3091 |
| 0.51 | 0.55 | 2100 | 0.4970 | -3.6623 | -4.7383 | 0.7455 | 1.0760 | -722.6686 | -634.8400 | 2.4069 | 1.8447 |
| 0.477 | 0.58 | 2200 | 0.4936 | -3.3814 | -4.3841 | 0.7410 | 1.0026 | -687.2482 | -606.7535 | 2.1259 | 1.5963 |
| 0.4949 | 0.6 | 2300 | 0.4922 | -3.2251 | -4.2792 | 0.7435 | 1.0541 | -676.7632 | -591.1223 | 2.1980 | 1.6616 |
| 0.4703 | 0.63 | 2400 | 0.4927 | -3.4550 | -4.5502 | 0.7430 | 1.0953 | -703.8674 | -614.1109 | 2.5717 | 2.0218 |
| 0.5008 | 0.65 | 2500 | 0.4912 | -3.1973 | -4.2894 | 0.7470 | 1.0922 | -677.7869 | -588.3384 | 2.4184 | 1.8485 |
| 0.4675 | 0.68 | 2600 | 0.4920 | -3.1180 | -4.1936 | 0.7420 | 1.0756 | -668.2031 | -580.4097 | 1.9675 | 1.3556 |
| 0.4925 | 0.71 | 2700 | 0.4923 | -3.5135 | -4.6518 | 0.7435 | 1.1383 | -714.0211 | -619.9608 | 2.4291 | 1.8215 |
| 0.4597 | 0.73 | 2800 | 0.4918 | -3.6496 | -4.8348 | 0.7440 | 1.1852 | -732.3182 | -633.5714 | 2.6423 | 2.0210 |
| 0.4919 | 0.76 | 2900 | 0.4897 | -3.6207 | -4.7515 | 0.7440 | 1.1308 | -723.9899 | -630.6806 | 2.5536 | 1.9562 |
| 0.4635 | 0.79 | 3000 | 0.4893 | -3.5211 | -4.6272 | 0.7440 | 1.1061 | -711.5598 | -620.7185 | 2.4752 | 1.8796 |
| 0.4859 | 0.81 | 3100 | 0.4894 | -3.5189 | -4.6365 | 0.7450 | 1.1176 | -712.4931 | -620.5024 | 2.4653 | 1.8672 |
| 0.4941 | 0.84 | 3200 | 0.4888 | -3.5079 | -4.6251 | 0.7440 | 1.1173 | -711.3568 | -619.3996 | 2.4251 | 1.8243 |
| 0.5292 | 0.86 | 3300 | 0.4889 | -3.4834 | -4.5980 | 0.7465 | 1.1146 | -708.6420 | -616.9550 | 2.4156 | 1.8117 |
| 0.4743 | 0.89 | 3400 | 0.4890 | -3.4967 | -4.6185 | 0.7445 | 1.1218 | -710.6937 | -618.2842 | 2.4440 | 1.8387 |
| 0.5287 | 0.92 | 3500 | 0.4892 | -3.4927 | -4.6154 | 0.7455 | 1.1227 | -710.3807 | -617.8776 | 2.4399 | 1.8339 |
| 0.4628 | 0.94 | 3600 | 0.4891 | -3.4925 | -4.6150 | 0.7460 | 1.1225 | -710.3422 | -617.8592 | 2.4386 | 1.8320 |
| 0.4781 | 0.97 | 3700 | 0.4890 | -3.4941 | -4.6169 | 0.7455 | 1.1229 | -710.5355 | -618.0179 | 2.4391 | 1.8328 |
| 0.5121 | 0.99 | 3800 | 0.4889 | -3.4919 | -4.6148 | 0.7435 | 1.1229 | -710.3250 | -617.8050 | 2.4382 | 1.8324 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2
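
Because this repository contains a PEFT adapter rather than full model weights, inference typically means loading mistralai/Mistral-7B-v0.1 (here in 4-bit, in keeping with QLoRA) and attaching the adapter on top. A minimal sketch, assuming the adapter id jiaqi7/zephyr-7b-dpo-qlora and NF4 quantization settings that are not stated in the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "jiaqi7/zephyr-7b-dpo-qlora"

# 4-bit loading (requires bitsandbytes and a CUDA GPU); the exact
# quantization settings used during training are assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```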