zephyr-dpo-qlora-uf-5e-6

This model is a DPO fine-tune of alignment-handbook/zephyr-7b-sft-full, trained as a QLoRA adapter on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4890
  • Rewards/chosen: -2.8977
  • Rewards/rejected: -4.0719
  • Rewards/accuracies: 0.7798
  • Rewards/margins: 1.1742
  • Rewards/margins Max: 3.6864
  • Rewards/margins Min: -0.9274
  • Rewards/margins Std: 1.5325
  • Logps/rejected: -669.3330
  • Logps/chosen: -574.2586
  • Logits/rejected: -1.7368
  • Logits/chosen: -1.7961
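
These reward columns follow the usual DPO bookkeeping. As a point of reference (this is an assumption about the training code, not something stated in this card), a standard DPO trainer logs the rewards as β-scaled log-probability ratios between the policy and the frozen reference model:

```latex
% Assumed DPO reward convention (not recorded explicitly in this card)
\begin{align*}
\text{Rewards/chosen}     &= \beta\left[\log \pi_\theta(y_{\text{chosen}} \mid x) - \log \pi_{\text{ref}}(y_{\text{chosen}} \mid x)\right] \\
\text{Rewards/rejected}   &= \beta\left[\log \pi_\theta(y_{\text{rejected}} \mid x) - \log \pi_{\text{ref}}(y_{\text{rejected}} \mid x)\right] \\
\text{Rewards/margins}    &= \text{Rewards/chosen} - \text{Rewards/rejected} \\
\text{Rewards/accuracies} &= \Pr\left[\text{Rewards/chosen} > \text{Rewards/rejected}\right]
\end{align*}
```

Under the same convention, Logps/chosen and Logps/rejected are the policy's summed log-probabilities of the chosen and rejected completions.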

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
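
For orientation, here is a minimal sketch of how the listed hyperparameters would map onto Hugging Face TrainingArguments. This is an illustration, not the actual training script (which follows the alignment-handbook DPO recipe); in particular, the DPO beta and the QLoRA/LoRA settings are not recorded in this card.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# total_train_batch_size 16 = 4 devices x per_device_train_batch_size 4;
# total_eval_batch_size 32 = 4 devices x per_device_eval_batch_size 8.
training_args = TrainingArguments(
    output_dir="zephyr-dpo-qlora-uf-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,             # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,          # and epsilon=1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```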

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.6893 | 0.03 | 100 | 0.6897 | 0.0026 | -0.0055 | 0.7202 | 0.0082 | 0.0362 | -0.0170 | 0.0176 | -262.6957 | -284.2244 | -2.7822 | -2.8200 |
| 0.6681 | 0.05 | 200 | 0.6689 | 0.0162 | -0.0429 | 0.7222 | 0.0591 | 0.2404 | -0.1128 | 0.1163 | -266.4325 | -282.8687 | -2.7520 | -2.7906 |
| 0.64 | 0.08 | 300 | 0.6293 | -0.3380 | -0.5276 | 0.7044 | 0.1896 | 0.7935 | -0.3661 | 0.3880 | -314.9071 | -318.2889 | -2.7294 | -2.7644 |
| 0.6335 | 0.1 | 400 | 0.6076 | -0.3780 | -0.6803 | 0.7143 | 0.3023 | 1.2436 | -0.5587 | 0.5973 | -330.1778 | -322.2904 | -2.7035 | -2.7413 |
| 0.5664 | 0.13 | 500 | 0.5693 | -1.0517 | -1.6202 | 0.7222 | 0.5685 | 2.1499 | -0.8056 | 0.9738 | -424.1662 | -389.6617 | -2.3570 | -2.3930 |
| 0.5428 | 0.16 | 600 | 0.5504 | -1.1351 | -1.8251 | 0.7460 | 0.6900 | 2.5221 | -0.8419 | 1.1085 | -444.6526 | -397.9947 | -2.3087 | -2.3340 |
| 0.5696 | 0.18 | 700 | 0.5407 | -1.6072 | -2.2945 | 0.7302 | 0.6873 | 2.3968 | -0.8008 | 1.0591 | -491.5914 | -445.2077 | -2.0233 | -2.0544 |
| 0.4864 | 0.21 | 800 | 0.5377 | -1.4823 | -2.3816 | 0.7381 | 0.8993 | 2.9869 | -0.9704 | 1.3291 | -500.2979 | -432.7151 | -2.1126 | -2.1435 |
| 0.542 | 0.24 | 900 | 0.5399 | -1.9887 | -2.8948 | 0.7302 | 0.9061 | 3.1667 | -0.9490 | 1.3690 | -551.6262 | -483.3614 | -2.1744 | -2.2024 |
| 0.5518 | 0.26 | 1000 | 0.5300 | -1.9427 | -2.8559 | 0.7540 | 0.9131 | 3.1137 | -0.9029 | 1.3265 | -547.7310 | -478.7619 | -2.1380 | -2.1708 |
| 0.5538 | 0.29 | 1100 | 0.5361 | -1.1129 | -1.9809 | 0.7520 | 0.8681 | 3.0506 | -0.8555 | 1.2919 | -460.2347 | -395.7733 | -2.1859 | -2.2234 |
| 0.5482 | 0.31 | 1200 | 0.5345 | -1.2650 | -2.1623 | 0.7798 | 0.8973 | 3.0598 | -0.8739 | 1.2932 | -478.3762 | -410.9884 | -2.0283 | -2.0696 |
| 0.5325 | 0.34 | 1300 | 0.5237 | -1.3489 | -2.2549 | 0.7540 | 0.9060 | 2.9285 | -0.9000 | 1.2688 | -487.6328 | -419.3813 | -2.0319 | -2.0646 |
| 0.5647 | 0.37 | 1400 | 0.5171 | -1.8056 | -2.7729 | 0.7738 | 0.9673 | 3.0310 | -0.9191 | 1.3055 | -539.4321 | -465.0507 | -2.0499 | -2.0808 |
| 0.5458 | 0.39 | 1500 | 0.5139 | -1.4005 | -2.3080 | 0.7659 | 0.9074 | 2.8815 | -0.9358 | 1.2687 | -492.9399 | -424.5414 | -2.1490 | -2.1788 |
| 0.4935 | 0.42 | 1600 | 0.5159 | -1.4135 | -2.4191 | 0.7619 | 1.0056 | 3.1947 | -0.8547 | 1.3594 | -504.0516 | -425.8337 | -2.0721 | -2.1058 |
| 0.4832 | 0.44 | 1700 | 0.5182 | -1.5594 | -2.6076 | 0.7579 | 1.0482 | 3.3861 | -0.8998 | 1.4429 | -522.9042 | -440.4306 | -2.1434 | -2.1797 |
| 0.5158 | 0.47 | 1800 | 0.5181 | -1.7427 | -2.8825 | 0.7639 | 1.1398 | 3.5508 | -0.9741 | 1.5177 | -550.3890 | -458.7530 | -1.9600 | -2.0015 |
| 0.451 | 0.5 | 1900 | 0.5090 | -1.5156 | -2.5725 | 0.7579 | 1.0569 | 3.3790 | -0.8482 | 1.4174 | -519.3948 | -436.0498 | -1.8888 | -1.9342 |
| 0.4879 | 0.52 | 2000 | 0.5003 | -1.8435 | -2.8625 | 0.7718 | 1.0190 | 3.2173 | -0.9040 | 1.3683 | -548.3914 | -468.8387 | -1.8468 | -1.8969 |
| 0.4879 | 0.55 | 2100 | 0.5044 | -1.6709 | -2.7719 | 0.7579 | 1.1010 | 3.5672 | -0.8763 | 1.4852 | -539.3310 | -451.5732 | -1.9027 | -1.9476 |
| 0.4949 | 0.58 | 2200 | 0.4964 | -3.2082 | -4.4391 | 0.7778 | 1.2309 | 3.8910 | -1.0365 | 1.6390 | -706.0513 | -605.3098 | -1.7221 | -1.7794 |
| 0.5796 | 0.6 | 2300 | 0.4990 | -2.6972 | -3.7097 | 0.7897 | 1.0125 | 3.2200 | -0.8781 | 1.3552 | -633.1115 | -554.2051 | -1.7896 | -1.8422 |
| 0.5492 | 0.63 | 2400 | 0.4969 | -3.4670 | -4.5017 | 0.7778 | 1.0347 | 3.3130 | -0.9050 | 1.3962 | -712.3122 | -631.1838 | -1.6170 | -1.6768 |
| 0.4667 | 0.65 | 2500 | 0.5004 | -3.5869 | -4.8937 | 0.7817 | 1.3068 | 4.1402 | -1.0666 | 1.7418 | -751.5126 | -643.1785 | -1.5865 | -1.6490 |
| 0.5777 | 0.68 | 2600 | 0.4974 | -2.4014 | -3.5339 | 0.7619 | 1.1325 | 3.5063 | -0.9035 | 1.4860 | -615.5330 | -524.6262 | -1.7399 | -1.7949 |
| 0.5021 | 0.71 | 2700 | 0.4927 | -2.6594 | -3.8176 | 0.7798 | 1.1583 | 3.6119 | -0.9273 | 1.5118 | -643.9045 | -550.4240 | -1.7427 | -1.7988 |
| 0.5332 | 0.73 | 2800 | 0.4905 | -3.2417 | -4.4343 | 0.7817 | 1.1926 | 3.7159 | -0.9639 | 1.5556 | -705.5735 | -608.6549 | -1.6555 | -1.7144 |
| 0.5514 | 0.76 | 2900 | 0.4934 | -3.7499 | -5.0405 | 0.7798 | 1.2906 | 3.9723 | -1.0907 | 1.6887 | -766.1927 | -659.4749 | -1.6687 | -1.7302 |
| 0.4162 | 0.79 | 3000 | 0.4917 | -3.2815 | -4.4510 | 0.7698 | 1.1694 | 3.6486 | -0.9447 | 1.5323 | -707.2395 | -612.6413 | -1.6605 | -1.7208 |
| 0.5252 | 0.81 | 3100 | 0.4897 | -3.1223 | -4.3214 | 0.7857 | 1.1991 | 3.7431 | -0.9577 | 1.5632 | -694.2787 | -596.7130 | -1.6937 | -1.7536 |
| 0.4626 | 0.84 | 3200 | 0.4892 | -3.0544 | -4.1957 | 0.7798 | 1.1413 | 3.5819 | -0.9046 | 1.4895 | -681.7123 | -589.9283 | -1.7159 | -1.7744 |
| 0.5186 | 0.86 | 3300 | 0.4896 | -2.9688 | -4.1127 | 0.7738 | 1.1440 | 3.5867 | -0.9061 | 1.4963 | -673.4175 | -581.3629 | -1.7207 | -1.7796 |
| 0.4699 | 0.89 | 3400 | 0.4892 | -2.8679 | -4.0085 | 0.7758 | 1.1406 | 3.5840 | -0.8920 | 1.4895 | -662.9918 | -571.2766 | -1.7332 | -1.7916 |
| 0.4332 | 0.92 | 3500 | 0.4890 | -2.8539 | -4.0222 | 0.7817 | 1.1684 | 3.6683 | -0.9166 | 1.5238 | -664.3640 | -569.8725 | -1.7403 | -1.7991 |
| 0.5292 | 0.94 | 3600 | 0.4888 | -2.9244 | -4.1012 | 0.7758 | 1.1768 | 3.6946 | -0.9285 | 1.5356 | -672.2607 | -576.9283 | -1.7327 | -1.7920 |
| 0.5462 | 0.97 | 3700 | 0.4889 | -2.8929 | -4.0659 | 0.7758 | 1.1730 | 3.6816 | -0.9250 | 1.5309 | -668.7320 | -573.7759 | -1.7393 | -1.7981 |
| 0.4859 | 0.99 | 3800 | 0.4889 | -2.8993 | -4.0739 | 0.7778 | 1.1746 | 3.6856 | -0.9285 | 1.5334 | -669.5308 | -574.4193 | -1.7408 | -1.7997 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
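
Since this repository contains a QLoRA (PEFT) adapter rather than full model weights, a minimal loading sketch compatible with the framework versions above might look as follows. The 4-bit quantization settings are an assumption typical of QLoRA inference, not something recorded in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpo-qlora-uf-5e-6"

# Assumption: load the base SFT model in 4-bit, as in a typical QLoRA setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the DPO-trained LoRA adapter on top of the SFT base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```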