---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5036
- Rewards/chosen: -2.0892
- Rewards/rejected: -3.1197
- Rewards/accuracies: 0.7295
- Rewards/margins: 1.0304
- Logps/rejected: -560.7722
- Logps/chosen: -477.4810
- Logits/rejected: 2.3638
- Logits/chosen: 1.7891

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931 | 0.01 | 100 | 0.6930 | 0.0005 | 0.0002 | 0.5135 | 0.0003 | -248.7855 | -268.5095 | -2.1335 | -2.2163 |
| 0.6926 | 0.03 | 200 | 0.6924 | 0.0023 | 0.0008 | 0.5885 | 0.0014 | -248.7224 | -268.3331 | -2.1330 | -2.2157 |
| 0.6904 | 0.04 | 300 | 0.6901 | 0.0125 | 0.0064 | 0.6475 | 0.0062 | -248.1708 | -267.3080 | -2.1373 | -2.2194 |
| 0.6868 | 0.05 | 400 | 0.6830 | 0.0380 | 0.0168 | 0.6610 | 0.0211 | -247.1243 | -264.7627 | -2.1356 | -2.2179 |
| 0.6781 | 0.07 | 500 | 0.6679 | 0.0202 | -0.0356 | 0.6785 | 0.0558 | -252.3708 | -266.5388 | -2.0748 | -2.1590 |
| 0.6565 | 0.08 | 600 | 0.6403 | -0.1036 | -0.2364 | 0.6805 | 0.1327 | -272.4421 | -278.9226 | -1.9763 | -2.0685 |
| 0.6411 | 0.09 | 700 | 0.6254 | -0.1531 | -0.3350 | 0.6820 | 0.1819 | -282.3092 | -283.8720 | -1.9197 | -2.0181 |
| 0.6177 | 0.1 | 800 | 0.6134 | -0.3846 | -0.6431 | 0.6765 | 0.2585 | -313.1128 | -307.0186 | -1.8202 | -1.9304 |
| 0.6333 | 0.12 | 900 | 0.6082 | -0.4006 | -0.6835 | 0.6740 | 0.2829 | -317.1526 | -308.6199 | -1.8566 | -1.9660 |
| 0.5776 | 0.13 | 1000 | 0.6066 | -0.6650 | -1.0307 | 0.6735 | 0.3657 | -351.8794 | -335.0627 | -1.8956 | -2.0038 |
| 0.6093 | 0.14 | 1100 | 0.6075 | -0.5592 | -0.9272 | 0.6740 | 0.3679 | -341.5230 | -324.4846 | -1.9019 | -2.0022 |
| 0.5607 | 0.16 | 1200 | 0.5970 | -0.8428 | -1.2654 | 0.6800 | 0.4226 | -375.3466 | -352.8372 | -1.8081 | -1.9182 |
| 0.5627 | 0.17 | 1300 | 0.5935 | -1.4339 | -1.8498 | 0.6850 | 0.4160 | -433.7877 | -411.9446 | -1.1519 | -1.3203 |
| 0.5853 | 0.18 | 1400 | 0.5842 | -1.2099 | -1.6843 | 0.6950 | 0.4743 | -417.2325 | -389.5525 | -0.8708 | -1.0520 |
| 0.5622 | 0.2 | 1500 | 0.5712 | -1.5071 | -2.0510 | 0.6990 | 0.5439 | -453.9020 | -419.2693 | -0.4323 | -0.6561 |
| 0.4815 | 0.21 | 1600 | 0.5663 | -1.5246 | -2.1580 | 0.7035 | 0.6333 | -464.6043 | -421.0228 | -0.3415 | -0.5810 |
| 0.4698 | 0.22 | 1700 | 0.5697 | -1.8165 | -2.4986 | 0.6990 | 0.6821 | -498.6652 | -450.2103 | 0.5641 | 0.2594 |
| 0.5213 | 0.24 | 1800 | 0.5670 | -1.4236 | -2.1011 | 0.7055 | 0.6776 | -458.9214 | -410.9152 | 0.6173 | 0.2952 |
| 0.5295 | 0.25 | 1900 | 0.5606 | -1.9797 | -2.6952 | 0.6945 | 0.7155 | -518.3280 | -466.5294 | 0.8941 | 0.5819 |
| 0.6074 | 0.26 | 2000 | 0.5525 | -1.1848 | -1.7881 | 0.7165 | 0.6033 | -427.6170 | -387.0396 | 0.3449 | 0.0271 |
| 0.568 | 0.27 | 2100 | 0.5388 | -1.5667 | -2.2488 | 0.7220 | 0.6822 | -473.6912 | -425.2263 | 1.3497 | 0.9786 |
| 0.5643 | 0.29 | 2200 | 0.5539 | -1.8112 | -2.6184 | 0.7145 | 0.8072 | -510.6461 | -449.6774 | 1.9603 | 1.5565 |
| 0.5226 | 0.3 | 2300 | 0.5354 | -1.6020 | -2.3588 | 0.7245 | 0.7568 | -484.6839 | -428.7553 | 1.3673 | 0.9661 |
| 0.4144 | 0.31 | 2400 | 0.5338 | -2.0110 | -2.8276 | 0.7245 | 0.8167 | -531.5681 | -469.6557 | 1.6609 | 1.2542 |
| 0.5233 | 0.33 | 2500 | 0.5387 | -1.9001 | -2.7290 | 0.7245 | 0.8289 | -521.7109 | -458.5734 | 1.7390 | 1.3093 |
| 0.5425 | 0.34 | 2600 | 0.5430 | -2.4619 | -3.3366 | 0.7225 | 0.8747 | -582.4704 | -514.7514 | 2.4431 | 1.9262 |
| 0.4719 | 0.35 | 2700 | 0.5309 | -1.9122 | -2.7118 | 0.7285 | 0.7996 | -519.9872 | -459.7816 | 2.0586 | 1.6066 |
| 0.5543 | 0.37 | 2800 | 0.5268 | -1.7066 | -2.4623 | 0.7225 | 0.7557 | -495.0332 | -439.2162 | 1.5924 | 1.1721 |
| 0.5409 | 0.38 | 2900 | 0.5400 | -2.1879 | -3.1551 | 0.7175 | 0.9673 | -564.3220 | -487.3477 | 2.0890 | 1.6062 |
| 0.4956 | 0.39 | 3000 | 0.5285 | -1.8388 | -2.7165 | 0.7285 | 0.8777 | -520.4593 | -452.4431 | 1.6464 | 1.1679 |
| 0.4572 | 0.41 | 3100 | 0.5198 | -1.6639 | -2.4269 | 0.7265 | 0.7630 | -491.4958 | -434.9505 | 1.7627 | 1.2994 |
| 0.4962 | 0.42 | 3200 | 0.5181 | -1.6914 | -2.5214 | 0.7265 | 0.8300 | -500.9511 | -437.6994 | 1.6452 | 1.1780 |
| 0.6098 | 0.43 | 3300 | 0.5188 | -1.6044 | -2.4380 | 0.7310 | 0.8336 | -492.6022 | -428.9995 | 1.5141 | 1.0617 |
| 0.5349 | 0.44 | 3400 | 0.5210 | -1.4720 | -2.3090 | 0.7285 | 0.8370 | -479.7061 | -415.7578 | 1.4965 | 1.0371 |
| 0.4773 | 0.46 | 3500 | 0.5206 | -1.4425 | -2.2285 | 0.7280 | 0.7861 | -471.6597 | -412.8062 | 1.8090 | 1.3264 |
| 0.5312 | 0.47 | 3600 | 0.5196 | -1.8128 | -2.6719 | 0.7320 | 0.8591 | -515.9943 | -449.8387 | 2.5339 | 2.0191 |
| 0.5879 | 0.48 | 3700 | 0.5128 | -1.9225 | -2.7975 | 0.7355 | 0.8750 | -528.5556 | -460.8123 | 2.9390 | 2.3934 |
| 0.5202 | 0.5 | 3800 | 0.5155 | -1.8291 | -2.7153 | 0.7330 | 0.8863 | -520.3419 | -451.4667 | 2.2728 | 1.7445 |
| 0.5116 | 0.51 | 3900 | 0.5188 | -2.0732 | -3.0427 | 0.7285 | 0.9696 | -553.0799 | -475.8752 | 2.2721 | 1.7291 |
| 0.5521 | 0.52 | 4000 | 0.5161 | -2.3283 | -3.3054 | 0.7255 | 0.9771 | -579.3469 | -501.3872 | 2.2577 | 1.7449 |
| 0.5107 | 0.54 | 4100 | 0.5197 | -1.8192 | -2.7348 | 0.7215 | 0.9156 | -522.2897 | -450.4803 | 1.7678 | 1.2222 |
| 0.4773 | 0.55 | 4200 | 0.5163 | -2.1894 | -3.1554 | 0.7265 | 0.9660 | -564.3451 | -487.4992 | 1.8497 | 1.3121 |
| 0.4315 | 0.56 | 4300 | 0.5097 | -2.0873 | -3.0416 | 0.7340 | 0.9544 | -552.9705 | -477.2872 | 2.2039 | 1.6783 |
| 0.5176 | 0.58 | 4400 | 0.5097 | -2.2486 | -3.2409 | 0.7290 | 0.9924 | -572.8979 | -493.4146 | 2.0782 | 1.5387 |
| 0.4487 | 0.59 | 4500 | 0.5132 | -2.0257 | -3.0144 | 0.7245 | 0.9887 | -550.2475 | -471.1282 | 2.0676 | 1.4968 |
| 0.478 | 0.6 | 4600 | 0.5082 | -2.0565 | -3.0343 | 0.7270 | 0.9778 | -552.2376 | -474.2084 | 2.1065 | 1.5402 |
| 0.5351 | 0.62 | 4700 | 0.5038 | -1.9625 | -2.8993 | 0.7285 | 0.9368 | -538.7390 | -464.8120 | 2.0488 | 1.5017 |
| 0.4942 | 0.63 | 4800 | 0.5058 | -2.2570 | -3.2479 | 0.7305 | 0.9909 | -573.5954 | -494.2575 | 2.5210 | 1.9471 |
| 0.4918 | 0.64 | 4900 | 0.5129 | -2.4781 | -3.5322 | 0.7350 | 1.0541 | -602.0275 | -516.3653 | 2.8295 | 2.2468 |
| 0.4693 | 0.65 | 5000 | 0.5131 | -2.2974 | -3.3589 | 0.7315 | 1.0615 | -584.6987 | -498.2968 | 2.6931 | 2.1137 |
| 0.5796 | 0.67 | 5100 | 0.5084 | -2.1485 | -3.1709 | 0.7300 | 1.0224 | -565.8975 | -483.4113 | 2.4925 | 1.9365 |
| 0.5137 | 0.68 | 5200 | 0.5012 | -2.0083 | -2.9370 | 0.7365 | 0.9287 | -542.5073 | -469.3903 | 2.0969 | 1.5738 |
| 0.4484 | 0.69 | 5300 | 0.5022 | -2.1149 | -3.0765 | 0.7345 | 0.9616 | -556.4618 | -480.0531 | 2.2539 | 1.7154 |
| 0.4608 | 0.71 | 5400 | 0.5035 | -2.1639 | -3.1586 | 0.7380 | 0.9947 | -564.6663 | -484.9485 | 2.2224 | 1.6704 |
| 0.5746 | 0.72 | 5500 | 0.5045 | -2.3599 | -3.4023 | 0.7320 | 1.0424 | -589.0370 | -504.5520 | 2.2134 | 1.6562 |
| 0.5768 | 0.73 | 5600 | 0.5011 | -2.0662 | -3.0430 | 0.7375 | 0.9767 | -553.1031 | -475.1830 | 1.8199 | 1.2667 |
| 0.4359 | 0.75 | 5700 | 0.5032 | -2.0933 | -3.1100 | 0.7350 | 1.0166 | -559.8049 | -477.8932 | 1.9073 | 1.3503 |
| 0.4812 | 0.76 | 5800 | 0.5056 | -2.2931 | -3.3640 | 0.7320 | 1.0709 | -585.2068 | -497.8671 | 2.1234 | 1.5508 |
| 0.5048 | 0.77 | 5900 | 0.5036 | -1.9424 | -2.9286 | 0.7335 | 0.9862 | -541.6672 | -462.8024 | 1.7970 | 1.2367 |
| 0.4505 | 0.79 | 6000 | 0.5053 | -1.9881 | -2.9896 | 0.7330 | 1.0015 | -547.7703 | -467.3695 | 1.9582 | 1.3843 |
| 0.5197 | 0.8 | 6100 | 0.5071 | -2.0238 | -3.0391 | 0.7315 | 1.0152 | -552.7153 | -470.9445 | 2.0118 | 1.4341 |
| 0.6046 | 0.81 | 6200 | 0.5064 | -2.0803 | -3.1116 | 0.7310 | 1.0313 | -559.9708 | -476.5939 | 2.1151 | 1.5328 |
| 0.4669 | 0.82 | 6300 | 0.5072 | -2.1010 | -3.1541 | 0.7310 | 1.0531 | -564.2192 | -478.6570 | 2.2264 | 1.6394 |
| 0.5631 | 0.84 | 6400 | 0.5055 | -2.0938 | -3.1385 | 0.7305 | 1.0447 | -562.6528 | -477.9385 | 2.3072 | 1.7230 |
| 0.433 | 0.85 | 6500 | 0.5044 | -2.0630 | -3.0936 | 0.7290 | 1.0306 | -558.1638 | -474.8586 | 2.2760 | 1.6963 |
| 0.4908 | 0.86 | 6600 | 0.5043 | -2.0569 | -3.0863 | 0.7295 | 1.0294 | -557.4365 | -474.2540 | 2.3343 | 1.7557 |
| 0.522 | 0.88 | 6700 | 0.5039 | -2.0755 | -3.1060 | 0.7300 | 1.0304 | -559.4037 | -476.1125 | 2.3469 | 1.7706 |
| 0.4953 | 0.89 | 6800 | 0.5039 | -2.0918 | -3.1235 | 0.7290 | 1.0317 | -561.1605 | -477.7388 | 2.3881 | 1.8129 |
| 0.5683 | 0.9 | 6900 | 0.5036 | -2.0899 | -3.1203 | 0.7300 | 1.0304 | -560.8373 | -477.5472 | 2.3649 | 1.7897 |
| 0.5399 | 0.92 | 7000 | 0.5037 | -2.0831 | -3.1119 | 0.7295 | 1.0288 | -560.0004 | -476.8721 | 2.3590 | 1.7832 |
| 0.4628 | 0.93 | 7100 | 0.5035 | -2.0882 | -3.1188 | 0.7300 | 1.0307 | -560.6896 | -477.3761 | 2.3659 | 1.7910 |
| 0.5273 | 0.94 | 7200 | 0.5036 | -2.0897 | -3.1202 | 0.7295 | 1.0305 | -560.8275 | -477.5317 | 2.3594 | 1.7853 |
| 0.4445 | 0.96 | 7300 | 0.5035 | -2.0889 | -3.1197 | 0.7305 | 1.0308 | -560.7729 | -477.4447 | 2.3614 | 1.7871 |
| 0.4839 | 0.97 | 7400 | 0.5035 | -2.0894 | -3.1199 | 0.7310 | 1.0304 | -560.7961 | -477.5042 | 2.3646 | 1.7896 |
| 0.4425 | 0.98 | 7500 | 0.5036 | -2.0892 | -3.1197 | 0.7295 | 1.0304 | -560.7722 | -477.4810 | 2.3638 | 1.7891 |
| 0.5195 | 0.99 | 7600 | 0.5036 | -2.0892 | -3.1197 | 0.7295 | 1.0304 | -560.7722 | -477.4810 | 2.3638 | 1.7891 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- PyTorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.0
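
The evaluation metrics reported at the top of this card can be cross-checked against one another. The sketch below assumes the standard sigmoid DPO loss, `-log(sigmoid(margin))`, which the card does not state explicitly; the helper `dpo_sigmoid_loss` and the constant names are illustrative and not part of any library.

```python
import math

def dpo_sigmoid_loss(margin: float) -> float:
    """-log(sigmoid(margin)), evaluated stably as softplus(-margin)."""
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Final evaluation metrics as reported in this card.
REWARDS_CHOSEN = -2.0892
REWARDS_REJECTED = -3.1197
REPORTED_MARGIN = 1.0304
REPORTED_LOSS = 0.5036

# Rewards/margins is the chosen reward minus the rejected reward.
margin = REWARDS_CHOSEN - REWARDS_REJECTED

# Loss if every evaluation pair had exactly the mean margin (assumed loss form).
loss_at_mean_margin = dpo_sigmoid_loss(margin)

# Loss when the policy does not yet separate chosen from rejected (margin 0).
loss_at_zero_margin = dpo_sigmoid_loss(0.0)

print(f"margin from rewards: {margin:.4f} (reported {REPORTED_MARGIN})")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported {REPORTED_LOSS})")
print(f"loss at zero margin: {loss_at_zero_margin:.4f}")
```

The zero-margin loss is ln 2 ≈ 0.6931, which matches the first training loss in the results table. Because `-log(sigmoid(x))` is convex, the loss averaged over individual pairs can only be greater than or equal to the loss evaluated at the mean margin (approximately 0.305 here), so the reported 0.5036 is consistent with margins that vary considerably from pair to pair.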
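
The card does not list the DPO `beta`, but the logged values allow a rough estimate. The sketch below assumes the logged reward is `beta` times the difference between policy and reference log-probabilities, as in the DPO formulation, and that the reference log-probabilities on the evaluation set stay fixed, so they cancel when two checkpoints are compared. The function `implied_beta` is a hypothetical helper, and the result is an inference from the table, not a documented setting.

```python
# (step 100, step 7600) evaluation values taken from the training results table.
CHOSEN = {"reward": (0.0005, -2.0892), "logp": (-268.5095, -477.4810)}
REJECTED = {"reward": (0.0002, -3.1197), "logp": (-248.7855, -560.7722)}

def implied_beta(series: dict) -> float:
    """Ratio of the change in mean reward to the change in mean log-probability."""
    reward_start, reward_end = series["reward"]
    logp_start, logp_end = series["logp"]
    return (reward_end - reward_start) / (logp_end - logp_start)

beta_chosen = implied_beta(CHOSEN)
beta_rejected = implied_beta(REJECTED)

print(f"implied beta (chosen):   {beta_chosen:.5f}")
print(f"implied beta (rejected): {beta_rejected:.5f}")
```

Both ratios come out very close to 0.01, which suggests that value was used for `beta`, subject to the assumptions above.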
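
The hyperparameters above specify a cosine schedule with a peak learning rate of 5e-06 and a warmup ratio of 0.1. A minimal pure-Python sketch of such a schedule follows, assuming linear warmup from zero followed by a half-cosine decay to zero. The total step count is not stated in the card; `TOTAL_STEPS = 7600` is an assumption based on the last logged step, and `cosine_lr_with_warmup` is an illustrative helper, not a library function.

```python
import math

def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = 5e-6,
                          warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 7600  # assumption: approximated from the last logged step

for step in (0, 380, 760, 4180, 7600):
    lr = cosine_lr_with_warmup(step, TOTAL_STEPS)
    print(f"step {step:>5}: lr = {lr:.3e}")
```

Under these assumptions the learning rate peaks at the end of warmup (around step 760) and has fallen to half the peak by the midpoint of the decay phase.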