---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4888
- Rewards/chosen: -3.3026
- Rewards/rejected: -4.6171
- Rewards/accuracies: 0.7510
- Rewards/margins: 1.3145
- Logps/rejected: -706.2916
- Logps/chosen: -594.8843
- Logits/rejected: 1.7556
- Logits/chosen: 1.0124

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6885 | 0.01 | 100 | 0.6887 | 0.0401 | 0.0310 | 0.6155 | 0.0091 | -241.4763 | -260.6096 | -2.3013 | -2.3864 |
| 0.6826 | 0.03 | 200 | 0.6777 | 0.0538 | 0.0208 | 0.6555 | 0.0329 | -242.4942 | -259.2415 | -2.2939 | -2.3792 |
| 0.6623 | 0.04 | 300 | 0.6578 | -0.0931 | -0.1758 | 0.6735 | 0.0827 | -262.1588 | -273.9337 | -2.2310 | -2.3202 |
| 0.6619 | 0.05 | 400 | 0.6455 | -0.2994 | -0.4240 | 0.6610 | 0.1245 | -286.9754 | -294.5644 | -2.0309 | -2.1441 |
| 0.6257 | 0.07 | 500 | 0.6194 | -0.3522 | -0.5612 | 0.6850 | 0.2089 | -300.6967 | -299.8442 | -2.0400 | -2.1485 |
| 0.6114 | 0.08 | 600 | 0.6004 | -0.6308 | -0.9602 | 0.6755 | 0.3295 | -340.6012 | -327.6964 | -1.5503 | -1.7200 |
| 0.5394 | 0.09 | 700 | 0.6103 | -1.5690 | -1.9843 | 0.6635 | 0.4153 | -443.0096 | -421.5208 | -0.6532 | -0.9309 |
| 0.6171 | 0.1 | 800 | 0.6372 | -1.7546 | -2.0641 | 0.6405 | 0.3095 | -450.9858 | -440.0762 | 0.0235 | -0.3349 |
| 0.5553 | 0.12 | 900 | 0.5687 | -1.3500 | -1.8540 | 0.6930 | 0.5041 | -429.9809 | -399.6168 | 2.6187 | 1.9978 |
| 0.6299 | 0.13 | 1000 | 0.5620 | -1.1629 | -1.7464 | 0.6975 | 0.5835 | -419.2182 | -380.9113 | 3.4192 | 2.7155 |
| 0.5898 | 0.14 | 1100 | 0.5619 | -2.4368 | -3.0963 | 0.7090 | 0.6594 | -554.2042 | -508.3033 | 5.3078 | 4.4134 |
| 0.4782 | 0.16 | 1200 | 0.5594 | -1.5060 | -2.2383 | 0.7090 | 0.7323 | -468.4132 | -415.2229 | 4.0187 | 3.1485 |
| 0.5709 | 0.17 | 1300 | 0.5481 | -1.7316 | -2.3668 | 0.7245 | 0.6352 | -481.2582 | -437.7783 | 4.1315 | 3.2570 |
| 0.5181 | 0.18 | 1400 | 0.5454 | -2.4857 | -3.3898 | 0.7140 | 0.9042 | -583.5640 | -513.1900 | 4.6977 | 3.6944 |
| 0.5495 | 0.2 | 1500 | 0.5428 | -2.5602 | -3.3574 | 0.7205 | 0.7972 | -580.3215 | -520.6432 | 4.1847 | 3.2888 |
| 0.574 | 0.21 | 1600 | 0.5638 | -2.7101 | -3.5446 | 0.7190 | 0.8346 | -599.0428 | -535.6277 | 4.9219 | 3.9304 |
| 0.4901 | 0.22 | 1700 | 0.5284 | -2.4900 | -3.3577 | 0.7335 | 0.8677 | -580.3493 | -513.6201 | 3.8220 | 2.9305 |
| 0.5149 | 0.24 | 1800 | 0.5408 | -1.7507 | -2.4663 | 0.7215 | 0.7156 | -491.2047 | -439.6899 | 2.0262 | 1.2751 |
| 0.6382 | 0.25 | 1900 | 0.5325 | -2.1268 | -2.9548 | 0.7255 | 0.8279 | -540.0542 | -477.3052 | 2.4039 | 1.4990 |
| 0.5178 | 0.26 | 2000 | 0.5276 | -1.4221 | -2.1526 | 0.7305 | 0.7305 | -459.8390 | -406.8324 | 1.5288 | 0.8157 |
| 0.524 | 0.27 | 2100 | 0.5663 | -2.7101 | -3.7077 | 0.7110 | 0.9976 | -615.3445 | -535.6266 | 2.5955 | 1.6625 |
| 0.523 | 0.29 | 2200 | 0.5422 | -2.2871 | -3.3438 | 0.7230 | 1.0567 | -578.9616 | -493.3343 | 3.5955 | 2.5436 |
| 0.5431 | 0.3 | 2300 | 0.5253 | -2.1932 | -3.2183 | 0.7340 | 1.0252 | -566.4124 | -483.9387 | 4.2433 | 3.2004 |
| 0.5147 | 0.31 | 2400 | 0.5132 | -2.8441 | -3.8795 | 0.7315 | 1.0354 | -632.5286 | -549.0342 | 4.6772 | 3.6861 |
| 0.4198 | 0.33 | 2500 | 0.5214 | -2.1756 | -3.1443 | 0.7290 | 0.9687 | -559.0054 | -482.1783 | 2.7950 | 1.8511 |
| 0.5994 | 0.34 | 2600 | 0.5188 | -3.1314 | -4.1849 | 0.7290 | 1.0535 | -663.0683 | -577.7604 | 3.4511 | 2.4450 |
| 0.4812 | 0.35 | 2700 | 0.5139 | -3.0136 | -4.1060 | 0.7455 | 1.0924 | -655.1821 | -565.9851 | 3.7760 | 2.7916 |
| 0.4696 | 0.37 | 2800 | 0.5137 | -2.2305 | -3.2368 | 0.7355 | 1.0063 | -568.2574 | -487.6709 | 2.6757 | 1.8289 |
| 0.5418 | 0.38 | 2900 | 0.5177 | -2.0641 | -3.1462 | 0.7345 | 1.0822 | -559.2020 | -471.0270 | 2.0189 | 1.1899 |
| 0.5068 | 0.39 | 3000 | 0.5096 | -2.4564 | -3.5648 | 0.7400 | 1.1084 | -601.0543 | -510.2569 | 2.8679 | 2.0023 |
| 0.4429 | 0.41 | 3100 | 0.5324 | -2.7544 | -3.8869 | 0.7180 | 1.1325 | -633.2682 | -540.0566 | 1.3309 | 0.6491 |
| 0.5977 | 0.42 | 3200 | 0.4963 | -2.8842 | -3.9825 | 0.7425 | 1.0983 | -642.8285 | -553.0416 | 2.0170 | 1.2328 |
| 0.5281 | 0.43 | 3300 | 0.5074 | -2.4254 | -3.5511 | 0.7325 | 1.1257 | -599.6907 | -507.1647 | 1.1826 | 0.4294 |
| 0.5114 | 0.44 | 3400 | 0.5197 | -2.8424 | -4.0833 | 0.7255 | 1.2409 | -652.9095 | -548.8630 | 2.1493 | 1.2128 |
| 0.4984 | 0.46 | 3500 | 0.5002 | -3.1997 | -4.4222 | 0.7450 | 1.2225 | -686.7951 | -584.5864 | 3.3502 | 2.4203 |
| 0.5723 | 0.47 | 3600 | 0.5010 | -3.0065 | -4.2439 | 0.7410 | 1.2374 | -668.9721 | -565.2749 | 3.1534 | 2.2598 |
| 0.5496 | 0.48 | 3700 | 0.5015 | -3.0581 | -4.3336 | 0.7395 | 1.2755 | -677.9391 | -570.4304 | 3.3120 | 2.4472 |
| 0.5106 | 0.5 | 3800 | 0.5013 | -3.5077 | -4.8209 | 0.7395 | 1.3132 | -726.6729 | -615.3915 | 2.7134 | 1.8547 |
| 0.376 | 0.51 | 3900 | 0.4995 | -3.2636 | -4.5260 | 0.7415 | 1.2624 | -697.1753 | -590.9803 | 2.7739 | 1.9628 |
| 0.4935 | 0.52 | 4000 | 0.4916 | -2.8251 | -3.9628 | 0.7465 | 1.1377 | -640.8605 | -547.1311 | 2.2899 | 1.5516 |
| 0.445 | 0.54 | 4100 | 0.4959 | -3.1300 | -4.4063 | 0.7480 | 1.2763 | -685.2046 | -577.6177 | 2.5949 | 1.8263 |
| 0.443 | 0.55 | 4200 | 0.5039 | -2.6104 | -3.9167 | 0.7345 | 1.3063 | -636.2510 | -525.6652 | 2.5643 | 1.7637 |
| 0.517 | 0.56 | 4300 | 0.5042 | -3.0608 | -4.4485 | 0.7375 | 1.3877 | -689.4330 | -570.7054 | 2.6212 | 1.8545 |
| 0.3693 | 0.58 | 4400 | 0.4969 | -3.2698 | -4.5598 | 0.7470 | 1.2900 | -700.5564 | -591.6002 | 2.5178 | 1.8051 |
| 0.481 | 0.59 | 4500 | 0.4893 | -2.8076 | -3.9614 | 0.7445 | 1.1537 | -640.7148 | -545.3853 | 2.0329 | 1.3648 |
| 0.4696 | 0.6 | 4600 | 0.4945 | -3.3369 | -4.5983 | 0.7465 | 1.2614 | -704.4065 | -598.3125 | 2.6733 | 1.9401 |
| 0.4437 | 0.62 | 4700 | 0.4940 | -2.8130 | -4.0860 | 0.7445 | 1.2730 | -653.1788 | -545.9229 | 2.0547 | 1.2696 |
| 0.4492 | 0.63 | 4800 | 0.4963 | -2.7727 | -4.0657 | 0.7465 | 1.2930 | -651.1524 | -541.8960 | 2.3393 | 1.5355 |
| 0.5163 | 0.64 | 4900 | 0.5017 | -3.3498 | -4.7649 | 0.7465 | 1.4150 | -721.0643 | -599.6019 | 2.0201 | 1.2216 |
| 0.488 | 0.65 | 5000 | 0.4917 | -3.2508 | -4.5623 | 0.7480 | 1.3115 | -700.8107 | -589.7007 | 1.9166 | 1.1418 |
| 0.3606 | 0.67 | 5100 | 0.4905 | -2.9757 | -4.2308 | 0.7460 | 1.2551 | -667.6595 | -562.1877 | 1.5031 | 0.7813 |
| 0.58 | 0.68 | 5200 | 0.4897 | -2.8783 | -4.1021 | 0.75 | 1.2239 | -654.7924 | -552.4492 | 1.2839 | 0.5850 |
| 0.5788 | 0.69 | 5300 | 0.4900 | -3.0607 | -4.2816 | 0.7490 | 1.2209 | -672.7391 | -570.6943 | 1.4059 | 0.7114 |
| 0.4138 | 0.71 | 5400 | 0.4910 | -3.3493 | -4.6193 | 0.7515 | 1.2701 | -706.5120 | -599.5464 | 1.6121 | 0.8970 |
| 0.5737 | 0.72 | 5500 | 0.4898 | -3.1843 | -4.4515 | 0.7480 | 1.2672 | -689.7249 | -583.0511 | 1.4061 | 0.6955 |
| 0.4249 | 0.73 | 5600 | 0.4918 | -3.3448 | -4.6778 | 0.7490 | 1.3330 | -712.3564 | -599.0980 | 1.7110 | 0.9558 |
| 0.5457 | 0.75 | 5700 | 0.4897 | -3.2784 | -4.5741 | 0.75 | 1.2957 | -701.9877 | -592.4562 | 1.7372 | 0.9922 |
| 0.5287 | 0.76 | 5800 | 0.4920 | -3.3167 | -4.6600 | 0.7495 | 1.3433 | -710.5778 | -596.2890 | 1.9802 | 1.2037 |
| 0.5286 | 0.77 | 5900 | 0.4919 | -3.2305 | -4.5655 | 0.7465 | 1.3350 | -701.1276 | -587.6722 | 1.9038 | 1.1361 |
| 0.5147 | 0.79 | 6000 | 0.4910 | -3.3145 | -4.6435 | 0.7505 | 1.3290 | -708.9319 | -596.0760 | 1.9303 | 1.1726 |
| 0.4478 | 0.8 | 6100 | 0.4886 | -3.2069 | -4.5013 | 0.7480 | 1.2944 | -694.7131 | -585.3105 | 1.7621 | 1.0186 |
| 0.5236 | 0.81 | 6200 | 0.4901 | -3.3207 | -4.6497 | 0.7495 | 1.3290 | -709.5499 | -596.6957 | 1.8309 | 1.0794 |
| 0.5079 | 0.82 | 6300 | 0.4890 | -3.3084 | -4.6220 | 0.7495 | 1.3137 | -706.7820 | -595.4583 | 1.7747 | 1.0322 |
| 0.4942 | 0.84 | 6400 | 0.4891 | -3.2621 | -4.5672 | 0.7495 | 1.3051 | -701.3010 | -590.8314 | 1.7716 | 1.0268 |
| 0.4688 | 0.85 | 6500 | 0.4891 | -3.2863 | -4.5956 | 0.7505 | 1.3093 | -704.1410 | -593.2547 | 1.7863 | 1.0402 |
| 0.5062 | 0.86 | 6600 | 0.4889 | -3.2923 | -4.6029 | 0.7485 | 1.3106 | -704.8691 | -593.8478 | 1.7695 | 1.0261 |
| 0.574 | 0.88 | 6700 | 0.4887 | -3.2779 | -4.5886 | 0.7495 | 1.3108 | -703.4429 | -592.4089 | 1.7573 | 1.0140 |
| 0.5737 | 0.89 | 6800 | 0.4887 | -3.2917 | -4.6042 | 0.7510 | 1.3124 | -704.9940 | -593.7938 | 1.7560 | 1.0126 |
| 0.4298 | 0.9 | 6900 | 0.4889 | -3.2985 | -4.6115 | 0.7505 | 1.3131 | -705.7332 | -594.4664 | 1.7563 | 1.0130 |
| 0.55 | 0.92 | 7000 | 0.4889 | -3.2997 | -4.6137 | 0.7505 | 1.3140 | -705.9527 | -594.5901 | 1.7567 | 1.0132 |
| 0.4123 | 0.93 | 7100 | 0.4889 | -3.3026 | -4.6168 | 0.7515 | 1.3142 | -706.2578 | -594.8819 | 1.7586 | 1.0151 |
| 0.5207 | 0.94 | 7200 | 0.4887 | -3.3049 | -4.6192 | 0.75 | 1.3143 | -706.5007 | -595.1128 | 1.7557 | 1.0126 |
| 0.4618 | 0.96 | 7300 | 0.4888 | -3.3019 | -4.6165 | 0.7515 | 1.3145 | -706.2247 | -594.8143 | 1.7552 | 1.0116 |
| 0.4826 | 0.97 | 7400 | 0.4889 | -3.3035 | -4.6177 | 0.7510 | 1.3142 | -706.3512 | -594.9731 | 1.7538 | 1.0108 |
| 0.3856 | 0.98 | 7500 | 0.4887 | -3.3043 | -4.6187 | 0.7515 | 1.3144 | -706.4486 | -595.0473 | 1.7544 | 1.0114 |
| 0.5369 | 0.99 | 7600 | 0.4886 | -3.3028 | -4.6175 | 0.7520 | 1.3147 | -706.3290 | -594.9012 | 1.7559 | 1.0126 |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0
- Datasets 2.17.1
- Tokenizers 0.15.2
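
### Reading the DPO metrics

The `Rewards/*` values in the evaluation results and the training table are the implicit DPO rewards that TRL's `DPOTrainer` logs. The following is a minimal sketch of how they relate, assuming TRL's default sigmoid DPO loss with no label smoothing. Under that assumption each reward is β times the difference between the policy and reference log-probabilities of a completion. This card does not state β, so the value below is arbitrary, the log-probabilities are made-up toy numbers, and `dpo_metrics` is a hypothetical helper, not part of TRL.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta):
    """Hypothetical helper: batch-averaged DPO statistics (sigmoid loss assumed)."""
    n = len(policy_chosen_logps)
    # Implicit reward per completion: beta * (log pi_theta(y|x) - log pi_ref(y|x))
    chosen = [beta * (p - r) for p, r in zip(policy_chosen_logps, ref_chosen_logps)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected_logps, ref_rejected_logps)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    # Sigmoid DPO loss per pair: -log sigmoid(margin) == softplus(-margin)
    losses = [softplus(-m) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margins": sum(margins) / n,
        # Fraction of pairs where the chosen response receives the higher reward
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
    }


# Toy summed log-probabilities for two preference pairs; beta picked arbitrarily
m = dpo_metrics([-10.0, -12.0], [-15.0, -11.0], [-11.0, -12.0], [-14.0, -12.0], beta=0.1)
print(m)
```

Because the margin is averaged linearly, `Rewards/margins` should equal `Rewards/chosen` minus `Rewards/rejected` up to rounding. For the final evaluation above, -3.3026 - (-4.6171) = 1.3145, which matches the reported margin. Under this definition, `Rewards/accuracies` of 0.7510 means the policy gave the preferred response the higher implicit reward on about 75% of evaluation pairs.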
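
### Learning-rate schedule

The training hyperparameters pair `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1` and a peak `learning_rate` of 5e-06. The sketch below assumes the Transformers Trainer's cosine-with-warmup schedule. That schedule ramps linearly over the first `ceil(0.1 * total_steps)` optimizer steps, then decays along a half cosine to zero. The card does not report the total number of optimizer steps (the last logged step in the table is 7600 at epoch 0.99), so `total_steps` is a parameter here. The value 1000 in the usage loop is illustrative only, and `cosine_with_warmup_lr` is a hypothetical name.

```python
import math


def cosine_with_warmup_lr(step: int, total_steps: int,
                          peak_lr: float = 5e-06, warmup_ratio: float = 0.1) -> float:
    """Hypothetical helper: linear warmup followed by half-cosine decay to 0."""
    # Warmup length derived from the ratio, rounded up to a whole step
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from peak_lr down to 0
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Illustrative run length only; not the actual step count of this training job
for s in (0, 50, 100, 550, 1000):
    print(s, cosine_with_warmup_lr(s, total_steps=1000))
```

With these settings the learning rate reaches its 5e-06 peak at the end of warmup. It falls to half the peak midway through the decay phase and reaches zero at the final step.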