
zephyr-7b-dpo-qlora

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the metrics):

  • Loss: 0.5036
  • Rewards/chosen: -2.0892
  • Rewards/rejected: -3.1197
  • Rewards/accuracies: 0.7295
  • Rewards/margins: 1.0304
  • Logps/rejected: -560.7722
  • Logps/chosen: -477.4810
  • Logits/rejected: 2.3638
  • Logits/chosen: 1.7891
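
The snippet below is a minimal, hedged sketch of how a QLoRA adapter like this one is typically loaded and queried with PEFT and Transformers. It is not part of the original card: the repository id `wirthdrew1/zephyr-7b-dpo-qlora` is assumed, and the generation settings are illustrative.

```python
# Minimal sketch, assuming the adapter is published as wirthdrew1/zephyr-7b-dpo-qlora
# and that peft can resolve its base model (alignment-handbook/zephyr-7b-sft-qlora).
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "wirthdrew1/zephyr-7b-dpo-qlora"  # assumed repo id

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Zephyr is a chat model, so format the prompt with the chat template.
messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```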

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a TRL DPO configuration follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
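
For readers who want to set up a similar run, the sketch below shows how these hyperparameters would typically map onto a TRL DPO configuration with a LoRA adapter. It is not the alignment-handbook recipe that produced this model: the DPO beta, the LoRA settings, and the precision flag are illustrative placeholders, and the exact `DPOTrainer` call differs across trl versions.

```python
# Hedged sketch: mapping the listed hyperparameters onto trl's DPOConfig.
# Assumes a recent trl release where DPOConfig carries the DPO beta.
from peft import LoraConfig
from trl import DPOConfig

peft_config = LoraConfig(            # QLoRA adapter settings (illustrative values)
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=2,   # 4 x 2 = total_train_batch_size of 8
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption; precision is not stated in the card
    beta=0.01,                       # illustrative DPO beta, not taken from this card
)
# These arguments, the SFT base model, peft_config, and the
# HuggingFaceH4/ultrafeedback_binarized splits are then passed to trl's DPOTrainer.
```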

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931 | 0.01 | 100 | 0.6930 | 0.0005 | 0.0002 | 0.5135 | 0.0003 | -248.7855 | -268.5095 | -2.1335 | -2.2163 |
| 0.6926 | 0.03 | 200 | 0.6924 | 0.0023 | 0.0008 | 0.5885 | 0.0014 | -248.7224 | -268.3331 | -2.1330 | -2.2157 |
| 0.6904 | 0.04 | 300 | 0.6901 | 0.0125 | 0.0064 | 0.6475 | 0.0062 | -248.1708 | -267.3080 | -2.1373 | -2.2194 |
| 0.6868 | 0.05 | 400 | 0.6830 | 0.0380 | 0.0168 | 0.6610 | 0.0211 | -247.1243 | -264.7627 | -2.1356 | -2.2179 |
| 0.6781 | 0.07 | 500 | 0.6679 | 0.0202 | -0.0356 | 0.6785 | 0.0558 | -252.3708 | -266.5388 | -2.0748 | -2.1590 |
| 0.6565 | 0.08 | 600 | 0.6403 | -0.1036 | -0.2364 | 0.6805 | 0.1327 | -272.4421 | -278.9226 | -1.9763 | -2.0685 |
| 0.6411 | 0.09 | 700 | 0.6254 | -0.1531 | -0.3350 | 0.6820 | 0.1819 | -282.3092 | -283.8720 | -1.9197 | -2.0181 |
| 0.6177 | 0.1 | 800 | 0.6134 | -0.3846 | -0.6431 | 0.6765 | 0.2585 | -313.1128 | -307.0186 | -1.8202 | -1.9304 |
| 0.6333 | 0.12 | 900 | 0.6082 | -0.4006 | -0.6835 | 0.6740 | 0.2829 | -317.1526 | -308.6199 | -1.8566 | -1.9660 |
| 0.5776 | 0.13 | 1000 | 0.6066 | -0.6650 | -1.0307 | 0.6735 | 0.3657 | -351.8794 | -335.0627 | -1.8956 | -2.0038 |
| 0.6093 | 0.14 | 1100 | 0.6075 | -0.5592 | -0.9272 | 0.6740 | 0.3679 | -341.5230 | -324.4846 | -1.9019 | -2.0022 |
| 0.5607 | 0.16 | 1200 | 0.5970 | -0.8428 | -1.2654 | 0.6800 | 0.4226 | -375.3466 | -352.8372 | -1.8081 | -1.9182 |
| 0.5627 | 0.17 | 1300 | 0.5935 | -1.4339 | -1.8498 | 0.6850 | 0.4160 | -433.7877 | -411.9446 | -1.1519 | -1.3203 |
| 0.5853 | 0.18 | 1400 | 0.5842 | -1.2099 | -1.6843 | 0.6950 | 0.4743 | -417.2325 | -389.5525 | -0.8708 | -1.0520 |
| 0.5622 | 0.2 | 1500 | 0.5712 | -1.5071 | -2.0510 | 0.6990 | 0.5439 | -453.9020 | -419.2693 | -0.4323 | -0.6561 |
| 0.4815 | 0.21 | 1600 | 0.5663 | -1.5246 | -2.1580 | 0.7035 | 0.6333 | -464.6043 | -421.0228 | -0.3415 | -0.5810 |
| 0.4698 | 0.22 | 1700 | 0.5697 | -1.8165 | -2.4986 | 0.6990 | 0.6821 | -498.6652 | -450.2103 | 0.5641 | 0.2594 |
| 0.5213 | 0.24 | 1800 | 0.5670 | -1.4236 | -2.1011 | 0.7055 | 0.6776 | -458.9214 | -410.9152 | 0.6173 | 0.2952 |
| 0.5295 | 0.25 | 1900 | 0.5606 | -1.9797 | -2.6952 | 0.6945 | 0.7155 | -518.3280 | -466.5294 | 0.8941 | 0.5819 |
| 0.6074 | 0.26 | 2000 | 0.5525 | -1.1848 | -1.7881 | 0.7165 | 0.6033 | -427.6170 | -387.0396 | 0.3449 | 0.0271 |
| 0.568 | 0.27 | 2100 | 0.5388 | -1.5667 | -2.2488 | 0.7220 | 0.6822 | -473.6912 | -425.2263 | 1.3497 | 0.9786 |
| 0.5643 | 0.29 | 2200 | 0.5539 | -1.8112 | -2.6184 | 0.7145 | 0.8072 | -510.6461 | -449.6774 | 1.9603 | 1.5565 |
| 0.5226 | 0.3 | 2300 | 0.5354 | -1.6020 | -2.3588 | 0.7245 | 0.7568 | -484.6839 | -428.7553 | 1.3673 | 0.9661 |
| 0.4144 | 0.31 | 2400 | 0.5338 | -2.0110 | -2.8276 | 0.7245 | 0.8167 | -531.5681 | -469.6557 | 1.6609 | 1.2542 |
| 0.5233 | 0.33 | 2500 | 0.5387 | -1.9001 | -2.7290 | 0.7245 | 0.8289 | -521.7109 | -458.5734 | 1.7390 | 1.3093 |
| 0.5425 | 0.34 | 2600 | 0.5430 | -2.4619 | -3.3366 | 0.7225 | 0.8747 | -582.4704 | -514.7514 | 2.4431 | 1.9262 |
| 0.4719 | 0.35 | 2700 | 0.5309 | -1.9122 | -2.7118 | 0.7285 | 0.7996 | -519.9872 | -459.7816 | 2.0586 | 1.6066 |
| 0.5543 | 0.37 | 2800 | 0.5268 | -1.7066 | -2.4623 | 0.7225 | 0.7557 | -495.0332 | -439.2162 | 1.5924 | 1.1721 |
| 0.5409 | 0.38 | 2900 | 0.5400 | -2.1879 | -3.1551 | 0.7175 | 0.9673 | -564.3220 | -487.3477 | 2.0890 | 1.6062 |
| 0.4956 | 0.39 | 3000 | 0.5285 | -1.8388 | -2.7165 | 0.7285 | 0.8777 | -520.4593 | -452.4431 | 1.6464 | 1.1679 |
| 0.4572 | 0.41 | 3100 | 0.5198 | -1.6639 | -2.4269 | 0.7265 | 0.7630 | -491.4958 | -434.9505 | 1.7627 | 1.2994 |
| 0.4962 | 0.42 | 3200 | 0.5181 | -1.6914 | -2.5214 | 0.7265 | 0.8300 | -500.9511 | -437.6994 | 1.6452 | 1.1780 |
| 0.6098 | 0.43 | 3300 | 0.5188 | -1.6044 | -2.4380 | 0.7310 | 0.8336 | -492.6022 | -428.9995 | 1.5141 | 1.0617 |
| 0.5349 | 0.44 | 3400 | 0.5210 | -1.4720 | -2.3090 | 0.7285 | 0.8370 | -479.7061 | -415.7578 | 1.4965 | 1.0371 |
| 0.4773 | 0.46 | 3500 | 0.5206 | -1.4425 | -2.2285 | 0.7280 | 0.7861 | -471.6597 | -412.8062 | 1.8090 | 1.3264 |
| 0.5312 | 0.47 | 3600 | 0.5196 | -1.8128 | -2.6719 | 0.7320 | 0.8591 | -515.9943 | -449.8387 | 2.5339 | 2.0191 |
| 0.5879 | 0.48 | 3700 | 0.5128 | -1.9225 | -2.7975 | 0.7355 | 0.8750 | -528.5556 | -460.8123 | 2.9390 | 2.3934 |
| 0.5202 | 0.5 | 3800 | 0.5155 | -1.8291 | -2.7153 | 0.7330 | 0.8863 | -520.3419 | -451.4667 | 2.2728 | 1.7445 |
| 0.5116 | 0.51 | 3900 | 0.5188 | -2.0732 | -3.0427 | 0.7285 | 0.9696 | -553.0799 | -475.8752 | 2.2721 | 1.7291 |
| 0.5521 | 0.52 | 4000 | 0.5161 | -2.3283 | -3.3054 | 0.7255 | 0.9771 | -579.3469 | -501.3872 | 2.2577 | 1.7449 |
| 0.5107 | 0.54 | 4100 | 0.5197 | -1.8192 | -2.7348 | 0.7215 | 0.9156 | -522.2897 | -450.4803 | 1.7678 | 1.2222 |
| 0.4773 | 0.55 | 4200 | 0.5163 | -2.1894 | -3.1554 | 0.7265 | 0.9660 | -564.3451 | -487.4992 | 1.8497 | 1.3121 |
| 0.4315 | 0.56 | 4300 | 0.5097 | -2.0873 | -3.0416 | 0.7340 | 0.9544 | -552.9705 | -477.2872 | 2.2039 | 1.6783 |
| 0.5176 | 0.58 | 4400 | 0.5097 | -2.2486 | -3.2409 | 0.7290 | 0.9924 | -572.8979 | -493.4146 | 2.0782 | 1.5387 |
| 0.4487 | 0.59 | 4500 | 0.5132 | -2.0257 | -3.0144 | 0.7245 | 0.9887 | -550.2475 | -471.1282 | 2.0676 | 1.4968 |
| 0.478 | 0.6 | 4600 | 0.5082 | -2.0565 | -3.0343 | 0.7270 | 0.9778 | -552.2376 | -474.2084 | 2.1065 | 1.5402 |
| 0.5351 | 0.62 | 4700 | 0.5038 | -1.9625 | -2.8993 | 0.7285 | 0.9368 | -538.7390 | -464.8120 | 2.0488 | 1.5017 |
| 0.4942 | 0.63 | 4800 | 0.5058 | -2.2570 | -3.2479 | 0.7305 | 0.9909 | -573.5954 | -494.2575 | 2.5210 | 1.9471 |
| 0.4918 | 0.64 | 4900 | 0.5129 | -2.4781 | -3.5322 | 0.7350 | 1.0541 | -602.0275 | -516.3653 | 2.8295 | 2.2468 |
| 0.4693 | 0.65 | 5000 | 0.5131 | -2.2974 | -3.3589 | 0.7315 | 1.0615 | -584.6987 | -498.2968 | 2.6931 | 2.1137 |
| 0.5796 | 0.67 | 5100 | 0.5084 | -2.1485 | -3.1709 | 0.7300 | 1.0224 | -565.8975 | -483.4113 | 2.4925 | 1.9365 |
| 0.5137 | 0.68 | 5200 | 0.5012 | -2.0083 | -2.9370 | 0.7365 | 0.9287 | -542.5073 | -469.3903 | 2.0969 | 1.5738 |
| 0.4484 | 0.69 | 5300 | 0.5022 | -2.1149 | -3.0765 | 0.7345 | 0.9616 | -556.4618 | -480.0531 | 2.2539 | 1.7154 |
| 0.4608 | 0.71 | 5400 | 0.5035 | -2.1639 | -3.1586 | 0.7380 | 0.9947 | -564.6663 | -484.9485 | 2.2224 | 1.6704 |
| 0.5746 | 0.72 | 5500 | 0.5045 | -2.3599 | -3.4023 | 0.7320 | 1.0424 | -589.0370 | -504.5520 | 2.2134 | 1.6562 |
| 0.5768 | 0.73 | 5600 | 0.5011 | -2.0662 | -3.0430 | 0.7375 | 0.9767 | -553.1031 | -475.1830 | 1.8199 | 1.2667 |
| 0.4359 | 0.75 | 5700 | 0.5032 | -2.0933 | -3.1100 | 0.7350 | 1.0166 | -559.8049 | -477.8932 | 1.9073 | 1.3503 |
| 0.4812 | 0.76 | 5800 | 0.5056 | -2.2931 | -3.3640 | 0.7320 | 1.0709 | -585.2068 | -497.8671 | 2.1234 | 1.5508 |
| 0.5048 | 0.77 | 5900 | 0.5036 | -1.9424 | -2.9286 | 0.7335 | 0.9862 | -541.6672 | -462.8024 | 1.7970 | 1.2367 |
| 0.4505 | 0.79 | 6000 | 0.5053 | -1.9881 | -2.9896 | 0.7330 | 1.0015 | -547.7703 | -467.3695 | 1.9582 | 1.3843 |
| 0.5197 | 0.8 | 6100 | 0.5071 | -2.0238 | -3.0391 | 0.7315 | 1.0152 | -552.7153 | -470.9445 | 2.0118 | 1.4341 |
| 0.6046 | 0.81 | 6200 | 0.5064 | -2.0803 | -3.1116 | 0.7310 | 1.0313 | -559.9708 | -476.5939 | 2.1151 | 1.5328 |
| 0.4669 | 0.82 | 6300 | 0.5072 | -2.1010 | -3.1541 | 0.7310 | 1.0531 | -564.2192 | -478.6570 | 2.2264 | 1.6394 |
| 0.5631 | 0.84 | 6400 | 0.5055 | -2.0938 | -3.1385 | 0.7305 | 1.0447 | -562.6528 | -477.9385 | 2.3072 | 1.7230 |
| 0.433 | 0.85 | 6500 | 0.5044 | -2.0630 | -3.0936 | 0.7290 | 1.0306 | -558.1638 | -474.8586 | 2.2760 | 1.6963 |
| 0.4908 | 0.86 | 6600 | 0.5043 | -2.0569 | -3.0863 | 0.7295 | 1.0294 | -557.4365 | -474.2540 | 2.3343 | 1.7557 |
| 0.522 | 0.88 | 6700 | 0.5039 | -2.0755 | -3.1060 | 0.7300 | 1.0304 | -559.4037 | -476.1125 | 2.3469 | 1.7706 |
| 0.4953 | 0.89 | 6800 | 0.5039 | -2.0918 | -3.1235 | 0.7290 | 1.0317 | -561.1605 | -477.7388 | 2.3881 | 1.8129 |
| 0.5683 | 0.9 | 6900 | 0.5036 | -2.0899 | -3.1203 | 0.7300 | 1.0304 | -560.8373 | -477.5472 | 2.3649 | 1.7897 |
| 0.5399 | 0.92 | 7000 | 0.5037 | -2.0831 | -3.1119 | 0.7295 | 1.0288 | -560.0004 | -476.8721 | 2.3590 | 1.7832 |
| 0.4628 | 0.93 | 7100 | 0.5035 | -2.0882 | -3.1188 | 0.7300 | 1.0307 | -560.6896 | -477.3761 | 2.3659 | 1.7910 |
| 0.5273 | 0.94 | 7200 | 0.5036 | -2.0897 | -3.1202 | 0.7295 | 1.0305 | -560.8275 | -477.5317 | 2.3594 | 1.7853 |
| 0.4445 | 0.96 | 7300 | 0.5035 | -2.0889 | -3.1197 | 0.7305 | 1.0308 | -560.7729 | -477.4447 | 2.3614 | 1.7871 |
| 0.4839 | 0.97 | 7400 | 0.5035 | -2.0894 | -3.1199 | 0.7310 | 1.0304 | -560.7961 | -477.5042 | 2.3646 | 1.7896 |
| 0.4425 | 0.98 | 7500 | 0.5036 | -2.0892 | -3.1197 | 0.7295 | 1.0304 | -560.7722 | -477.4810 | 2.3638 | 1.7891 |
| 0.5195 | 0.99 | 7600 | 0.5036 | -2.0892 | -3.1197 | 0.7295 | 1.0304 | -560.7722 | -477.4810 | 2.3638 | 1.7891 |
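
As a reading aid (not part of the original card): in TRL's DPO metrics, Rewards/chosen and Rewards/rejected are the policy-versus-reference log-probability differences scaled by the DPO beta, and Rewards/margins is simply their difference. The final evaluation row illustrates this:

```python
# Tiny sanity check of the margin definition, using the values from the final row above.
rewards_chosen = -2.0892
rewards_rejected = -3.1197
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 1.0305, matching Rewards/margins of 1.0304 up to rounding
```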

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.0
