
zephyr-7b-dpo-qlora

This model is a QLoRA adapter fine-tuned with DPO (direct preference optimization) from dball/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5058
  • Rewards/chosen: -2.0144
  • Rewards/rejected: -3.0238
  • Rewards/accuracies: 0.7350
  • Rewards/margins: 1.0093
  • Logps/rejected: -550.9584
  • Logps/chosen: -469.9345
  • Logits/rejected: 1.9679
  • Logits/chosen: 1.2121
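
These diagnostics follow the standard DPO convention, as implemented e.g. in trl's DPOTrainer: the implicit reward for a response is beta times the gap between the policy and reference-model log-probabilities of that response. The sketch below shows how the reported reward metrics relate to each other; the beta value and the log-probability tensors are placeholders, since neither is recorded in this card.

```python
# Sketch of the DPO reward diagnostics reported above (mirrors trl's DPOTrainer).
# policy_* / ref_* are per-example summed log-probs of each response under the
# policy and the frozen reference model; beta is the DPO temperature
# (its value is not recorded in this card).
import torch

def dpo_reward_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    rewards_chosen = beta * (policy_chosen - ref_chosen)        # Rewards/chosen
    rewards_rejected = beta * (policy_rejected - ref_rejected)  # Rewards/rejected
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/margins": (rewards_chosen - rewards_rejected).mean().item(),
        # fraction of pairs where the chosen response out-scores the rejected one
        "rewards/accuracies": (rewards_chosen > rewards_rejected).float().mean().item(),
    }

# toy usage with made-up log-probs for two preference pairs
pc = torch.tensor([-310.0, -295.0]); pr = torch.tensor([-340.0, -320.0])
rc = torch.tensor([-300.0, -290.0]); rr = torch.tensor([-315.0, -305.0])
print(dpo_reward_metrics(pc, pr, rc, rr, beta=0.1))
```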

Model description

More information needed

Intended uses & limitations

More information needed
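
Pending fuller documentation, the adapter can be loaded for inference with PEFT. The following is a minimal sketch, not an official recipe: it assumes the adapter repo ships a tokenizer and that its adapter_config.json points at the correct base checkpoint; the dtype and device settings are illustrative.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "dball/zephyr-7b-dpo-qlora"

# Loads the base model named in adapter_config.json, then applies this adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain QLoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```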

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
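
As a rough guide, these settings map onto a transformers TrainingArguments plus trl DPOTrainer setup as sketched below. This is an assumption-laden reconstruction, not the actual training script: the TRL version, DPO beta, precision settings, and dataset preprocessing are not recorded in this card.

```python
import torch
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Policy starts from the SFT QLoRA adapter this run was initialized with.
model = AutoPeftModelForCausalLM.from_pretrained(
    "dball/zephyr-7b-sft-qlora", torch_dtype=torch.bfloat16, is_trainable=True
)
tokenizer = AutoTokenizer.from_pretrained("dball/zephyr-7b-sft-qlora")
tokenizer.pad_token = tokenizer.eos_token  # assumption: no pad token shipped

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

def to_dpo_format(example):
    # assumption: use the final assistant turn as the response text;
    # the real pipeline likely applies a chat template first
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

train_dataset = raw["train_prefs"].map(
    to_dpo_format, remove_columns=raw["train_prefs"].column_names
)
eval_dataset = raw["test_prefs"].map(
    to_dpo_format, remove_columns=raw["test_prefs"].column_names
)

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 1 per device x 8 steps = total batch 8
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumption: bfloat16 mixed precision, typical for QLoRA
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT model, TRL uses the adapter-disabled base as reference
    args=training_args,
    beta=0.1,        # assumption: a common DPO beta; the card does not record it
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```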

Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6934 | 0.01 | 100 | -2.5261 | -2.4383 | -268.4692 | -248.5731 | 0.6931 | 0.5105 | 0.0002 | 0.0001 | 0.0001 |
| 0.6924 | 0.03 | 200 | -2.5247 | -2.4368 | -268.3451 | -248.5511 | 0.6926 | 0.5605 | 0.0014 | 0.0011 | 0.0003 |
| 0.691 | 0.04 | 300 | -2.5253 | -2.4378 | -267.5839 | -248.1753 | 0.6907 | 0.6440 | 0.0091 | 0.0050 | 0.0041 |
| 0.6876 | 0.05 | 400 | -2.5230 | -2.4351 | -264.4353 | -246.3089 | 0.6845 | 0.6580 | 0.0405 | 0.0178 | 0.0227 |
| 0.6799 | 0.07 | 500 | -2.4660 | -2.3755 | -264.9495 | -249.9276 | 0.6707 | 0.6815 | 0.0354 | 0.0489 | -0.0135 |
| 0.6577 | 0.08 | 600 | -2.3601 | -2.2541 | -280.7885 | -272.3604 | 0.6462 | 0.6750 | -0.1230 | 0.1148 | -0.2378 |
| 0.6365 | 0.09 | 700 | -2.3136 | -2.2013 | -277.0453 | -272.2037 | 0.6345 | 0.6860 | -0.0856 | 0.1507 | -0.2362 |
| 0.6519 | 0.1 | 800 | -2.1835 | -2.0482 | -317.9223 | -320.8872 | 0.6240 | 0.6630 | -0.4943 | 0.2287 | -0.7231 |
| 0.6547 | 0.12 | 900 | -2.2184 | -2.0783 | -325.8177 | -331.4542 | 0.6203 | 0.6695 | -0.5733 | 0.2555 | -0.8287 |
| 0.5841 | 0.13 | 1000 | -2.2086 | -2.0689 | -322.0998 | -334.5816 | 0.6071 | 0.6820 | -0.5361 | 0.3239 | -0.8600 |
| 0.5877 | 0.14 | 1100 | -1.3836 | -1.1053 | -383.4380 | -410.8678 | 0.5947 | 0.6855 | -1.1495 | 0.4734 | -1.6229 |
| 0.5552 | 0.16 | 1200 | -0.7372 | -0.3614 | -411.0459 | -437.9200 | 0.5909 | 0.6880 | -1.4256 | 0.4678 | -1.8934 |
| 0.5492 | 0.17 | 1300 | -0.5949 | -0.1933 | -414.6323 | -446.2910 | 0.5791 | 0.6935 | -1.4614 | 0.5157 | -1.9771 |
| 0.5789 | 0.18 | 1400 | -0.5846 | -0.1908 | -356.4832 | -384.9109 | 0.5771 | 0.7035 | -0.8799 | 0.4834 | -1.3633 |
| 0.5456 | 0.2 | 1500 | -0.1574 | 0.3098 | -386.9436 | -427.7158 | 0.5646 | 0.7035 | -1.1845 | 0.6068 | -1.7913 |
| 0.4722 | 0.21 | 1600 | 0.0346 | 0.5395 | -400.9113 | -442.8174 | 0.5598 | 0.7075 | -1.3242 | 0.6181 | -1.9424 |
| 0.5072 | 0.22 | 1700 | 0.4657 | 1.0411 | -418.8860 | -465.2537 | 0.5574 | 0.7060 | -1.5040 | 0.6628 | -2.1667 |
| 0.5284 | 0.24 | 1800 | 0.6528 | 1.2404 | -423.3542 | -469.1293 | 0.5534 | 0.7070 | -1.5486 | 0.6568 | -2.2055 |
| 0.5623 | 0.25 | 1900 | 0.3058 | 0.7808 | -439.5539 | -491.0526 | 0.5625 | 0.7055 | -1.7106 | 0.7141 | -2.4247 |
| 0.6092 | 0.26 | 2000 | 0.0079 | 0.5199 | -370.0728 | -413.7089 | 0.5501 | 0.7085 | -1.0158 | 0.6354 | -1.6513 |
| 0.5726 | 0.27 | 2100 | 0.4405 | 0.9981 | -415.4569 | -464.3842 | 0.5433 | 0.7150 | -1.4697 | 0.6884 | -2.1580 |
| 0.5323 | 0.29 | 2200 | 0.7445 | 1.3533 | -400.2244 | -457.4451 | 0.5483 | 0.7150 | -1.3173 | 0.7713 | -2.0886 |
| 0.5148 | 0.3 | 2300 | 0.5107 | 1.1454 | -400.4308 | -450.4646 | 0.5387 | 0.7275 | -1.3194 | 0.6994 | -2.0188 |
| 0.4112 | 0.31 | 2400 | 0.6648 | 1.2866 | -430.5040 | -490.7723 | 0.5401 | 0.7200 | -1.6201 | 0.8018 | -2.4219 |
| 0.5246 | 0.33 | 2500 | 1.0914 | 1.7388 | -481.2729 | -538.2222 | 0.5413 | 0.7220 | -2.1278 | 0.7686 | -2.8964 |
| 0.5657 | 0.34 | 2600 | 0.9886 | 1.6571 | -437.1172 | -495.0003 | 0.5373 | 0.7200 | -1.6863 | 0.7779 | -2.4642 |
| 0.5216 | 0.35 | 2700 | 1.1290 | 1.7936 | -467.4365 | -522.5278 | 0.5357 | 0.7260 | -1.9895 | 0.7500 | -2.7395 |
| 0.5865 | 0.37 | 2800 | 1.1019 | 1.7565 | -478.5605 | -529.6149 | 0.5351 | 0.7260 | -2.1007 | 0.7096 | -2.8103 |
| 0.5252 | 0.38 | 2900 | 0.9108 | 1.5686 | -426.6496 | -492.7397 | 0.5376 | 0.7205 | -1.5816 | 0.8600 | -2.4416 |
| 0.5381 | 0.39 | 3000 | 1.0233 | 1.7206 | -422.6485 | -485.7741 | 0.5306 | 0.7230 | -1.5416 | 0.8303 | -2.3719 |
| 0.4587 | 0.41 | 3100 | 1.1221 | 1.8445 | -413.6005 | -467.0778 | 0.5222 | 0.7260 | -1.4511 | 0.7339 | -2.1850 |
| 0.5173 | 0.42 | 3200 | 0.8981 | 1.6186 | -403.9989 | -462.4095 | 0.5277 | 0.7260 | -1.3551 | 0.7832 | -2.1383 |
| 0.5851 | 0.43 | 3300 | 1.2860 | 2.0344 | -437.1258 | -498.6931 | 0.5181 | 0.7325 | -1.6864 | 0.8148 | -2.5011 |
| 0.5811 | 0.44 | 3400 | 1.0162 | 1.7238 | -428.5590 | -492.4408 | 0.5166 | 0.7335 | -1.6007 | 0.8379 | -2.4386 |
| 0.4892 | 0.46 | 3500 | 1.3014 | 2.0709 | -415.6104 | -480.9519 | 0.5257 | 0.7280 | -1.4712 | 0.8525 | -2.3237 |
| 0.5438 | 0.47 | 3600 | 1.4150 | 2.2020 | -428.1592 | -493.0664 | 0.5252 | 0.7275 | -1.5967 | 0.8482 | -2.4449 |
| 0.5677 | 0.48 | 3700 | 1.6843 | 2.4678 | -465.7504 | -529.8630 | 0.5152 | 0.7275 | -1.9726 | 0.8402 | -2.8128 |
| 0.5471 | 0.5 | 3800 | 1.4352 | 2.2022 | -475.7978 | -551.5833 | 0.5240 | 0.7255 | -2.0731 | 0.9569 | -3.0300 |
| 0.5193 | 0.51 | 3900 | 1.3990 | 2.1469 | -485.6194 | -559.7596 | 0.5185 | 0.7340 | -2.1713 | 0.9405 | -3.1118 |
| 0.5764 | 0.52 | 4000 | 1.1192 | 1.8653 | -469.0576 | -545.9298 | 0.5177 | 0.7310 | -2.0057 | 0.9678 | -2.9735 |
| 0.504 | 0.54 | 4100 | 1.0344 | 1.7948 | -450.8565 | -523.1135 | 0.5180 | 0.7270 | -1.8237 | 0.9217 | -2.7453 |
| 0.4846 | 0.55 | 4200 | 1.3329 | 2.1064 | -480.6317 | -553.0635 | 0.5168 | 0.7260 | -2.1214 | 0.9234 | -3.0448 |
| 0.426 | 0.56 | 4300 | 1.2900 | 2.0377 | -469.9074 | -543.4855 | 0.5096 | 0.7325 | -2.0142 | 0.9349 | -2.9490 |
| 0.5289 | 0.58 | 4400 | 1.0286 | 1.7669 | -464.7332 | -542.2659 | 0.5143 | 0.7260 | -1.9624 | 0.9744 | -2.9368 |
| 0.4542 | 0.59 | 4500 | 1.1395 | 1.8775 | -464.9223 | -541.3861 | 0.5102 | 0.7335 | -1.9643 | 0.9637 | -2.9280 |
| 0.4839 | 0.6 | 4600 | 1.1472 | 1.8858 | -468.8564 | -546.4150 | 0.5094 | 0.7305 | -2.0037 | 0.9747 | -2.9783 |
| 0.5562 | 0.62 | 4700 | 1.1999 | 1.9384 | -471.0873 | -546.7677 | 0.5076 | 0.7340 | -2.0260 | 0.9559 | -2.9819 |
| 0.4964 | 0.63 | 4800 | 1.3968 | 2.1538 | -485.7305 | -561.4290 | 0.5078 | 0.7335 | -2.1724 | 0.9561 | -3.1285 |
| 0.4879 | 0.64 | 4900 | 1.3802 | 2.1324 | -489.5623 | -571.5599 | 0.5125 | 0.7310 | -2.2107 | 1.0191 | -3.2298 |
| 0.4916 | 0.65 | 5000 | 1.3780 | 2.1161 | -478.1451 | -558.6430 | 0.5087 | 0.7300 | -2.0966 | 1.0041 | -3.1006 |
| 0.5806 | 0.67 | 5100 | 1.3595 | 2.0897 | -491.2838 | -572.3604 | 0.5089 | 0.7305 | -2.2279 | 1.0099 | -3.2378 |
| 0.5027 | 0.68 | 5200 | 1.0714 | 1.8014 | -458.1095 | -531.8434 | 0.5038 | 0.7375 | -1.8962 | 0.9364 | -2.8326 |
| 0.4554 | 0.69 | 5300 | 1.1555 | 1.8905 | -463.9870 | -540.6600 | 0.5052 | 0.7330 | -1.9550 | 0.9658 | -2.9208 |
| 0.4521 | 0.71 | 5400 | 1.1076 | 1.8437 | -467.6124 | -543.2982 | 0.5039 | 0.7370 | -1.9912 | 0.9559 | -2.9472 |
| 0.5869 | 0.72 | 5500 | 1.1574 | 1.8865 | -485.5281 | -564.9521 | 0.5054 | 0.7360 | -2.1704 | 0.9933 | -3.1637 |
| 0.5924 | 0.73 | 5600 | 0.8215 | 1.5325 | -450.2935 | -527.0139 | 0.5064 | 0.7320 | -1.8180 | 0.9663 | -2.7843 |
| 0.4275 | 0.75 | 5700 | 0.9960 | 1.7229 | -469.1932 | -549.8819 | 0.5055 | 0.7340 | -2.0070 | 1.0060 | -3.0130 |
| 0.4746 | 0.76 | 5800 | 1.1168 | 1.8507 | -489.1825 | -573.2806 | 0.5072 | 0.7300 | -2.2069 | 1.0401 | -3.2470 |
| 0.5033 | 0.77 | 5900 | 0.9675 | 1.7071 | -458.1062 | -536.0162 | 0.5061 | 0.7275 | -1.8962 | 0.9782 | -2.8744 |
| 0.4517 | 0.79 | 6000 | 0.8156 | 1.5613 | -441.7279 | -516.7132 | 0.5105 | 0.7265 | -1.7324 | 0.9489 | -2.6813 |
| 0.5071 | 0.8 | 6100 | 0.9370 | 1.6895 | -454.8272 | -534.7506 | 0.5116 | 0.7275 | -1.8634 | 0.9983 | -2.8617 |
| 0.6455 | 0.81 | 6200 | 0.9542 | 1.7120 | -456.4508 | -536.0126 | 0.5110 | 0.7250 | -1.8796 | 0.9947 | -2.8743 |
| 0.4796 | 0.82 | 6300 | 1.0203 | 1.7784 | -460.9879 | -543.0519 | 0.5112 | 0.7260 | -1.9250 | 1.0197 | -2.9447 |
| 0.5568 | 0.84 | 6400 | 1.1152 | 1.8764 | -463.8810 | -545.5328 | 0.5086 | 0.7275 | -1.9539 | 1.0156 | -2.9695 |
| 0.4335 | 0.85 | 6500 | 1.1822 | 1.9425 | -468.9681 | -550.4982 | 0.5067 | 0.7295 | -2.0048 | 1.0144 | -3.0192 |
| 0.5263 | 0.86 | 6600 | 1.1806 | 1.9390 | -465.3099 | -546.2759 | 0.5066 | 0.7310 | -1.9682 | 1.0087 | -2.9769 |
| 0.5263 | 0.88 | 6700 | 1.1794 | 1.9366 | -465.6784 | -546.6119 | 0.5066 | 0.7320 | -1.9719 | 1.0084 | -2.9803 |
| 0.4939 | 0.89 | 6800 | 1.2238 | 1.9795 | -470.5374 | -551.8629 | 0.5063 | 0.7325 | -2.0205 | 1.0123 | -3.0328 |
| 0.5763 | 0.9 | 6900 | 1.2027 | 1.9579 | -469.4713 | -550.4863 | 0.5060 | 0.7330 | -2.0098 | 1.0092 | -3.0191 |
| 0.5062 | 0.92 | 7000 | 1.2018 | 1.9574 | -468.7946 | -549.6514 | 0.5059 | 0.7320 | -2.0030 | 1.0077 | -3.0107 |
| 0.4432 | 0.93 | 7100 | 1.2115 | 1.9675 | -469.8141 | -550.7594 | 0.5059 | 0.7330 | -2.0132 | 1.0085 | -3.0218 |
| 0.5294 | 0.94 | 7200 | 1.2123 | 1.9679 | -469.9014 | -550.8820 | 0.5059 | 0.7315 | -2.0141 | 1.0089 | -3.0230 |
| 0.4488 | 0.96 | 7300 | 1.2130 | 1.9688 | -469.9289 | -550.9682 | 0.5058 | 0.7320 | -2.0144 | 1.0095 | -3.0239 |
| 0.4747 | 0.97 | 7400 | 1.2122 | 1.9679 | -469.9052 | -550.9178 | 0.5057 | 0.7325 | -2.0142 | 1.0092 | -3.0234 |
| 0.4494 | 0.98 | 7500 | 1.2121 | 1.9679 | -469.9345 | -550.9584 | 0.5058 | 0.7350 | -2.0144 | 1.0093 | -3.0238 |
| 0.5319 | 0.99 | 7600 | 1.2121 | 1.9679 | -469.9345 | -550.9584 | 0.5058 | 0.7350 | -2.0144 | 1.0093 | -3.0238 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.0