
zephyr-7b-dpo-full-beta-0.083

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full; the fine-tuning dataset is not recorded in the card metadata. It achieves the following results on the evaluation set:

  • Loss: 0.6981
  • Rewards/chosen: -5.0359
  • Rewards/rejected: -8.8405
  • Rewards/accuracies: 0.7930
  • Rewards/margins: 3.8046
  • Logps/rejected: -345.7131
  • Logps/chosen: -343.6803
  • Logits/rejected: -2.5377
  • Logits/chosen: -2.6128
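
This card does not define the reward columns, but the DPO naming (beta-0.083) and the TRL-style metric names suggest the standard convention: the implicit reward of a completion is the temperature-scaled log-probability ratio between the policy and the frozen SFT reference, and the margin is the gap between chosen and rejected completions. A hedged sketch of that interpretation, assuming the beta in the model name (0.083) is the DPO temperature:

```latex
% Implicit DPO reward for a completion y given prompt x
% (beta assumed to be 0.083, per the model name)
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% Hedged interpretation of the reported columns:
%   rewards/chosen     = mean r_\theta(x, y_w) over evaluation pairs
%   rewards/rejected   = mean r_\theta(x, y_l)
%   rewards/margins    = rewards/chosen - rewards/rejected
%                        (here: -5.0359 - (-8.8405) = 3.8046)
%   rewards/accuracies = fraction of pairs with r_\theta(x, y_w) > r_\theta(x, y_l)
```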

Model description

More information needed

Intended uses & limitations

More information needed
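
No usage guidance is given, but since this is a causal-LM fine-tune of zephyr-7b-sft-full, a standard transformers generation call should work. A minimal sketch, assuming the checkpoint is published under a repo id ending in zephyr-7b-dpo-full-beta-0.083 (the "your-org" prefix is a placeholder) and that the Zephyr chat template is retained from the SFT base:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; replace "your-org" with the namespace actually hosting this checkpoint.
model_id = "your-org/zephyr-7b-dpo-full-beta-0.083"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type of the released weights
    device_map="auto",
)

# Zephyr-style chat prompt, assuming the SFT base's chat template was kept.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one paragraph."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```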

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
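
The list above maps directly onto a transformers TrainingArguments configuration: per-device batch sizes of 8 (train) and 16 (eval) across 4 GPUs give the reported totals of 32 and 64 with no gradient accumulation. A minimal sketch of that mapping; the bf16 flag is an assumption based on the checkpoint's tensor type, and the DPO-specific objective (handled by the trainer, not shown here) is omitted:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the reported hyperparameters. Assumes a 4-process launch
# (e.g. torchrun --nproc_per_node=4), so effective batch sizes are 8*4=32 / 16*4=64.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full-beta-0.083",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: matches the BF16 weights of the released checkpoint
)
```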

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6224 0.05 100 0.6037 0.0537 -0.1781 0.7129 0.2318 -241.3471 -282.3603 -3.0281 -3.0500
0.4992 0.1 200 0.5157 0.0022 -0.7572 0.7520 0.7594 -248.3236 -282.9802 -2.9768 -3.0038
0.5334 0.15 300 0.4983 -0.1003 -1.0616 0.7695 0.9613 -251.9916 -284.2161 -3.0036 -3.0272
0.5479 0.21 400 0.4918 -0.1860 -1.2242 0.7656 1.0381 -253.9503 -285.2488 -3.1264 -3.1487
0.531 0.26 500 0.4929 -0.4703 -1.5754 0.7559 1.1051 -258.1821 -288.6731 -3.0860 -3.1247
0.486 0.31 600 0.5096 -0.3559 -1.4424 0.7285 1.0865 -256.5800 -287.2958 -3.0519 -3.0920
0.4858 0.36 700 0.5079 -0.6230 -1.9605 0.7812 1.3375 -262.8217 -290.5133 -2.9452 -2.9906
0.4844 0.41 800 0.4998 -0.7197 -2.1472 0.7559 1.4275 -265.0713 -291.6787 -2.7574 -2.7909
0.4999 0.46 900 0.4983 -0.5951 -1.8837 0.7637 1.2886 -261.8963 -290.1770 -2.9454 -2.9806
0.45 0.52 1000 0.4916 -0.6703 -2.2026 0.7676 1.5323 -265.7383 -291.0830 -2.9158 -2.9444
0.5239 0.57 1100 0.4848 -0.8068 -2.1600 0.7695 1.3532 -265.2255 -292.7281 -2.8454 -2.8788
0.4766 0.62 1200 0.4974 -0.5971 -1.8739 0.7441 1.2769 -261.7786 -290.2007 -2.8189 -2.8673
0.497 0.67 1300 0.5048 -1.0382 -2.4646 0.7266 1.4264 -268.8953 -295.5161 -2.8081 -2.8508
0.5281 0.72 1400 0.5003 -1.0137 -2.1947 0.7559 1.1810 -265.6436 -295.2208 -2.7945 -2.8255
0.4428 0.77 1500 0.4851 -0.8809 -2.3005 0.7598 1.4196 -266.9182 -293.6202 -2.7815 -2.8139
0.5192 0.83 1600 0.4758 -0.9091 -2.3825 0.7539 1.4735 -267.9066 -293.9598 -2.7394 -2.7728
0.533 0.88 1700 0.4753 -0.8150 -2.1835 0.7676 1.3685 -265.5082 -292.8266 -2.8005 -2.8330
0.5803 0.93 1800 0.4854 -0.6814 -2.0356 0.75 1.3542 -263.7262 -291.2166 -2.7118 -2.7542
0.4714 0.98 1900 0.4855 -0.7688 -2.1323 0.7559 1.3634 -264.8912 -292.2704 -2.6864 -2.7287
0.0702 1.03 2000 0.4988 -1.4916 -3.5339 0.7793 2.0423 -281.7779 -300.9782 -2.6172 -2.6670
0.0732 1.08 2100 0.5188 -1.6274 -3.8428 0.7793 2.2154 -285.4998 -302.6147 -2.6360 -2.6881
0.077 1.14 2200 0.5274 -2.1510 -4.2855 0.7812 2.1345 -290.8334 -308.9228 -2.7288 -2.7823
0.0673 1.19 2300 0.5169 -1.7308 -3.9343 0.7832 2.2035 -286.6026 -303.8600 -2.6971 -2.7569
0.1039 1.24 2400 0.5115 -1.7156 -3.7812 0.7715 2.0655 -284.7573 -303.6773 -2.6974 -2.7420
0.0961 1.29 2500 0.5290 -2.3303 -4.5271 0.7734 2.1968 -293.7446 -311.0832 -2.7071 -2.7485
0.1269 1.34 2600 0.5061 -1.8237 -3.7726 0.7695 1.9490 -284.6546 -304.9791 -2.7066 -2.7432
0.0959 1.39 2700 0.5066 -1.8437 -3.9127 0.7793 2.0690 -286.3417 -305.2205 -2.7061 -2.7584
0.1009 1.45 2800 0.5241 -2.4471 -4.6093 0.7852 2.1622 -294.7356 -312.4907 -2.6836 -2.7338
0.0917 1.5 2900 0.5350 -2.4581 -4.5278 0.75 2.0697 -293.7532 -312.6228 -2.7069 -2.7588
0.0693 1.55 3000 0.5371 -2.3570 -4.5566 0.7578 2.1996 -294.1000 -311.4046 -2.7179 -2.7642
0.0861 1.6 3100 0.5141 -2.1264 -4.2158 0.7754 2.0894 -289.9940 -308.6270 -2.7429 -2.8104
0.0851 1.65 3200 0.5175 -1.9273 -4.0951 0.7695 2.1678 -288.5394 -306.2276 -2.6925 -2.7584
0.0837 1.7 3300 0.5354 -2.0696 -4.3985 0.7637 2.3289 -292.1949 -307.9421 -2.6726 -2.7440
0.056 1.76 3400 0.5596 -2.7840 -5.3198 0.7734 2.5358 -303.2956 -316.5497 -2.6498 -2.7202
0.0689 1.81 3500 0.5348 -2.3076 -4.5718 0.7812 2.2642 -294.2832 -310.8093 -2.7109 -2.7732
0.0934 1.86 3600 0.5539 -2.6736 -5.0332 0.7734 2.3596 -299.8421 -315.2191 -2.6534 -2.7272
0.0694 1.91 3700 0.5426 -2.6655 -4.9512 0.7695 2.2857 -298.8542 -315.1215 -2.6730 -2.7433
0.1267 1.96 3800 0.5620 -2.8767 -5.0299 0.7910 2.1531 -299.8019 -317.6664 -2.6778 -2.7430
0.024 2.01 3900 0.5618 -2.9659 -5.5768 0.7832 2.6109 -306.3921 -318.7414 -2.6526 -2.7240
0.0171 2.07 4000 0.6117 -3.6584 -6.6949 0.7793 3.0364 -319.8622 -327.0849 -2.6017 -2.6789
0.0112 2.12 4100 0.6536 -3.8851 -7.0803 0.7734 3.1953 -324.5066 -329.8155 -2.6007 -2.6772
0.0123 2.17 4200 0.6296 -3.5916 -6.6239 0.7734 3.0323 -319.0072 -326.2793 -2.6019 -2.6741
0.0135 2.22 4300 0.6245 -3.6464 -6.7754 0.7832 3.1289 -320.8321 -326.9404 -2.5877 -2.6570
0.0147 2.27 4400 0.6659 -4.4576 -7.8315 0.7832 3.3739 -333.5571 -336.7133 -2.5400 -2.6114
0.0193 2.32 4500 0.6365 -4.0338 -7.4212 0.7832 3.3874 -328.6134 -331.6075 -2.4882 -2.5622
0.0141 2.37 4600 0.6966 -4.9177 -8.5470 0.7930 3.6293 -342.1769 -342.2570 -2.4891 -2.5649
0.0126 2.43 4700 0.6972 -4.9634 -8.5921 0.7949 3.6287 -342.7202 -342.8073 -2.4465 -2.5246
0.0092 2.48 4800 0.6804 -4.6987 -8.2494 0.7832 3.5507 -338.5913 -339.6177 -2.4977 -2.5738
0.0232 2.53 4900 0.6465 -4.1657 -7.5350 0.7812 3.3694 -329.9847 -333.1960 -2.5170 -2.5959
0.0121 2.58 5000 0.6718 -4.7636 -8.3913 0.7910 3.6278 -340.3017 -340.3996 -2.5250 -2.6042
0.0104 2.63 5100 0.6863 -4.7726 -8.4937 0.7930 3.7212 -341.5356 -340.5081 -2.5020 -2.5831
0.0127 2.68 5200 0.7056 -5.2268 -9.0672 0.7910 3.8404 -348.4451 -345.9808 -2.5054 -2.5842
0.0057 2.74 5300 0.6886 -4.8479 -8.6269 0.7949 3.7790 -343.1393 -341.4157 -2.5488 -2.6248
0.0132 2.79 5400 0.6839 -4.7008 -8.4009 0.7930 3.7001 -340.4170 -339.6432 -2.5501 -2.6260
0.0103 2.84 5500 0.6880 -4.8373 -8.5695 0.7969 3.7322 -342.4483 -341.2881 -2.5405 -2.6167
0.0105 2.89 5600 0.6968 -5.0538 -8.8490 0.7852 3.7952 -345.8162 -343.8970 -2.5383 -2.6136
0.008 2.94 5700 0.6993 -5.0988 -8.9206 0.7871 3.8218 -346.6779 -344.4387 -2.5373 -2.6125
0.0047 2.99 5800 0.6975 -5.0353 -8.8422 0.7949 3.8069 -345.7339 -343.6734 -2.5373 -2.6125

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.0+cu118
  • Datasets 2.14.6
  • Tokenizers 0.14.1

Model size: 7.24B parameters (Safetensors, BF16)