
zephyr-7b-dpo-full

This model is a DPO fine-tuned version of alignment-handbook/zephyr-7b-sft-full; the preference dataset used for training is not documented here. It achieves the following results on the evaluation set:

  • Loss: 0.0254
  • Rewards/chosen: -6.9910
  • Rewards/rejected: -27.5001
  • Rewards/accuracies: 0.9806
  • Rewards/margins: 20.5091
  • Logps/rejected: -345.1052
  • Logps/chosen: -161.1468
  • Logits/rejected: -1.1372
  • Logits/chosen: -1.3943
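For reference, the reward numbers above are the implicit DPO rewards, and the margin is simply the gap between them. A sketch of the standard formulation (the scaling coefficient β used for this run is not documented in this card):

```latex
r(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right],
\qquad
\text{margins} = r(x, y_{\text{chosen}}) - r(x, y_{\text{rejected}})
```

The evaluation numbers are consistent with this identity: 20.5091 = (-6.9910) - (-27.5001).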

Model description

More information needed

Intended uses & limitations

More information needed
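Absent documented usage guidance, the model can presumably be loaded like any Transformers causal LM. A minimal sketch, assuming the hub id below (a placeholder) and that the tokenizer inherits the Zephyr chat template from the SFT base:

```python
import torch
from transformers import pipeline

# The hub id below is a placeholder; substitute the actual repository path.
pipe = pipeline(
    "text-generation",
    model="<org>/zephyr-7b-dpo-full",
    torch_dtype=torch.bfloat16,  # the published weights are BF16
    device_map="auto",
)

# Assumes the tokenizer ships the Zephyr chat template from the SFT base.
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(pipe(prompt, max_new_tokens=128, do_sample=False)[0]["generated_text"])
```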

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch reconstructing this configuration in code follows the list:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
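As a rough illustration, these settings map onto a TRL DPOTrainer run as sketched below. The use of TRL is an assumption suggested by the reward metrics, beta=0.1 is assumed (the DPO beta is not documented), the 100-step eval cadence is taken from the results table, and train_dataset/eval_dataset are left as undefined placeholders because the card does not name the data:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
ref_model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Mirrors the hyperparameters listed above. With 8 devices,
# 8 per-device examples give the reported total train batch size of 64.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
    bf16=True,                    # published weights are BF16
    optim="adamw_torch",          # the Adam betas/epsilon above are its defaults
    evaluation_strategy="steps",
    eval_steps=100,               # matches the 100-step cadence in the results table
    logging_steps=100,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,          # frozen reference copy of the SFT base
    args=training_args,
    beta=0.1,                     # assumed; the DPO beta is not documented here
    train_dataset=train_dataset,  # placeholder: preference data is not documented
    eval_dataset=eval_dataset,    # placeholder: evaluation data is not documented
    tokenizer=tokenizer,
)
trainer.train()
```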

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.3601 | 0.05 | 100 | 0.3410 | -0.0714 | -1.0669 | 0.9444 | 0.9955 | -80.7736 | -91.9507 | -3.0201 | -3.0686 |
| 0.113 | 0.09 | 200 | 0.1171 | -1.2736 | -4.8848 | 0.9611 | 3.6112 | -118.9524 | -103.9727 | -3.0104 | -3.0297 |
| 0.0734 | 0.14 | 300 | 0.0768 | -2.2058 | -7.2581 | 0.9778 | 5.0523 | -142.6854 | -113.2948 | -2.9044 | -2.9330 |
| 0.0587 | 0.18 | 400 | 0.0559 | -4.0388 | -11.7401 | 0.9694 | 7.7012 | -187.5052 | -131.6255 | -2.6966 | -2.7279 |
| 0.0379 | 0.23 | 500 | 0.0460 | -3.9501 | -12.2735 | 0.9750 | 8.3233 | -192.8393 | -130.7384 | -2.6613 | -2.7017 |
| 0.0394 | 0.27 | 600 | 0.0405 | -4.8526 | -14.4947 | 0.9750 | 9.6420 | -215.0514 | -139.7636 | -2.3901 | -2.4786 |
| 0.0375 | 0.32 | 700 | 0.0376 | -3.7095 | -12.6204 | 0.9750 | 8.9110 | -196.3089 | -128.3319 | -2.6708 | -2.7142 |
| 0.043 | 0.37 | 800 | 0.0375 | -5.2652 | -14.8540 | 0.9694 | 9.5888 | -218.6447 | -143.8889 | -2.1654 | -2.2829 |
| 0.0304 | 0.41 | 900 | 0.0380 | -4.2196 | -13.7402 | 0.9750 | 9.5207 | -207.5071 | -133.4332 | -2.1041 | -2.2015 |
| 0.0254 | 0.46 | 1000 | 0.0324 | -4.9741 | -16.1886 | 0.9722 | 11.2145 | -231.9906 | -140.9781 | -2.0860 | -2.1849 |
| 0.03 | 0.5 | 1100 | 0.0342 | -5.1530 | -16.4077 | 0.9667 | 11.2547 | -234.1812 | -142.7671 | -2.1132 | -2.1749 |
| 0.0339 | 0.55 | 1200 | 0.0311 | -2.7641 | -11.9997 | 0.9750 | 9.2356 | -190.1020 | -118.8785 | -2.0003 | -2.1125 |
| 0.0489 | 0.59 | 1300 | 0.0272 | -2.8374 | -12.4331 | 0.9833 | 9.5957 | -194.4359 | -119.6116 | -2.1739 | -2.2572 |
| 0.0263 | 0.64 | 1400 | 0.0291 | -3.4940 | -13.2533 | 0.9833 | 9.7593 | -202.6373 | -126.1769 | -2.0814 | -2.2103 |
| 0.0301 | 0.68 | 1500 | 0.0266 | -4.7000 | -16.2144 | 0.9778 | 11.5144 | -232.2482 | -138.2372 | -2.0580 | -2.1682 |
| 0.0272 | 0.73 | 1600 | 0.0283 | -6.1020 | -18.1286 | 0.9667 | 12.0266 | -251.3911 | -152.2577 | -1.8091 | -1.9654 |
| 0.0278 | 0.78 | 1700 | 0.0254 | -2.9913 | -13.0756 | 0.9750 | 10.0843 | -200.8606 | -121.1501 | -2.1657 | -2.2443 |
| 0.0291 | 0.82 | 1800 | 0.0260 | -4.8989 | -16.4318 | 0.9722 | 11.5329 | -234.4226 | -140.2258 | -1.9427 | -2.0364 |
| 0.0253 | 0.87 | 1900 | 0.0252 | -4.0403 | -15.3217 | 0.9778 | 11.2813 | -223.3212 | -131.6404 | -1.9099 | -2.0361 |
| 0.0235 | 0.91 | 2000 | 0.0223 | -3.1973 | -14.8592 | 0.9750 | 11.6619 | -218.6964 | -123.2101 | -2.0448 | -2.1573 |
| 0.0272 | 0.96 | 2100 | 0.0236 | -3.5828 | -15.3281 | 0.9750 | 11.7453 | -223.3855 | -127.0649 | -1.9834 | -2.0892 |
| 0.0107 | 1.0 | 2200 | 0.0206 | -2.6028 | -14.3743 | 0.9806 | 11.7715 | -213.8473 | -117.2654 | -2.1607 | -2.2658 |
| 0.0089 | 1.05 | 2300 | 0.0209 | -4.2378 | -18.4889 | 0.9806 | 14.2511 | -254.9935 | -133.6152 | -1.8951 | -2.0551 |
| 0.0103 | 1.1 | 2400 | 0.0222 | -3.8267 | -16.7762 | 0.9778 | 12.9495 | -237.8671 | -129.5042 | -2.1516 | -2.2577 |
| 0.0114 | 1.14 | 2500 | 0.0230 | -5.6573 | -20.9027 | 0.9750 | 15.2454 | -279.1315 | -147.8097 | -1.6603 | -1.8552 |
| 0.0075 | 1.19 | 2600 | 0.0217 | -4.9252 | -19.3893 | 0.9833 | 14.4641 | -263.9972 | -140.4892 | -1.8500 | -2.0172 |
| 0.0084 | 1.23 | 2700 | 0.0235 | -5.3971 | -20.1630 | 0.9806 | 14.7660 | -271.7348 | -145.2077 | -1.8673 | -2.0178 |
| 0.0094 | 1.28 | 2800 | 0.0253 | -4.4422 | -18.2275 | 0.9778 | 13.7853 | -252.3796 | -135.6592 | -2.0417 | -2.1522 |
| 0.0064 | 1.32 | 2900 | 0.0269 | -4.3177 | -18.7842 | 0.9750 | 14.4665 | -257.9463 | -134.4144 | -2.0273 | -2.1416 |
| 0.0093 | 1.37 | 3000 | 0.0234 | -4.6335 | -19.3959 | 0.9722 | 14.7624 | -264.0636 | -137.5719 | -1.9133 | -2.0781 |
| 0.0083 | 1.41 | 3100 | 0.0230 | -5.0798 | -20.1230 | 0.9806 | 15.0431 | -271.3341 | -142.0356 | -1.7624 | -1.9098 |
| 0.0108 | 1.46 | 3200 | 0.0217 | -3.8920 | -18.1320 | 0.9806 | 14.2400 | -251.4249 | -130.1573 | -1.9500 | -2.0745 |
| 0.0091 | 1.51 | 3300 | 0.0223 | -5.6969 | -21.8594 | 0.9806 | 16.1624 | -288.6983 | -148.2067 | -1.9011 | -2.0357 |
| 0.012 | 1.55 | 3400 | 0.0215 | -4.8539 | -18.2010 | 0.9861 | 13.3471 | -252.1147 | -139.7764 | -1.7467 | -1.8729 |
| 0.0064 | 1.6 | 3500 | 0.0218 | -5.0782 | -19.6405 | 0.9833 | 14.5623 | -266.5100 | -142.0197 | -1.7721 | -1.8976 |
| 0.0078 | 1.64 | 3600 | 0.0213 | -5.1302 | -20.2433 | 0.9778 | 15.1130 | -272.5372 | -142.5395 | -1.5605 | -1.7238 |
| 0.0076 | 1.69 | 3700 | 0.0222 | -5.6579 | -22.5669 | 0.9778 | 16.9089 | -295.7735 | -147.8167 | -1.4705 | -1.6559 |
| 0.0124 | 1.73 | 3800 | 0.0221 | -3.5477 | -16.1322 | 0.9861 | 12.5845 | -231.4264 | -126.7141 | -1.6382 | -1.7606 |
| 0.0065 | 1.78 | 3900 | 0.0201 | -5.0615 | -20.0072 | 0.9806 | 14.9457 | -270.1767 | -141.8525 | -1.5657 | -1.7132 |
| 0.0078 | 1.83 | 4000 | 0.0200 | -5.5288 | -20.5404 | 0.9806 | 15.0115 | -275.5083 | -146.5255 | -1.6180 | -1.7627 |
| 0.0089 | 1.87 | 4100 | 0.0210 | -5.5847 | -22.2109 | 0.9806 | 16.6262 | -292.2132 | -147.0840 | -1.6373 | -1.7998 |
| 0.0022 | 1.92 | 4200 | 0.0206 | -5.2113 | -21.4282 | 0.9833 | 16.2169 | -284.3861 | -143.3500 | -1.5687 | -1.7379 |
| 0.0065 | 1.96 | 4300 | 0.0205 | -4.4336 | -18.8360 | 0.9806 | 14.4024 | -258.4650 | -135.5732 | -1.5889 | -1.7431 |
| 0.0038 | 2.01 | 4400 | 0.0213 | -3.8031 | -17.3943 | 0.9833 | 13.5912 | -244.0473 | -129.2682 | -1.6138 | -1.7573 |
| 0.0042 | 2.05 | 4500 | 0.0210 | -4.7797 | -20.6519 | 0.9833 | 15.8722 | -276.6233 | -139.0337 | -1.4571 | -1.6393 |
| 0.0025 | 2.1 | 4600 | 0.0220 | -5.3030 | -22.1802 | 0.9833 | 16.8772 | -291.9064 | -144.2672 | -1.4136 | -1.6114 |
| 0.0013 | 2.15 | 4700 | 0.0240 | -6.9006 | -26.5257 | 0.9806 | 19.6251 | -335.3617 | -160.2434 | -1.3102 | -1.5345 |
| 0.0075 | 2.19 | 4800 | 0.0253 | -6.6613 | -26.8086 | 0.9778 | 20.1473 | -338.1903 | -157.8497 | -1.2594 | -1.4933 |
| 0.0024 | 2.24 | 4900 | 0.0238 | -5.9384 | -25.2963 | 0.9806 | 19.3579 | -323.0674 | -150.6208 | -1.3039 | -1.5318 |
| 0.0017 | 2.28 | 5000 | 0.0217 | -5.7146 | -23.2521 | 0.9806 | 17.5375 | -302.6257 | -148.3833 | -1.3104 | -1.5277 |
| 0.0037 | 2.33 | 5100 | 0.0234 | -6.8711 | -26.8591 | 0.9806 | 19.9880 | -338.6961 | -159.9486 | -1.2368 | -1.4798 |
| 0.0045 | 2.37 | 5200 | 0.0233 | -6.4563 | -25.4117 | 0.9806 | 18.9554 | -324.2215 | -155.7999 | -1.2791 | -1.5095 |
| 0.0024 | 2.42 | 5300 | 0.0226 | -6.0528 | -23.6960 | 0.9806 | 17.6432 | -307.0642 | -151.7647 | -1.2664 | -1.4908 |
| 0.0023 | 2.46 | 5400 | 0.0233 | -6.6659 | -25.6664 | 0.9806 | 19.0004 | -326.7684 | -157.8966 | -1.2361 | -1.4788 |
| 0.0007 | 2.51 | 5500 | 0.0250 | -6.7756 | -26.1454 | 0.9806 | 19.3698 | -331.5591 | -158.9936 | -1.2610 | -1.4948 |
| 0.0029 | 2.56 | 5600 | 0.0246 | -6.6468 | -25.9933 | 0.9806 | 19.3465 | -330.0380 | -157.7054 | -1.3105 | -1.5473 |
| 0.0017 | 2.6 | 5700 | 0.0248 | -6.9095 | -26.0328 | 0.9806 | 19.1234 | -330.4331 | -160.3320 | -1.2445 | -1.4880 |
| 0.0055 | 2.65 | 5800 | 0.0257 | -7.5666 | -27.8213 | 0.9806 | 20.2547 | -348.3180 | -166.9033 | -1.1526 | -1.4106 |
| 0.0036 | 2.69 | 5900 | 0.0263 | -7.6427 | -28.5222 | 0.9806 | 20.8795 | -355.3266 | -167.6644 | -1.0877 | -1.3563 |
| 0.0061 | 2.74 | 6000 | 0.0242 | -6.7938 | -26.3053 | 0.9806 | 19.5115 | -333.1572 | -159.1749 | -1.2176 | -1.4633 |
| 0.0018 | 2.78 | 6100 | 0.0242 | -6.8675 | -26.5397 | 0.9806 | 19.6722 | -335.5020 | -159.9126 | -1.1958 | -1.4429 |
| 0.0024 | 2.83 | 6200 | 0.0249 | -7.1136 | -27.2224 | 0.9806 | 20.1088 | -342.3287 | -162.3729 | -1.1466 | -1.4003 |
| 0.0026 | 2.88 | 6300 | 0.0251 | -6.9595 | -27.0923 | 0.9806 | 20.1328 | -341.0272 | -160.8322 | -1.1501 | -1.4025 |
| 0.0034 | 2.92 | 6400 | 0.0253 | -7.1263 | -27.7050 | 0.9806 | 20.5787 | -347.1544 | -162.5005 | -1.1288 | -1.3867 |
| 0.0023 | 2.97 | 6500 | 0.0253 | -7.0828 | -27.6979 | 0.9806 | 20.6151 | -347.0838 | -162.0652 | -1.1284 | -1.3867 |
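These rows correspond to the evaluations run every 100 steps. A hedged sketch of how one could dump such a log from a finished Trainer run; `trainer` is assumed to be the DPOTrainer from the sketch above, and the `eval_rewards/*` key names are an assumption about what TRL's DPOTrainer logs:

```python
# Extract the per-evaluation rows from the trainer's log history.
for entry in trainer.state.log_history:
    if "eval_loss" in entry:  # evaluation entries only, not training-loss logs
        print(
            entry["epoch"],
            entry["step"],
            entry["eval_loss"],
            entry.get("eval_rewards/margins"),  # assumed TRL metric key
        )
```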

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1

Model size

  • 7.24B parameters (BF16, Safetensors)