
zephyr-7b-dpo-full

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full, trained with DPO on an unspecified preference dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7337
  • Rewards/chosen: -4.9100
  • Rewards/rejected: -8.6806
  • Rewards/accuracies: 0.7720
  • Rewards/margins: 3.7705
  • Logps/rejected: -315.2896
  • Logps/chosen: -320.2513
  • Logits/rejected: -2.5449
  • Logits/chosen: -2.5953
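
The reward columns follow the Direct Preference Optimization (DPO) convention: a response's implicit reward is β times the gap between the policy and reference log-probabilities, Rewards/margins is the mean chosen-minus-rejected reward gap, and Rewards/accuracies is the fraction of pairs where the chosen response outscores the rejected one. A minimal sketch of how these quantities are computed (β = 0.1 is an assumption; the card does not record it):

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Compute the DPO loss and the reward metrics reported above.

    Inputs are 1-D tensors of per-example sequence log-probabilities.
    beta=0.1 is a hypothetical value; the card does not state it.
    """
    # Implicit DPO rewards: beta * (policy log-prob - reference log-prob).
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected

    loss = -F.logsigmoid(margins).mean()  # the DPO objective
    return {
        "loss": loss,
        "rewards/chosen": rewards_chosen.mean(),
        "rewards/rejected": rewards_rejected.mean(),
        "rewards/accuracies": (margins > 0).float().mean(),
        "rewards/margins": margins.mean(),
    }
```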

Model description

More information needed

Intended uses & limitations

More information needed
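
Pending fuller documentation, here is a minimal inference sketch. The Hub repo id below is a placeholder (the card does not state where the checkpoint is published), and the sampling parameters are illustrative only:

```python
# A minimal inference sketch. "your-org/zephyr-7b-dpo-full" is a
# hypothetical repo id; replace it with the actual checkpoint location.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="your-org/zephyr-7b-dpo-full",  # hypothetical repo id
    torch_dtype=torch.bfloat16,           # the checkpoint is stored in BF16
    device_map="auto",
)

# Zephyr-style models ship a chat template; apply_chat_template formats it.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
out = pipe(prompt, max_new_tokens=128, do_sample=True,
           temperature=0.7, top_p=0.95)
print(out[0]["generated_text"])
```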

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
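
The per-device train batch size of 8 across 4 GPUs gives the total train batch size of 32 (and 4 × 4 = 16 for evaluation). A hedged sketch of how these settings might map onto trl's DPOTrainer, which the alignment-handbook recipes use; the DPO beta, the trl version, and the dataset are not recorded on the card, so those parts are placeholders:

```python
# A hedged reconstruction of the training setup, assuming trl's DPOTrainer
# (contemporary with Transformers 4.35). beta and the dataset are
# assumptions; the card records neither.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical preference dataset with prompt/chosen/rejected columns;
# the card lists the dataset as unspecified.
dataset = load_dataset("your-org/preference-pairs")

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 4 GPUs = total_train_batch_size 32
    per_device_eval_batch_size=4,   # x 4 GPUs = total_eval_batch_size 16
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
    bf16=True,                      # checkpoint is saved in BF16
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,                       # assumed; not recorded on the card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```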

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6144 | 0.05 | 100 | 0.5938 | 0.0567 | -0.2214 | 0.7220 | 0.2780 | -230.6976 | -270.5843 | -3.0045 | -3.0186 |
| 0.4957 | 0.1 | 200 | 0.5132 | 0.0606 | -0.7482 | 0.7460 | 0.8088 | -235.9661 | -270.5448 | -2.9556 | -2.9714 |
| 0.5257 | 0.15 | 300 | 0.4975 | -0.0361 | -1.0262 | 0.7520 | 0.9901 | -238.7455 | -271.5117 | -2.9853 | -2.9989 |
| 0.556 | 0.21 | 400 | 0.4935 | -0.1016 | -1.1994 | 0.7760 | 1.0978 | -240.4776 | -272.1671 | -3.0847 | -3.0931 |
| 0.5409 | 0.26 | 500 | 0.4953 | -0.4001 | -1.5875 | 0.7780 | 1.1874 | -244.3592 | -275.1525 | -3.0544 | -3.0767 |
| 0.5161 | 0.31 | 600 | 0.5195 | -0.3148 | -1.4151 | 0.7420 | 1.1003 | -242.6347 | -274.2988 | -3.0235 | -3.0461 |
| 0.4913 | 0.36 | 700 | 0.5228 | -0.5853 | -1.8669 | 0.7800 | 1.2816 | -247.1535 | -277.0044 | -2.9302 | -2.9586 |
| 0.4724 | 0.41 | 800 | 0.5142 | -0.6071 | -2.0565 | 0.7620 | 1.4494 | -249.0490 | -277.2221 | -2.7988 | -2.8297 |
| 0.5157 | 0.46 | 900 | 0.5050 | -0.5865 | -1.8166 | 0.7660 | 1.2302 | -246.6503 | -277.0157 | -2.9463 | -2.9778 |
| 0.4641 | 0.52 | 1000 | 0.5091 | -0.5151 | -1.9977 | 0.7580 | 1.4826 | -248.4611 | -276.3019 | -2.8916 | -2.9216 |
| 0.5558 | 0.57 | 1100 | 0.4971 | -0.8116 | -2.1120 | 0.7700 | 1.3004 | -249.6036 | -279.2668 | -2.8601 | -2.8914 |
| 0.4877 | 0.62 | 1200 | 0.5092 | -0.5596 | -1.8948 | 0.7640 | 1.3352 | -247.4319 | -276.7474 | -2.8340 | -2.8770 |
| 0.4922 | 0.67 | 1300 | 0.5181 | -0.9340 | -2.3745 | 0.7460 | 1.4405 | -252.2287 | -280.4910 | -2.8187 | -2.8517 |
| 0.5515 | 0.72 | 1400 | 0.5081 | -0.9873 | -2.2119 | 0.7440 | 1.2247 | -250.6034 | -281.0239 | -2.8488 | -2.8704 |
| 0.4349 | 0.77 | 1500 | 0.4996 | -0.9048 | -2.4262 | 0.7580 | 1.5214 | -252.7459 | -280.1994 | -2.8402 | -2.8601 |
| 0.5446 | 0.83 | 1600 | 0.4927 | -0.8717 | -2.4390 | 0.7660 | 1.5673 | -252.8737 | -279.8681 | -2.7610 | -2.7853 |
| 0.5242 | 0.88 | 1700 | 0.4864 | -0.6984 | -2.1381 | 0.7780 | 1.4397 | -249.8655 | -278.1355 | -2.8269 | -2.8525 |
| 0.5266 | 0.93 | 1800 | 0.5020 | -0.5411 | -1.9479 | 0.7760 | 1.4068 | -247.9628 | -276.5621 | -2.7381 | -2.7715 |
| 0.498 | 0.98 | 1900 | 0.5086 | -0.6894 | -2.0331 | 0.7640 | 1.3437 | -248.8150 | -278.0452 | -2.7298 | -2.7664 |
| 0.0664 | 1.03 | 2000 | 0.5137 | -1.1702 | -3.1723 | 0.7620 | 2.0021 | -260.2072 | -282.8530 | -2.6137 | -2.6605 |
| 0.0698 | 1.08 | 2100 | 0.5327 | -1.3645 | -3.5669 | 0.7680 | 2.2023 | -264.1527 | -284.7966 | -2.6219 | -2.6692 |
| 0.0715 | 1.14 | 2200 | 0.5423 | -2.0519 | -4.1983 | 0.7620 | 2.1464 | -270.4673 | -291.6701 | -2.6949 | -2.7397 |
| 0.0548 | 1.19 | 2300 | 0.5459 | -1.7539 | -4.0546 | 0.7700 | 2.3007 | -269.0301 | -288.6898 | -2.5996 | -2.6425 |
| 0.0897 | 1.24 | 2400 | 0.5317 | -1.6549 | -3.7228 | 0.7640 | 2.0679 | -265.7117 | -287.7002 | -2.6512 | -2.6870 |
| 0.0842 | 1.29 | 2500 | 0.5710 | -2.3000 | -4.5267 | 0.7660 | 2.2267 | -273.7511 | -294.1512 | -2.6530 | -2.6843 |
| 0.1321 | 1.34 | 2600 | 0.5334 | -1.8238 | -3.8561 | 0.75 | 2.0323 | -267.0450 | -289.3895 | -2.7094 | -2.7343 |
| 0.0862 | 1.39 | 2700 | 0.5443 | -1.8480 | -3.9514 | 0.7520 | 2.1034 | -267.9976 | -289.6307 | -2.6953 | -2.7169 |
| 0.0954 | 1.45 | 2800 | 0.5472 | -1.9317 | -3.9982 | 0.7620 | 2.0665 | -268.4658 | -290.4683 | -2.6900 | -2.7121 |
| 0.0979 | 1.5 | 2900 | 0.5471 | -2.1452 | -4.1979 | 0.7540 | 2.0526 | -270.4626 | -292.6034 | -2.6466 | -2.6788 |
| 0.0732 | 1.55 | 3000 | 0.5512 | -2.0252 | -4.2019 | 0.75 | 2.1767 | -270.5027 | -291.4029 | -2.6716 | -2.6981 |
| 0.0799 | 1.6 | 3100 | 0.5415 | -1.8888 | -3.8739 | 0.75 | 1.9851 | -267.2229 | -290.0393 | -2.6703 | -2.7143 |
| 0.07 | 1.65 | 3200 | 0.5399 | -1.8457 | -4.0299 | 0.7640 | 2.1843 | -268.7833 | -289.6078 | -2.6566 | -2.7002 |
| 0.0808 | 1.7 | 3300 | 0.5594 | -2.2307 | -4.6355 | 0.7640 | 2.4048 | -274.8385 | -293.4576 | -2.6843 | -2.7340 |
| 0.0501 | 1.76 | 3400 | 0.5704 | -2.5155 | -4.9551 | 0.7660 | 2.4396 | -278.0345 | -296.3059 | -2.6427 | -2.6944 |
| 0.061 | 1.81 | 3500 | 0.5562 | -2.2172 | -4.4937 | 0.7600 | 2.2765 | -273.4208 | -293.3234 | -2.7086 | -2.7404 |
| 0.0979 | 1.86 | 3600 | 0.5656 | -2.6495 | -5.0323 | 0.7520 | 2.3828 | -278.8068 | -297.6461 | -2.6381 | -2.6765 |
| 0.0631 | 1.91 | 3700 | 0.5668 | -2.5055 | -4.7949 | 0.7560 | 2.2895 | -276.4331 | -296.2057 | -2.6407 | -2.6818 |
| 0.1202 | 1.96 | 3800 | 0.5678 | -2.6581 | -4.7249 | 0.7580 | 2.0668 | -275.7330 | -297.7322 | -2.6716 | -2.7125 |
| 0.022 | 2.01 | 3900 | 0.5657 | -2.6893 | -5.1672 | 0.7720 | 2.4778 | -280.1555 | -298.0444 | -2.6680 | -2.7125 |
| 0.0177 | 2.07 | 4000 | 0.6171 | -3.3461 | -6.2908 | 0.7680 | 2.9447 | -291.3919 | -304.6117 | -2.6431 | -2.6916 |
| 0.0108 | 2.12 | 4100 | 0.6389 | -3.3448 | -6.3803 | 0.7660 | 3.0355 | -292.2874 | -304.5994 | -2.6225 | -2.6701 |
| 0.0108 | 2.17 | 4200 | 0.6562 | -3.5386 | -6.6028 | 0.7620 | 3.0642 | -294.5121 | -306.5373 | -2.6323 | -2.6797 |
| 0.0105 | 2.22 | 4300 | 0.6742 | -3.7048 | -6.8992 | 0.7560 | 3.1944 | -297.4764 | -308.1995 | -2.6192 | -2.6678 |
| 0.018 | 2.27 | 4400 | 0.6982 | -4.1642 | -7.4837 | 0.7680 | 3.3195 | -303.3213 | -312.7930 | -2.5975 | -2.6454 |
| 0.0173 | 2.32 | 4500 | 0.6661 | -3.9139 | -6.9481 | 0.7660 | 3.0342 | -297.9650 | -310.2904 | -2.5967 | -2.6394 |
| 0.011 | 2.37 | 4600 | 0.6606 | -3.7121 | -6.8279 | 0.7640 | 3.1158 | -296.7630 | -308.2721 | -2.5628 | -2.6068 |
| 0.0096 | 2.43 | 4700 | 0.6705 | -3.9088 | -7.1613 | 0.7680 | 3.2524 | -300.0965 | -310.2393 | -2.5127 | -2.5613 |
| 0.0099 | 2.48 | 4800 | 0.6825 | -3.9836 | -7.2552 | 0.7720 | 3.2716 | -301.0364 | -310.9875 | -2.5169 | -2.5658 |
| 0.0106 | 2.53 | 4900 | 0.6938 | -4.2534 | -7.7587 | 0.7660 | 3.5053 | -306.0710 | -313.6849 | -2.5330 | -2.5844 |
| 0.0106 | 2.58 | 5000 | 0.6949 | -4.2978 | -7.7919 | 0.7660 | 3.4942 | -306.4034 | -314.1288 | -2.5330 | -2.5826 |
| 0.0099 | 2.63 | 5100 | 0.7239 | -4.3508 | -8.0105 | 0.7640 | 3.6598 | -308.5892 | -314.6587 | -2.5095 | -2.5620 |
| 0.0074 | 2.68 | 5200 | 0.7394 | -4.7364 | -8.4819 | 0.7660 | 3.7456 | -313.3035 | -318.5147 | -2.5378 | -2.5891 |
| 0.0043 | 2.74 | 5300 | 0.7335 | -4.6351 | -8.3990 | 0.7720 | 3.7639 | -312.4740 | -317.5019 | -2.5539 | -2.6052 |
| 0.0163 | 2.79 | 5400 | 0.7317 | -4.6741 | -8.3958 | 0.7700 | 3.7217 | -312.4420 | -317.8924 | -2.5490 | -2.5993 |
| 0.0081 | 2.84 | 5500 | 0.7420 | -4.9166 | -8.6945 | 0.7740 | 3.7779 | -315.4291 | -320.3167 | -2.5307 | -2.5816 |
| 0.0067 | 2.89 | 5600 | 0.7369 | -4.9581 | -8.7224 | 0.7680 | 3.7643 | -315.7077 | -320.7321 | -2.5437 | -2.5941 |
| 0.0081 | 2.94 | 5700 | 0.7345 | -4.9719 | -8.7499 | 0.7720 | 3.7780 | -315.9826 | -320.8700 | -2.5442 | -2.5946 |
| 0.0043 | 2.99 | 5800 | 0.7338 | -4.9141 | -8.6850 | 0.7700 | 3.7709 | -315.3341 | -320.2925 | -2.5452 | -2.5956 |

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.0
  • Datasets 2.14.6
  • Tokenizers 0.14.1