
zephyr-7b-dpo-full

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full, trained with direct preference optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the metrics):

  • Loss: 1.1487
  • Rewards/chosen: -10.8742
  • Rewards/rejected: -16.0045
  • Rewards/accuracies: 0.7285
  • Rewards/margins: 5.1303
  • Logps/rejected: -424.4627
  • Logps/chosen: -383.6538
  • Logits/rejected: -0.5906
  • Logits/chosen: -1.0023
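
For reference, below is a minimal usage sketch with the Transformers chat template. The repository id (assumed to be happii/zephyr-7b-dpo-full), the example prompt, and the generation settings are illustrative assumptions, not part of the original card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "happii/zephyr-7b-dpo-full"  # assumed Hub id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-formatted prompt and generate a reply (settings are illustrative).
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Explain direct preference optimization in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```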

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch of the same settings follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
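
The same settings can be written as a transformers.TrainingArguments object. This is a readability sketch, not the original training script; the output directory name and bf16 flag are assumptions not stated in the card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",  # assumed name, not stated in the card
    learning_rate=5e-7,
    per_device_train_batch_size=8,    # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=16,    # 4 GPUs -> total eval batch size 64
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                        # assumed from the BF16 published weights
)
```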

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6673 0.0523 100 0.6670 0.0699 0.0097 0.6797 0.0602 -264.3204 -274.2128 -2.5742 -2.6289
0.5806 0.1047 200 0.5926 0.3801 -0.0108 0.7051 0.3909 -264.5256 -271.1104 -2.5225 -2.5806
0.554 0.1570 300 0.5669 0.3096 -0.4486 0.7246 0.7581 -268.9032 -271.8162 -2.4975 -2.5603
0.5674 0.2093 400 0.5521 0.7133 -0.0663 0.7246 0.7797 -265.0810 -267.7786 -2.4794 -2.5387
0.512 0.2616 500 0.5478 0.1922 -0.9270 0.7266 1.1192 -273.6879 -272.9901 -2.4185 -2.4842
0.5511 0.3140 600 0.5389 -0.0115 -1.1320 0.7539 1.1205 -275.7375 -275.0270 -2.3648 -2.4308
0.5851 0.3663 700 0.5448 0.0450 -1.1453 0.7402 1.1903 -275.8708 -274.4615 -2.4055 -2.4622
0.5302 0.4186 800 0.5569 -0.2258 -1.2912 0.7324 1.0653 -277.3294 -277.1702 -2.5104 -2.5742
0.518 0.4710 900 0.5607 -0.2557 -1.4332 0.75 1.1775 -278.7496 -277.4685 -2.4298 -2.4910
0.5525 0.5233 1000 0.5601 -0.7719 -1.9891 0.7480 1.2172 -284.3084 -282.6305 -2.4482 -2.5089
0.5189 0.5756 1100 0.5515 -0.4040 -1.5951 0.7422 1.1911 -280.3683 -278.9518 -2.4816 -2.5430
0.5331 0.6279 1200 0.5453 -0.5342 -1.7671 0.7383 1.2329 -282.0886 -280.2540 -2.4521 -2.5080
0.5104 0.6803 1300 0.5511 -0.4634 -1.8916 0.7363 1.4282 -283.3339 -279.5460 -2.4281 -2.4909
0.4976 0.7326 1400 0.5413 -0.3748 -1.7652 0.7363 1.3904 -282.0694 -278.6596 -2.4395 -2.4947
0.4814 0.7849 1500 0.5447 -0.8885 -2.1522 0.7305 1.2637 -285.9394 -283.7968 -2.4376 -2.4908
0.5075 0.8373 1600 0.5423 -0.3051 -1.5253 0.7344 1.2202 -279.6703 -277.9630 -2.4316 -2.4816
0.4906 0.8896 1700 0.5806 -1.4841 -3.0212 0.7266 1.5371 -294.6296 -289.7531 -2.4876 -2.5438
0.536 0.9419 1800 0.5603 -0.5951 -2.1710 0.7383 1.5759 -286.1272 -280.8625 -2.5694 -2.6123
0.5164 0.9942 1900 0.5567 -0.5404 -2.0173 0.7422 1.4769 -284.5909 -280.3160 -2.5490 -2.5898
0.0947 1.0466 2000 0.5942 -1.0618 -2.9986 0.7344 1.9369 -294.4039 -285.5296 -2.5622 -2.6140
0.068 1.0989 2100 0.6230 -1.6457 -3.9093 0.7520 2.2636 -303.5109 -291.3689 -2.4361 -2.5042
0.0747 1.1512 2200 0.6291 -1.3268 -3.4945 0.7461 2.1677 -299.3621 -288.1795 -2.3844 -2.4542
0.0553 1.2036 2300 0.6765 -2.2209 -4.6502 0.7344 2.4293 -310.9199 -297.1208 -2.4889 -2.5616
0.1207 1.2559 2400 0.6530 -1.7158 -3.9584 0.7246 2.2427 -304.0018 -292.0695 -2.4457 -2.5092
0.152 1.3082 2500 0.6882 -1.8791 -4.3806 0.7207 2.5015 -308.2237 -293.7032 -2.4232 -2.4917
0.1114 1.3605 2600 0.6422 -2.2334 -4.3890 0.7227 2.1556 -308.3074 -297.2458 -2.5713 -2.6189
0.1173 1.4129 2700 0.6619 -1.5700 -4.0282 0.7266 2.4581 -304.6991 -290.6119 -2.5152 -2.5719
0.0925 1.4652 2800 0.6523 -2.3231 -4.6279 0.7207 2.3048 -310.6963 -298.1424 -2.5141 -2.5711
0.1221 1.5175 2900 0.6496 -2.8770 -5.1437 0.7266 2.2667 -315.8546 -303.6823 -2.4733 -2.5414
0.0807 1.5699 3000 0.6925 -2.7762 -5.3350 0.7383 2.5588 -317.7678 -302.6737 -2.3267 -2.4141
0.105 1.6222 3100 0.6540 -2.6858 -5.0067 0.7246 2.3209 -314.4846 -301.7698 -2.3683 -2.4395
0.1162 1.6745 3200 0.6481 -1.8133 -4.0448 0.7148 2.2315 -304.8652 -293.0446 -2.3670 -2.4379
0.0667 1.7268 3300 0.6541 -2.0364 -4.3933 0.7363 2.3569 -308.3506 -295.2763 -2.2794 -2.3589
0.0935 1.7792 3400 0.6690 -2.7292 -5.2592 0.7441 2.5300 -317.0096 -302.2036 -2.2855 -2.3694
0.095 1.8315 3500 0.6361 -2.9308 -5.1591 0.7266 2.2284 -316.0090 -304.2198 -2.3827 -2.4530
0.0719 1.8838 3600 0.6778 -2.3616 -4.8272 0.7246 2.4656 -312.6893 -298.5278 -2.4285 -2.5018
0.0729 1.9362 3700 0.6754 -2.9280 -5.4360 0.7285 2.5080 -318.7774 -304.1916 -2.4287 -2.5049
0.0867 1.9885 3800 0.6744 -3.0956 -5.5458 0.7324 2.4502 -319.8756 -305.8675 -2.3542 -2.4301
0.0057 2.0408 3900 0.8833 -5.0083 -8.7774 0.7324 3.7690 -352.1913 -324.9953 -1.5131 -1.7155
0.0042 2.0931 4000 0.9722 -6.1264 -10.3554 0.7441 4.2290 -367.9712 -336.1759 -1.6158 -1.8694
0.0144 2.1455 4100 1.0865 -7.7872 -12.6090 0.7227 4.8218 -390.5074 -352.7837 -1.3817 -1.7022
0.0222 2.1978 4200 1.1130 -7.9969 -12.8510 0.7090 4.8541 -392.9280 -354.8811 -1.3909 -1.6967
0.0062 2.2501 4300 1.0722 -8.7884 -13.4773 0.7188 4.6889 -399.1902 -362.7955 -1.5072 -1.7459
0.0164 2.3025 4400 1.0993 -8.7821 -13.5683 0.7246 4.7862 -400.1005 -362.7325 -1.2294 -1.5182
0.0043 2.3548 4500 1.1250 -9.9027 -14.7785 0.7324 4.8758 -412.2026 -373.9385 -0.7476 -1.0957
0.0055 2.4071 4600 1.1975 -10.4385 -15.5644 0.7285 5.1258 -420.0612 -379.2971 -0.5940 -1.0020
0.0096 2.4594 4700 1.1443 -10.2507 -15.1793 0.7344 4.9286 -416.2106 -377.4187 -0.9036 -1.2413
0.0121 2.5118 4800 1.1422 -10.3821 -15.4221 0.7188 5.0400 -418.6388 -378.7332 -0.8425 -1.2175
0.0129 2.5641 4900 1.1155 -9.3510 -14.2451 0.7227 4.8941 -406.8687 -368.4216 -0.9190 -1.2930
0.0027 2.6164 5000 1.1905 -10.7239 -16.0360 0.7246 5.3121 -424.7772 -382.1504 -0.6076 -1.0264
0.0069 2.6688 5100 1.1635 -10.2624 -15.5178 0.7266 5.2555 -419.5960 -377.5356 -0.7336 -1.1315
0.009 2.7211 5200 1.1697 -10.4591 -15.6846 0.7266 5.2255 -421.2634 -379.5029 -0.5587 -0.9680
0.0088 2.7734 5300 1.1614 -9.6958 -14.8576 0.7246 5.1618 -412.9938 -371.8698 -0.7312 -1.1117
0.0078 2.8257 5400 1.1537 -10.1101 -15.2615 0.7168 5.1514 -417.0325 -376.0129 -0.6843 -1.0802
0.0209 2.8781 5500 1.1425 -10.8046 -15.9002 0.7266 5.0956 -423.4199 -382.9582 -0.5316 -0.9493
0.0145 2.9304 5600 1.1673 -10.6083 -15.8081 0.7266 5.1997 -422.4983 -380.9951 -0.5878 -1.0058
0.0189 2.9827 5700 1.1475 -10.8669 -16.0106 0.7285 5.1437 -424.5231 -383.5809 -0.5915 -1.0022
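
The Rewards/* columns above follow the usual DPO logging convention: per-example rewards are the scaled log-probability differences between the policy and the reference model, and margins/accuracies compare chosen against rejected completions. A minimal sketch of that computation is shown below; the beta value and function name are illustrative assumptions (the card does not report them).

```python
import torch

BETA = 0.1  # assumed DPO temperature; not reported in this card

def dpo_reward_metrics(policy_chosen_logps: torch.Tensor,
                       policy_rejected_logps: torch.Tensor,
                       ref_chosen_logps: torch.Tensor,
                       ref_rejected_logps: torch.Tensor) -> dict:
    # Rewards are the policy-vs-reference log-prob gaps, scaled by beta.
    rewards_chosen = BETA * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = BETA * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    accuracies = (rewards_chosen > rewards_rejected).float()
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": accuracies.mean().item(),
    }
```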

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.1.2
  • Datasets 2.19.1
  • Tokenizers 0.19.1