
zephyr-7b-dpo-full

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1487
  • Rewards/chosen: -10.8742
  • Rewards/rejected: -16.0045
  • Rewards/accuracies: 0.7285
  • Rewards/margins: 5.1303
  • Logps/rejected: -424.4627
  • Logps/chosen: -383.6538
  • Logits/rejected: -0.5906
  • Logits/chosen: -1.0023
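
The checkpoint can be loaded like any other causal language model from the Hub. Below is a minimal inference sketch, not an officially documented usage example; it assumes the tokenizer ships a chat template (as the upstream Zephyr SFT model does) and that a bfloat16-capable GPU is available.

```python
# Minimal inference sketch (assumptions: transformers>=4.40, a chat template in
# the tokenizer config, and a GPU with room for a 7B model in bfloat16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "happii/zephyr-7b-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what DPO fine-tuning does in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```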

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
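
The per-device batch size of 8 across 4 GPUs accounts for the total train batch size of 32. As a rough, hedged sketch (the actual training script is not included in this card), these settings map onto a trl `DPOTrainer` run along the following lines; the DPO `beta`, sequence-length limits, and dataset preprocessing shown here are assumptions, not values reported above.

```python
# Hedged sketch of a trl DPO run using the hyperparameters listed above.
# Assumptions (not stated in this card): trl ~0.8 API, beta=0.1, and a very
# simplified preprocessing of HuggingFaceH4/ultrafeedback_binarized.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer  # newer trl releases move these arguments into DPOConfig

base_model = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

def to_dpo_format(example):
    # Simplified: keep the raw prompt and the final assistant turns.
    # The alignment-handbook recipe applies the chat template here instead.
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

train_ds = raw["train_prefs"].map(to_dpo_format, remove_columns=raw["train_prefs"].column_names)
eval_ds = raw["test_prefs"].map(to_dpo_format, remove_columns=raw["test_prefs"].column_names)

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # train_batch_size above
    per_device_eval_batch_size=16,   # eval_batch_size above
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption, matching the BF16 checkpoint
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # trl clones the SFT model internally as the frozen reference
    args=args,
    beta=0.1,         # assumption: trl's default; the value actually used is not reported
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
)
trainer.train()
```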

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6673 | 0.0523 | 100 | 0.6670 | 0.0699 | 0.0097 | 0.6797 | 0.0602 | -264.3204 | -274.2128 | -2.5742 | -2.6289 |
| 0.5806 | 0.1047 | 200 | 0.5926 | 0.3801 | -0.0108 | 0.7051 | 0.3909 | -264.5256 | -271.1104 | -2.5225 | -2.5806 |
| 0.554 | 0.1570 | 300 | 0.5669 | 0.3096 | -0.4486 | 0.7246 | 0.7581 | -268.9032 | -271.8162 | -2.4975 | -2.5603 |
| 0.5674 | 0.2093 | 400 | 0.5521 | 0.7133 | -0.0663 | 0.7246 | 0.7797 | -265.0810 | -267.7786 | -2.4794 | -2.5387 |
| 0.512 | 0.2616 | 500 | 0.5478 | 0.1922 | -0.9270 | 0.7266 | 1.1192 | -273.6879 | -272.9901 | -2.4185 | -2.4842 |
| 0.5511 | 0.3140 | 600 | 0.5389 | -0.0115 | -1.1320 | 0.7539 | 1.1205 | -275.7375 | -275.0270 | -2.3648 | -2.4308 |
| 0.5851 | 0.3663 | 700 | 0.5448 | 0.0450 | -1.1453 | 0.7402 | 1.1903 | -275.8708 | -274.4615 | -2.4055 | -2.4622 |
| 0.5302 | 0.4186 | 800 | 0.5569 | -0.2258 | -1.2912 | 0.7324 | 1.0653 | -277.3294 | -277.1702 | -2.5104 | -2.5742 |
| 0.518 | 0.4710 | 900 | 0.5607 | -0.2557 | -1.4332 | 0.75 | 1.1775 | -278.7496 | -277.4685 | -2.4298 | -2.4910 |
| 0.5525 | 0.5233 | 1000 | 0.5601 | -0.7719 | -1.9891 | 0.7480 | 1.2172 | -284.3084 | -282.6305 | -2.4482 | -2.5089 |
| 0.5189 | 0.5756 | 1100 | 0.5515 | -0.4040 | -1.5951 | 0.7422 | 1.1911 | -280.3683 | -278.9518 | -2.4816 | -2.5430 |
| 0.5331 | 0.6279 | 1200 | 0.5453 | -0.5342 | -1.7671 | 0.7383 | 1.2329 | -282.0886 | -280.2540 | -2.4521 | -2.5080 |
| 0.5104 | 0.6803 | 1300 | 0.5511 | -0.4634 | -1.8916 | 0.7363 | 1.4282 | -283.3339 | -279.5460 | -2.4281 | -2.4909 |
| 0.4976 | 0.7326 | 1400 | 0.5413 | -0.3748 | -1.7652 | 0.7363 | 1.3904 | -282.0694 | -278.6596 | -2.4395 | -2.4947 |
| 0.4814 | 0.7849 | 1500 | 0.5447 | -0.8885 | -2.1522 | 0.7305 | 1.2637 | -285.9394 | -283.7968 | -2.4376 | -2.4908 |
| 0.5075 | 0.8373 | 1600 | 0.5423 | -0.3051 | -1.5253 | 0.7344 | 1.2202 | -279.6703 | -277.9630 | -2.4316 | -2.4816 |
| 0.4906 | 0.8896 | 1700 | 0.5806 | -1.4841 | -3.0212 | 0.7266 | 1.5371 | -294.6296 | -289.7531 | -2.4876 | -2.5438 |
| 0.536 | 0.9419 | 1800 | 0.5603 | -0.5951 | -2.1710 | 0.7383 | 1.5759 | -286.1272 | -280.8625 | -2.5694 | -2.6123 |
| 0.5164 | 0.9942 | 1900 | 0.5567 | -0.5404 | -2.0173 | 0.7422 | 1.4769 | -284.5909 | -280.3160 | -2.5490 | -2.5898 |
| 0.0947 | 1.0466 | 2000 | 0.5942 | -1.0618 | -2.9986 | 0.7344 | 1.9369 | -294.4039 | -285.5296 | -2.5622 | -2.6140 |
| 0.068 | 1.0989 | 2100 | 0.6230 | -1.6457 | -3.9093 | 0.7520 | 2.2636 | -303.5109 | -291.3689 | -2.4361 | -2.5042 |
| 0.0747 | 1.1512 | 2200 | 0.6291 | -1.3268 | -3.4945 | 0.7461 | 2.1677 | -299.3621 | -288.1795 | -2.3844 | -2.4542 |
| 0.0553 | 1.2036 | 2300 | 0.6765 | -2.2209 | -4.6502 | 0.7344 | 2.4293 | -310.9199 | -297.1208 | -2.4889 | -2.5616 |
| 0.1207 | 1.2559 | 2400 | 0.6530 | -1.7158 | -3.9584 | 0.7246 | 2.2427 | -304.0018 | -292.0695 | -2.4457 | -2.5092 |
| 0.152 | 1.3082 | 2500 | 0.6882 | -1.8791 | -4.3806 | 0.7207 | 2.5015 | -308.2237 | -293.7032 | -2.4232 | -2.4917 |
| 0.1114 | 1.3605 | 2600 | 0.6422 | -2.2334 | -4.3890 | 0.7227 | 2.1556 | -308.3074 | -297.2458 | -2.5713 | -2.6189 |
| 0.1173 | 1.4129 | 2700 | 0.6619 | -1.5700 | -4.0282 | 0.7266 | 2.4581 | -304.6991 | -290.6119 | -2.5152 | -2.5719 |
| 0.0925 | 1.4652 | 2800 | 0.6523 | -2.3231 | -4.6279 | 0.7207 | 2.3048 | -310.6963 | -298.1424 | -2.5141 | -2.5711 |
| 0.1221 | 1.5175 | 2900 | 0.6496 | -2.8770 | -5.1437 | 0.7266 | 2.2667 | -315.8546 | -303.6823 | -2.4733 | -2.5414 |
| 0.0807 | 1.5699 | 3000 | 0.6925 | -2.7762 | -5.3350 | 0.7383 | 2.5588 | -317.7678 | -302.6737 | -2.3267 | -2.4141 |
| 0.105 | 1.6222 | 3100 | 0.6540 | -2.6858 | -5.0067 | 0.7246 | 2.3209 | -314.4846 | -301.7698 | -2.3683 | -2.4395 |
| 0.1162 | 1.6745 | 3200 | 0.6481 | -1.8133 | -4.0448 | 0.7148 | 2.2315 | -304.8652 | -293.0446 | -2.3670 | -2.4379 |
| 0.0667 | 1.7268 | 3300 | 0.6541 | -2.0364 | -4.3933 | 0.7363 | 2.3569 | -308.3506 | -295.2763 | -2.2794 | -2.3589 |
| 0.0935 | 1.7792 | 3400 | 0.6690 | -2.7292 | -5.2592 | 0.7441 | 2.5300 | -317.0096 | -302.2036 | -2.2855 | -2.3694 |
| 0.095 | 1.8315 | 3500 | 0.6361 | -2.9308 | -5.1591 | 0.7266 | 2.2284 | -316.0090 | -304.2198 | -2.3827 | -2.4530 |
| 0.0719 | 1.8838 | 3600 | 0.6778 | -2.3616 | -4.8272 | 0.7246 | 2.4656 | -312.6893 | -298.5278 | -2.4285 | -2.5018 |
| 0.0729 | 1.9362 | 3700 | 0.6754 | -2.9280 | -5.4360 | 0.7285 | 2.5080 | -318.7774 | -304.1916 | -2.4287 | -2.5049 |
| 0.0867 | 1.9885 | 3800 | 0.6744 | -3.0956 | -5.5458 | 0.7324 | 2.4502 | -319.8756 | -305.8675 | -2.3542 | -2.4301 |
| 0.0057 | 2.0408 | 3900 | 0.8833 | -5.0083 | -8.7774 | 0.7324 | 3.7690 | -352.1913 | -324.9953 | -1.5131 | -1.7155 |
| 0.0042 | 2.0931 | 4000 | 0.9722 | -6.1264 | -10.3554 | 0.7441 | 4.2290 | -367.9712 | -336.1759 | -1.6158 | -1.8694 |
| 0.0144 | 2.1455 | 4100 | 1.0865 | -7.7872 | -12.6090 | 0.7227 | 4.8218 | -390.5074 | -352.7837 | -1.3817 | -1.7022 |
| 0.0222 | 2.1978 | 4200 | 1.1130 | -7.9969 | -12.8510 | 0.7090 | 4.8541 | -392.9280 | -354.8811 | -1.3909 | -1.6967 |
| 0.0062 | 2.2501 | 4300 | 1.0722 | -8.7884 | -13.4773 | 0.7188 | 4.6889 | -399.1902 | -362.7955 | -1.5072 | -1.7459 |
| 0.0164 | 2.3025 | 4400 | 1.0993 | -8.7821 | -13.5683 | 0.7246 | 4.7862 | -400.1005 | -362.7325 | -1.2294 | -1.5182 |
| 0.0043 | 2.3548 | 4500 | 1.1250 | -9.9027 | -14.7785 | 0.7324 | 4.8758 | -412.2026 | -373.9385 | -0.7476 | -1.0957 |
| 0.0055 | 2.4071 | 4600 | 1.1975 | -10.4385 | -15.5644 | 0.7285 | 5.1258 | -420.0612 | -379.2971 | -0.5940 | -1.0020 |
| 0.0096 | 2.4594 | 4700 | 1.1443 | -10.2507 | -15.1793 | 0.7344 | 4.9286 | -416.2106 | -377.4187 | -0.9036 | -1.2413 |
| 0.0121 | 2.5118 | 4800 | 1.1422 | -10.3821 | -15.4221 | 0.7188 | 5.0400 | -418.6388 | -378.7332 | -0.8425 | -1.2175 |
| 0.0129 | 2.5641 | 4900 | 1.1155 | -9.3510 | -14.2451 | 0.7227 | 4.8941 | -406.8687 | -368.4216 | -0.9190 | -1.2930 |
| 0.0027 | 2.6164 | 5000 | 1.1905 | -10.7239 | -16.0360 | 0.7246 | 5.3121 | -424.7772 | -382.1504 | -0.6076 | -1.0264 |
| 0.0069 | 2.6688 | 5100 | 1.1635 | -10.2624 | -15.5178 | 0.7266 | 5.2555 | -419.5960 | -377.5356 | -0.7336 | -1.1315 |
| 0.009 | 2.7211 | 5200 | 1.1697 | -10.4591 | -15.6846 | 0.7266 | 5.2255 | -421.2634 | -379.5029 | -0.5587 | -0.9680 |
| 0.0088 | 2.7734 | 5300 | 1.1614 | -9.6958 | -14.8576 | 0.7246 | 5.1618 | -412.9938 | -371.8698 | -0.7312 | -1.1117 |
| 0.0078 | 2.8257 | 5400 | 1.1537 | -10.1101 | -15.2615 | 0.7168 | 5.1514 | -417.0325 | -376.0129 | -0.6843 | -1.0802 |
| 0.0209 | 2.8781 | 5500 | 1.1425 | -10.8046 | -15.9002 | 0.7266 | 5.0956 | -423.4199 | -382.9582 | -0.5316 | -0.9493 |
| 0.0145 | 2.9304 | 5600 | 1.1673 | -10.6083 | -15.8081 | 0.7266 | 5.1997 | -422.4983 | -380.9951 | -0.5878 | -1.0058 |
| 0.0189 | 2.9827 | 5700 | 1.1475 | -10.8669 | -16.0106 | 0.7285 | 5.1437 | -424.5231 | -383.5809 | -0.5915 | -1.0022 |
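
Note on reading the table: Rewards/margins equals Rewards/chosen minus Rewards/rejected (e.g., final row: -10.8669 - (-16.0106) = 5.1437), and Rewards/accuracies is the fraction of evaluation pairs where the chosen response receives the higher reward, following the usual DPO logging convention.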

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.1.2
  • Datasets 2.19.1
  • Tokenizers 0.19.1