---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1487
- Rewards/chosen: -10.8742
- Rewards/rejected: -16.0045
- Rewards/accuracies: 0.7285
- Rewards/margins: 5.1303
- Logps/rejected: -424.4627
- Logps/chosen: -383.6538
- Logits/rejected: -0.5906
- Logits/chosen: -1.0023

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6673        | 0.0523 | 100  | 0.6670          | 0.0699         | 0.0097           | 0.6797             | 0.0602          | -264.3204      | -274.2128    | -2.5742         | -2.6289       |
| 0.5806        | 0.1047 | 200  | 0.5926          | 0.3801         | -0.0108          | 0.7051             | 0.3909          | -264.5256      | -271.1104    | -2.5225         | -2.5806       |
| 0.554         | 0.1570 | 300  | 0.5669          | 0.3096         | -0.4486          | 0.7246             | 0.7581          | -268.9032      | -271.8162    | -2.4975         | -2.5603       |
| 0.5674        | 0.2093 | 400  | 0.5521          | 0.7133         | -0.0663          | 0.7246             | 0.7797          | -265.0810      | -267.7786    | -2.4794         | -2.5387       |
| 0.512         | 0.2616 | 500  | 0.5478          | 0.1922         | -0.9270          | 0.7266             | 1.1192          | -273.6879      | -272.9901    | -2.4185         | -2.4842       |
| 0.5511        | 0.3140 | 600  | 0.5389          | -0.0115        | -1.1320          | 0.7539             | 1.1205          | -275.7375      | -275.0270    | -2.3648         | -2.4308       |
| 0.5851        | 0.3663 | 700  | 0.5448          | 0.0450         | -1.1453          | 0.7402             | 1.1903          | -275.8708      | -274.4615    | -2.4055         | -2.4622       |
| 0.5302        | 0.4186 | 800  | 0.5569          | -0.2258        | -1.2912          | 0.7324             | 1.0653          | -277.3294      | -277.1702    | -2.5104         | -2.5742       |
| 0.518         | 0.4710 | 900  | 0.5607          | -0.2557        | -1.4332          | 0.75               | 1.1775          | -278.7496      | -277.4685    | -2.4298         | -2.4910       |
| 0.5525        | 0.5233 | 1000 | 0.5601          | -0.7719        | -1.9891          | 0.7480             | 1.2172          | -284.3084      | -282.6305    | -2.4482         | -2.5089       |
| 0.5189        | 0.5756 | 1100 | 0.5515          | -0.4040        | -1.5951          | 0.7422             | 1.1911          | -280.3683      | -278.9518    | -2.4816         | -2.5430       |
| 0.5331        | 0.6279 | 1200 | 0.5453          | -0.5342        | -1.7671          | 0.7383             | 1.2329          | -282.0886      | -280.2540    | -2.4521         | -2.5080       |
| 0.5104        | 0.6803 | 1300 | 0.5511          | -0.4634        | -1.8916          | 0.7363             | 1.4282          | -283.3339      | -279.5460    | -2.4281         | -2.4909       |
| 0.4976        | 0.7326 | 1400 | 0.5413          | -0.3748        | -1.7652          | 0.7363             | 1.3904          | -282.0694      | -278.6596    | -2.4395         | -2.4947       |
| 0.4814        | 0.7849 | 1500 | 0.5447          | -0.8885        | -2.1522          | 0.7305             | 1.2637          | -285.9394      | -283.7968    | -2.4376         | -2.4908       |
| 0.5075        | 0.8373 | 1600 | 0.5423          | -0.3051        | -1.5253          | 0.7344             | 1.2202          | -279.6703      | -277.9630    | -2.4316         | -2.4816       |
| 0.4906        | 0.8896 | 1700 | 0.5806          | -1.4841        | -3.0212          | 0.7266             | 1.5371          | -294.6296      | -289.7531    | -2.4876         | -2.5438       |
| 0.536         | 0.9419 | 1800 | 0.5603          | -0.5951        | -2.1710          | 0.7383             | 1.5759          | -286.1272      | -280.8625    | -2.5694         | -2.6123       |
| 0.5164        | 0.9942 | 1900 | 0.5567          | -0.5404        | -2.0173          | 0.7422             | 1.4769          | -284.5909      | -280.3160    | -2.5490         | -2.5898       |
| 0.0947        | 1.0466 | 2000 | 0.5942          | -1.0618        | -2.9986          | 0.7344             | 1.9369          | -294.4039      | -285.5296    | -2.5622         | -2.6140       |
| 0.068         | 1.0989 | 2100 | 0.6230          | -1.6457        | -3.9093          | 0.7520             | 2.2636          | -303.5109      | -291.3689    | -2.4361         | -2.5042       |
| 0.0747        | 1.1512 | 2200 | 0.6291          | -1.3268        | -3.4945          | 0.7461             | 2.1677          | -299.3621      | -288.1795    | -2.3844         | -2.4542       |
| 0.0553        | 1.2036 | 2300 | 0.6765          | -2.2209        | -4.6502          | 0.7344             | 2.4293          | -310.9199      | -297.1208    | -2.4889         | -2.5616       |
| 0.1207        | 1.2559 | 2400 | 0.6530          | -1.7158        | -3.9584          | 0.7246             | 2.2427          | -304.0018      | -292.0695    | -2.4457         | -2.5092       |
| 0.152         | 1.3082 | 2500 | 0.6882          | -1.8791        | -4.3806          | 0.7207             | 2.5015          | -308.2237      | -293.7032    | -2.4232         | -2.4917       |
| 0.1114        | 1.3605 | 2600 | 0.6422          | -2.2334        | -4.3890          | 0.7227             | 2.1556          | -308.3074      | -297.2458    | -2.5713         | -2.6189       |
| 0.1173        | 1.4129 | 2700 | 0.6619          | -1.5700        | -4.0282          | 0.7266             | 2.4581          | -304.6991      | -290.6119    | -2.5152         | -2.5719       |
| 0.0925        | 1.4652 | 2800 | 0.6523          | -2.3231        | -4.6279          | 0.7207             | 2.3048          | -310.6963      | -298.1424    | -2.5141         | -2.5711       |
| 0.1221        | 1.5175 | 2900 | 0.6496          | -2.8770        | -5.1437          | 0.7266             | 2.2667          | -315.8546      | -303.6823    | -2.4733         | -2.5414       |
| 0.0807        | 1.5699 | 3000 | 0.6925          | -2.7762        | -5.3350          | 0.7383             | 2.5588          | -317.7678      | -302.6737    | -2.3267         | -2.4141       |
| 0.105         | 1.6222 | 3100 | 0.6540          | -2.6858        | -5.0067          | 0.7246             | 2.3209          | -314.4846      | -301.7698    | -2.3683         | -2.4395       |
| 0.1162        | 1.6745 | 3200 | 0.6481          | -1.8133        | -4.0448          | 0.7148             | 2.2315          | -304.8652      | -293.0446    | -2.3670         | -2.4379       |
| 0.0667        | 1.7268 | 3300 | 0.6541          | -2.0364        | -4.3933          | 0.7363             | 2.3569          | -308.3506      | -295.2763    | -2.2794         | -2.3589       |
| 0.0935        | 1.7792 | 3400 | 0.6690          | -2.7292        | -5.2592          | 0.7441             | 2.5300          | -317.0096      | -302.2036    | -2.2855         | -2.3694       |
| 0.095         | 1.8315 | 3500 | 0.6361          | -2.9308        | -5.1591          | 0.7266             | 2.2284          | -316.0090      | -304.2198    | -2.3827         | -2.4530       |
| 0.0719        | 1.8838 | 3600 | 0.6778          | -2.3616        | -4.8272          | 0.7246             | 2.4656          | -312.6893      | -298.5278    | -2.4285         | -2.5018       |
| 0.0729        | 1.9362 | 3700 | 0.6754          | -2.9280        | -5.4360          | 0.7285             | 2.5080          | -318.7774      | -304.1916    | -2.4287         | -2.5049       |
| 0.0867        | 1.9885 | 3800 | 0.6744          | -3.0956        | -5.5458          | 0.7324             | 2.4502          | -319.8756      | -305.8675    | -2.3542         | -2.4301       |
| 0.0057        | 2.0408 | 3900 | 0.8833          | -5.0083        | -8.7774          | 0.7324             | 3.7690          | -352.1913      | -324.9953    | -1.5131         | -1.7155       |
| 0.0042        | 2.0931 | 4000 | 0.9722          | -6.1264        | -10.3554         | 0.7441             | 4.2290          | -367.9712      | -336.1759    | -1.6158         | -1.8694       |
| 0.0144        | 2.1455 | 4100 | 1.0865          | -7.7872        | -12.6090         | 0.7227             | 4.8218          | -390.5074      | -352.7837    | -1.3817         | -1.7022       |
| 0.0222        | 2.1978 | 4200 | 1.1130          | -7.9969        | -12.8510         | 0.7090             | 4.8541          | -392.9280      | -354.8811    | -1.3909         | -1.6967       |
| 0.0062        | 2.2501 | 4300 | 1.0722          | -8.7884        | -13.4773         | 0.7188             | 4.6889          | -399.1902      | -362.7955    | -1.5072         | -1.7459       |
| 0.0164        | 2.3025 | 4400 | 1.0993          | -8.7821        | -13.5683         | 0.7246             | 4.7862          | -400.1005      | -362.7325    | -1.2294         | -1.5182       |
| 0.0043        | 2.3548 | 4500 | 1.1250          | -9.9027        | -14.7785         | 0.7324             | 4.8758          | -412.2026      | -373.9385    | -0.7476         | -1.0957       |
| 0.0055        | 2.4071 | 4600 | 1.1975          | -10.4385       | -15.5644         | 0.7285             | 5.1258          | -420.0612      | -379.2971    | -0.5940         | -1.0020       |
| 0.0096        | 2.4594 | 4700 | 1.1443          | -10.2507       | -15.1793         | 0.7344             | 4.9286          | -416.2106      | -377.4187    | -0.9036         | -1.2413       |
| 0.0121        | 2.5118 | 4800 | 1.1422          | -10.3821       | -15.4221         | 0.7188             | 5.0400          | -418.6388      | -378.7332    | -0.8425         | -1.2175       |
| 0.0129        | 2.5641 | 4900 | 1.1155          | -9.3510        | -14.2451         | 0.7227             | 4.8941          | -406.8687      | -368.4216    | -0.9190         | -1.2930       |
| 0.0027        | 2.6164 | 5000 | 1.1905          | -10.7239       | -16.0360         | 0.7246             | 5.3121          | -424.7772      | -382.1504    | -0.6076         | -1.0264       |
| 0.0069        | 2.6688 | 5100 | 1.1635          | -10.2624       | -15.5178         | 0.7266             | 5.2555          | -419.5960      | -377.5356    | -0.7336         | -1.1315       |
| 0.009         | 2.7211 | 5200 | 1.1697          | -10.4591       | -15.6846         | 0.7266             | 5.2255          | -421.2634      | -379.5029    | -0.5587         | -0.9680       |
| 0.0088        | 2.7734 | 5300 | 1.1614          | -9.6958        | -14.8576         | 0.7246             | 5.1618          | -412.9938      | -371.8698    | -0.7312         | -1.1117       |
| 0.0078        | 2.8257 | 5400 | 1.1537          | -10.1101       | -15.2615         | 0.7168             | 5.1514          | -417.0325      | -376.0129    | -0.6843         | -1.0802       |
| 0.0209        | 2.8781 | 5500 | 1.1425          | -10.8046       | -15.9002         | 0.7266             | 5.0956          | -423.4199      | -382.9582    | -0.5316         | -0.9493       |
| 0.0145        | 2.9304 | 5600 | 1.1673          | -10.6083       | -15.8081         | 0.7266             | 5.1997          | -422.4983      | -380.9951    | -0.5878         | -1.0058       |
| 0.0189        | 2.9827 | 5700 | 1.1475          | -10.8669       | -16.0106         | 0.7285             | 5.1437          | -424.5231      | -383.5809    | -0.5915         | -1.0022       |


### Framework versions

- Transformers 4.40.2
- Pytorch 2.1.2
- Datasets 2.19.1
- Tokenizers 0.19.1
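
Several numbers reported in this card are related by simple arithmetic, and the relationships can be cross-checked. The sketch below uses only values copied from the card. It assumes that Rewards/margins is the chosen reward minus the rejected reward, and that no gradient accumulation was used (the card lists no accumulation setting); both are assumptions, not statements confirmed by the card. All variable names are illustrative.

```python
# Values copied from the final evaluation results and hyperparameters in this card.
rewards_chosen = -10.8742
rewards_rejected = -16.0045
reported_margin = 5.1303

train_batch_size = 8  # per-device batch size
num_devices = 4
reported_total_train_batch_size = 32

# Assumption: the reward margin is chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected

# Assumption: one gradient accumulation step (none is listed in the card).
gradient_accumulation_steps = 1
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Optimizer steps per epoch implied by the first table row (step 100 at epoch 0.0523).
approx_steps_per_epoch = 100 / 0.0523

print(f"margin: {margin:.4f} (reported: {reported_margin})")
print(f"total train batch size: {total_train_batch_size} "
      f"(reported: {reported_total_train_batch_size})")
print(f"approximate steps per epoch: {approx_steps_per_epoch:.0f}")
```

Under these assumptions the computed margin and total batch size agree with the reported values, and the table implies roughly 1,900 optimizer steps per epoch.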