---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
- AmberYifan/safetyQA_DPO
model-index:
- name: zephyr-7b-sft-safeDPO3
  results: []
---

# zephyr-7b-sft-safeDPO3

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized and the AmberYifan/safetyQA_DPO datasets.
It achieves the following results on the evaluation set:
- Loss: 0.6446
- Rewards/chosen: -8.0278
- Rewards/rejected: -9.5352
- Rewards/accuracies: 0.7152
- Rewards/margins: 1.5074
- Logps/rejected: -1123.8456
- Logps/chosen: -965.5345
- Logits/rejected: 3.5622
- Logits/chosen: 4.0391
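For readers unfamiliar with DPO's logging convention: the reward figures above are the policy's *implicit* rewards relative to the frozen reference model, not scores from a separate reward model. A minimal sketch of how TRL's `DPOTrainer` derives these metrics (illustrative only; the DPO temperature `beta` is not recorded in this card):

```python
# Sketch of TRL's DPO reward metrics (illustrative, not the author's code).
# The logps arguments are summed token log-probabilities of a completion under
# the policy and the frozen reference model; beta is the DPO temperature.
def dpo_reward_metrics(beta: float,
                       policy_chosen_logps: float, ref_chosen_logps: float,
                       policy_rejected_logps: float, ref_rejected_logps: float) -> dict:
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": rewards_chosen,
        "rewards/rejected": rewards_rejected,
        "rewards/margins": rewards_chosen - rewards_rejected,
        # averaged over the eval set, this yields rewards/accuracies
        "rewards/accuracies": float(rewards_chosen > rewards_rejected),
    }
```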
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
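A minimal sketch of how these settings could map onto TRL's `DPOTrainer` (written against the trl 0.7.x API contemporary with the framework versions below; `beta`, the max lengths, the precision flag, and the dataset preprocessing are assumptions not recorded in this card):

```python
# Hedged reproduction sketch, not the author's actual training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPOTrainer expects "prompt"/"chosen"/"rejected" text columns; the handbook
# preprocessing that produces them from these datasets is assumed here.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = TrainingArguments(
    output_dir="zephyr-7b-sft-safeDPO3",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 2 grad-accum x 4 GPUs = 64 total
    per_device_eval_batch_size=8,    # x 4 GPUs = 32 total
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # assumption; precision is not recorded
)

trainer = DPOTrainer(
    model,
    ref_model=None,          # TRL builds the frozen reference from the policy
    args=args,
    beta=0.1,                # assumption; beta is not recorded in this card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=1024,         # assumption
    max_prompt_length=512,   # assumption
)
trainer.train()
```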
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6915 | 0.06 | 100 | 0.6917 | -0.0059 | -0.0087 | 0.5919 | 0.0028 | -171.1956 | -163.3472 | -2.5867 | -2.5707 |
| 0.6667 | 0.12 | 200 | 0.6690 | -0.2049 | -0.2607 | 0.6307 | 0.0558 | -196.4011 | -183.2503 | -2.5361 | -2.5294 |
| 0.6064 | 0.17 | 300 | 0.6131 | -1.0874 | -1.4208 | 0.6530 | 0.3333 | -312.4040 | -271.4992 | -2.3765 | -2.3824 |
| 0.5768 | 0.23 | 400 | 0.5798 | -2.0019 | -2.5132 | 0.7118 | 0.5113 | -421.6484 | -362.9495 | -2.2241 | -2.2088 |
| 0.5653 | 0.29 | 500 | 0.5732 | -2.2365 | -2.8068 | 0.7038 | 0.5703 | -451.0063 | -386.4047 | -1.8327 | -1.8721 |
| 0.5717 | 0.35 | 600 | 0.5686 | -2.0292 | -2.5806 | 0.7175 | 0.5514 | -428.3890 | -365.6780 | -1.8751 | -1.9234 |
| 0.5752 | 0.4 | 700 | 0.5646 | -2.0035 | -2.5598 | 0.7152 | 0.5563 | -426.3091 | -363.1083 | -1.7231 | -1.7178 |
| 0.5592 | 0.46 | 800 | 0.5595 | -2.1767 | -2.7903 | 0.7152 | 0.6135 | -449.3554 | -380.4316 | -0.4741 | -0.4635 |
| 0.5477 | 0.52 | 900 | 0.5613 | -2.1853 | -2.7708 | 0.7243 | 0.5854 | -447.4023 | -381.2917 | -1.8590 | -1.9478 |
| 0.5136 | 0.58 | 1000 | 0.5533 | -2.1797 | -2.8703 | 0.7226 | 0.6906 | -457.3545 | -380.7242 | -1.6491 | -1.7174 |
| 0.5555 | 0.63 | 1100 | 0.5573 | -1.6655 | -2.2517 | 0.7158 | 0.5862 | -395.4941 | -329.3049 | -1.5555 | -1.5565 |
| 0.5044 | 0.69 | 1200 | 0.5457 | -2.5919 | -3.3662 | 0.7203 | 0.7743 | -506.9478 | -421.9445 | 0.4933 | 0.5009 |
| 0.5078 | 0.75 | 1300 | 0.5505 | -2.3710 | -3.0599 | 0.7220 | 0.6889 | -476.3146 | -399.8520 | 0.4823 | 0.6094 |
| 0.5333 | 0.81 | 1400 | 0.5486 | -2.3628 | -3.0508 | 0.7175 | 0.6880 | -475.4082 | -399.0350 | 0.5794 | 0.6967 |
| 0.4799 | 0.86 | 1500 | 0.5452 | -2.7663 | -3.5674 | 0.7380 | 0.8011 | -527.0656 | -439.3846 | 1.2406 | 1.3814 |
| 0.5551 | 0.92 | 1600 | 0.5455 | -2.6894 | -3.4539 | 0.7329 | 0.7645 | -515.7155 | -431.6923 | 0.7892 | 0.8498 |
| 0.4911 | 0.98 | 1700 | 0.5509 | -3.3307 | -4.1684 | 0.7300 | 0.8376 | -587.1636 | -495.8297 | 2.3144 | 2.2622 |
| 0.3058 | 1.04 | 1800 | 0.5704 | -4.5768 | -5.6386 | 0.7215 | 1.0618 | -734.1904 | -620.4401 | 2.5171 | 2.4413 |
| 0.3346 | 1.09 | 1900 | 0.5765 | -4.5531 | -5.5699 | 0.7152 | 1.0168 | -727.3204 | -618.0657 | 2.0386 | 1.9196 |
| 0.3186 | 1.15 | 2000 | 0.5844 | -5.1617 | -6.2422 | 0.7140 | 1.0806 | -794.5490 | -678.9232 | 1.8747 | 1.7608 |
| 0.3032 | 1.21 | 2100 | 0.5746 | -4.5098 | -5.5583 | 0.7255 | 1.0485 | -726.1542 | -613.7318 | 1.8097 | 1.9375 |
| 0.3192 | 1.27 | 2200 | 0.5697 | -4.5579 | -5.6208 | 0.7215 | 1.0629 | -732.4099 | -618.5480 | 1.4935 | 1.6381 |
| 0.3047 | 1.32 | 2300 | 0.5830 | -5.3394 | -6.5272 | 0.7266 | 1.1877 | -823.0447 | -696.7006 | 1.9596 | 2.0880 |
| 0.3109 | 1.38 | 2400 | 0.5797 | -4.8875 | -6.0347 | 0.7192 | 1.1472 | -773.7961 | -651.5051 | 2.0438 | 2.2156 |
| 0.3165 | 1.44 | 2500 | 0.5704 | -4.8449 | -5.9117 | 0.7283 | 1.0668 | -761.4922 | -647.2463 | 1.6852 | 1.9232 |
| 0.321 | 1.5 | 2600 | 0.5705 | -4.4244 | -5.3853 | 0.7197 | 0.9609 | -708.8524 | -605.1918 | 1.8092 | 2.0444 |
| 0.3164 | 1.55 | 2700 | 0.5779 | -5.0938 | -6.1851 | 0.7169 | 1.0913 | -788.8352 | -672.1396 | 2.3926 | 2.6931 |
| 0.3201 | 1.61 | 2800 | 0.5634 | -4.3216 | -5.3414 | 0.7249 | 1.0197 | -704.4624 | -594.9215 | 1.9326 | 2.1325 |
| 0.3367 | 1.67 | 2900 | 0.5631 | -4.6112 | -5.6238 | 0.7255 | 1.0126 | -732.7039 | -623.8734 | 1.4794 | 1.6802 |
| 0.3414 | 1.73 | 3000 | 0.5698 | -4.6100 | -5.6200 | 0.7289 | 1.0100 | -732.3315 | -623.7572 | 1.6920 | 1.9589 |
| 0.3097 | 1.79 | 3100 | 0.5739 | -4.9875 | -6.1217 | 0.7255 | 1.1342 | -782.4933 | -661.5057 | 2.0260 | 2.2980 |
| 0.3077 | 1.84 | 3200 | 0.5685 | -5.0298 | -6.1319 | 0.7226 | 1.1021 | -783.5215 | -665.7410 | 2.0798 | 2.3995 |
| 0.3101 | 1.9 | 3300 | 0.5709 | -5.0035 | -6.1378 | 0.7352 | 1.1343 | -784.1074 | -663.1116 | 1.9782 | 2.2950 |
| 0.3235 | 1.96 | 3400 | 0.5629 | -4.8491 | -5.8527 | 0.7346 | 1.0035 | -755.5942 | -647.6710 | 1.9155 | 2.2626 |
| 0.1328 | 2.02 | 3500 | 0.6063 | -6.6142 | -7.9563 | 0.7289 | 1.3421 | -965.9568 | -824.1730 | 2.7098 | 3.0637 |
| 0.1438 | 2.07 | 3600 | 0.6421 | -7.9002 | -9.3674 | 0.7158 | 1.4671 | -1107.0624 | -952.7795 | 3.3994 | 3.8343 |
| 0.1474 | 2.13 | 3700 | 0.6611 | -7.9802 | -9.5452 | 0.7083 | 1.5651 | -1124.8511 | -960.7725 | 3.4598 | 3.9152 |
| 0.1267 | 2.19 | 3800 | 0.6578 | -8.3961 | -9.8684 | 0.7072 | 1.4723 | -1157.1674 | -1002.3674 | 3.7728 | 4.2505 |
| 0.117 | 2.25 | 3900 | 0.6595 | -8.8743 | -10.4271 | 0.7072 | 1.5528 | -1213.0370 | -1050.1907 | 3.8392 | 4.3287 |
| 0.1347 | 2.3 | 4000 | 0.6543 | -8.3484 | -9.8783 | 0.7049 | 1.5300 | -1158.1610 | -997.5932 | 3.6606 | 4.1056 |
| 0.1329 | 2.36 | 4100 | 0.6601 | -8.2633 | -9.8163 | 0.7158 | 1.5530 | -1151.9531 | -989.0843 | 3.4748 | 3.9028 |
| 0.1272 | 2.42 | 4200 | 0.6521 | -8.3826 | -9.9282 | 0.7129 | 1.5456 | -1163.1472 | -1001.0134 | 3.5794 | 4.0564 |
| 0.1398 | 2.48 | 4300 | 0.6440 | -8.1928 | -9.6983 | 0.7146 | 1.5054 | -1140.1526 | -982.0401 | 3.5277 | 4.0106 |
| 0.1452 | 2.53 | 4400 | 0.6379 | -7.7709 | -9.2597 | 0.7140 | 1.4888 | -1096.2968 | -939.8471 | 3.3970 | 3.8629 |
| 0.1686 | 2.59 | 4500 | 0.6465 | -8.0350 | -9.5456 | 0.7152 | 1.5106 | -1124.8850 | -966.2559 | 3.5100 | 3.9841 |
| 0.1626 | 2.65 | 4600 | 0.6461 | -8.0584 | -9.5877 | 0.7152 | 1.5293 | -1129.0981 | -968.5971 | 3.5312 | 4.0077 |
| 0.1496 | 2.71 | 4700 | 0.6474 | -7.9977 | -9.5321 | 0.7163 | 1.5344 | -1123.5376 | -962.5296 | 3.5337 | 4.0036 |
| 0.1418 | 2.76 | 4800 | 0.6431 | -7.9795 | -9.4898 | 0.7146 | 1.5103 | -1119.3051 | -960.7057 | 3.5538 | 4.0293 |
| 0.1505 | 2.82 | 4900 | 0.6432 | -8.0170 | -9.5172 | 0.7158 | 1.5002 | -1122.0504 | -964.4604 | 3.5728 | 4.0513 |
| 0.1321 | 2.88 | 5000 | 0.6443 | -8.0235 | -9.5310 | 0.7123 | 1.5075 | -1123.4263 | -965.1030 | 3.5611 | 4.0373 |
| 0.1269 | 2.94 | 5100 | 0.6447 | -8.0373 | -9.5449 | 0.7140 | 1.5076 | -1124.8213 | -966.4896 | 3.5691 | 4.0472 |
| 0.1417 | 2.99 | 5200 | 0.6446 | -8.0277 | -9.5354 | 0.7163 | 1.5078 | -1123.8704 | -965.5221 | 3.5627 | 4.0395 |

### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.3.0+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
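For quick testing, a minimal inference sketch with Transformers. The hub repo id below is an assumption inferred from the card name and dataset owner; substitute the actual checkpoint path if it differs:

```python
# Hedged inference sketch; the repo id is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/zephyr-7b-sft-safeDPO3"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Zephyr models are chat models; apply the chat template before generating.
messages = [{"role": "user", "content": "How do I safely dispose of old batteries?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```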