---
library_name: transformers
license: apache-2.0
base_model: zera09/long_t5_4
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: dpo_t5_3
  results: []
---

# dpo_t5_3

This model is a fine-tuned version of [zera09/long_t5_4](https://huggingface.co/zera09/long_t5_4) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3279
- Rewards/chosen: 0.4243
- Rewards/rejected: -1.0110
- Rewards/accuracies: 0.8625
- Rewards/margins: 1.4353
- Logps/rejected: -7.0019
- Logps/chosen: -25.3685
- Logits/rejected: -18.2655
- Logits/chosen: -17.9202

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- training_steps: 1000

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.8579 | 0.05 | 10 | 0.6748 | 0.0198 | -0.0177 | 0.8469 | 0.0375 | -5.3464 | -26.0427 | -18.9879 | -18.5345 |
| 0.7446 | 0.1 | 20 | 0.6573 | 0.0382 | -0.0362 | 0.8500 | 0.0745 | -5.3773 | -26.0120 | -18.9693 | -18.5188 |
| 0.7655 | 0.15 | 30 | 0.6406 | 0.0560 | -0.0553 | 0.8506 | 0.1113 | -5.4091 | -25.9824 | -18.9500 | -18.5025 |
| 0.7752 | 0.2 | 40 | 0.6255 | 0.0722 | -0.0736 | 0.8512 | 0.1458 | -5.4396 | -25.9553 | -18.9314 | -18.4868 |
| 0.7797 | 0.25 | 50 | 0.6113 | 0.0878 | -0.0916 | 0.8506 | 0.1794 | -5.4696 | -25.9294 | -18.9138 | -18.4718 |
| 0.7109 | 0.3 | 60 | 0.5975 | 0.1034 | -0.1102 | 0.8519 | 0.2136 | -5.5006 | -25.9034 | -18.8960 | -18.4569 |
| 0.6863 | 0.35 | 70 | 0.5845 | 0.1186 | -0.1282 | 0.8512 | 0.2468 | -5.5307 | -25.8781 | -18.8798 | -18.4433 |
| 0.6736 | 0.4 | 80 | 0.5720 | 0.1328 | -0.1470 | 0.8519 | 0.2798 | -5.5619 | -25.8544 | -18.8631 | -18.4291 |
| 0.6564 | 0.45 | 90 | 0.5601 | 0.1467 | -0.1658 | 0.8519 | 0.3125 | -5.5933 | -25.8312 | -18.8455 | -18.4141 |
| 0.705 | 0.5 | 100 | 0.5490 | 0.1599 | -0.1843 | 0.8519 | 0.3442 | -5.6241 | -25.8092 | -18.8285 | -18.3996 |
| 0.6871 | 0.55 | 110 | 0.5380 | 0.1726 | -0.2041 | 0.8531 | 0.3768 | -5.6572 | -25.7880 | -18.8116 | -18.3852 |
| 0.7134 | 0.6 | 120 | 0.5275 | 0.1850 | -0.2241 | 0.8550 | 0.4091 | -5.6905 | -25.7674 | -18.7943 | -18.3706 |
| 0.6389 | 0.65 | 130 | 0.5179 | 0.1966 | -0.2433 | 0.8562 | 0.4399 | -5.7224 | -25.7480 | -18.7777 | -18.3565 |
| 0.6128 | 0.7 | 140 | 0.5087 | 0.2081 | -0.2619 | 0.8569 | 0.4699 | -5.7534 | -25.7289 | -18.7619 | -18.3430 |
| 0.6281 | 0.75 | 150 | 0.4996 | 0.2197 | -0.2817 | 0.8569 | 0.5014 | -5.7865 | -25.7096 | -18.7467 | -18.3302 |
| 0.6216 | 0.8 | 160 | 0.4910 | 0.2300 | -0.3019 | 0.8569 | 0.5319 | -5.8201 | -25.6924 | -18.7314 | -18.3175 |
| 0.6002 | 0.85 | 170 | 0.4828 | 0.2403 | -0.3216 | 0.8575 | 0.5619 | -5.8529 | -25.6752 | -18.7166 | -18.3049 |
| 0.5649 | 0.9 | 180 | 0.4752 | 0.2501 | -0.3406 | 0.8575 | 0.5908 | -5.8847 | -25.6588 | -18.7023 | -18.2929 |
| 0.5695 | 0.95 | 190 | 0.4680 | 0.2594 | -0.3595 | 0.8575 | 0.6189 | -5.9162 | -25.6434 | -18.6874 | -18.2802 |
| 0.5675 | 1.0 | 200 | 0.4611 | 0.2678 | -0.3786 | 0.8575 | 0.6464 | -5.9479 | -25.6293 | -18.6724 | -18.2674 |
| 0.5146 | 1.05 | 210 | 0.4544 | 0.2762 | -0.3980 | 0.8575 | 0.6742 | -5.9802 | -25.6154 | -18.6581 | -18.2553 |
| 0.558 | 1.1 | 220 | 0.4482 | 0.2841 | -0.4168 | 0.8581 | 0.7009 | -6.0116 | -25.6021 | -18.6442 | -18.2437 |
| 0.598 | 1.15 | 230 | 0.4420 | 0.2923 | -0.4358 | 0.8581 | 0.7282 | -6.0433 | -25.5885 | -18.6301 | -18.2318 |
| 0.5918 | 1.2 | 240 | 0.4360 | 0.3001 | -0.4552 | 0.8581 | 0.7553 | -6.0755 | -25.5755 | -18.6160 | -18.2198 |
| 0.5576 | 1.25 | 250 | 0.4307 | 0.3064 | -0.4737 | 0.8587 | 0.7802 | -6.1065 | -25.5650 | -18.6033 | -18.2090 |
| 0.5702 | 1.3 | 260 | 0.4257 | 0.3125 | -0.4916 | 0.8587 | 0.8041 | -6.1363 | -25.5549 | -18.5910 | -18.1985 |
| 0.5132 | 1.35 | 270 | 0.4209 | 0.3185 | -0.5090 | 0.8581 | 0.8275 | -6.1652 | -25.5449 | -18.5784 | -18.1877 |
| 0.5752 | 1.4 | 280 | 0.4164 | 0.3240 | -0.5260 | 0.8594 | 0.8500 | -6.1936 | -25.5357 | -18.5661 | -18.1772 |
| 0.5374 | 1.45 | 290 | 0.4123 | 0.3290 | -0.5419 | 0.8587 | 0.8709 | -6.2202 | -25.5274 | -18.5551 | -18.1678 |
| 0.49 | 1.5 | 300 | 0.4082 | 0.3343 | -0.5579 | 0.8594 | 0.8922 | -6.2468 | -25.5185 | -18.5447 | -18.1590 |
| 0.5269 | 1.55 | 310 | 0.4040 | 0.3398 | -0.5748 | 0.8594 | 0.9146 | -6.2749 | -25.5094 | -18.5337 | -18.1497 |
| 0.4636 | 1.6 | 320 | 0.4001 | 0.3447 | -0.5910 | 0.8600 | 0.9357 | -6.3020 | -25.5012 | -18.5220 | -18.1396 |
| 0.4493 | 1.65 | 330 | 0.3963 | 0.3492 | -0.6073 | 0.8594 | 0.9565 | -6.3291 | -25.4937 | -18.5108 | -18.1300 |
| 0.5583 | 1.7 | 340 | 0.3928 | 0.3535 | -0.6228 | 0.8594 | 0.9763 | -6.3550 | -25.4865 | -18.5004 | -18.1211 |
| 0.5091 | 1.75 | 350 | 0.3895 | 0.3577 | -0.6377 | 0.8594 | 0.9953 | -6.3798 | -25.4796 | -18.4904 | -18.1124 |
| 0.484 | 1.8 | 360 | 0.3864 | 0.3613 | -0.6521 | 0.8600 | 1.0134 | -6.4038 | -25.4735 | -18.4815 | -18.1048 |
| 0.434 | 1.85 | 370 | 0.3834 | 0.3650 | -0.6665 | 0.8600 | 1.0315 | -6.4278 | -25.4674 | -18.4729 | -18.0974 |
| 0.5252 | 1.9 | 380 | 0.3805 | 0.3687 | -0.6809 | 0.8600 | 1.0496 | -6.4518 | -25.4612 | -18.4636 | -18.0894 |
| 0.5021 | 1.95 | 390 | 0.3778 | 0.3722 | -0.6940 | 0.8606 | 1.0662 | -6.4736 | -25.4554 | -18.4550 | -18.0821 |
| 0.5079 | 2.0 | 400 | 0.3752 | 0.3754 | -0.7071 | 0.8606 | 1.0825 | -6.4954 | -25.4500 | -18.4466 | -18.0749 |
| 0.4553 | 2.05 | 410 | 0.3725 | 0.3788 | -0.7208 | 0.8606 | 1.0996 | -6.5184 | -25.4445 | -18.4376 | -18.0672 |
| 0.4719 | 2.1 | 420 | 0.3700 | 0.3814 | -0.7348 | 0.8606 | 1.1162 | -6.5417 | -25.4401 | -18.4293 | -18.0602 |
| 0.4917 | 2.15 | 430 | 0.3676 | 0.3839 | -0.7481 | 0.8612 | 1.1321 | -6.5638 | -25.4358 | -18.4212 | -18.0532 |
| 0.4459 | 2.2 | 440 | 0.3653 | 0.3862 | -0.7614 | 0.8612 | 1.1477 | -6.5860 | -25.4320 | -18.4130 | -18.0462 |
| 0.4596 | 2.25 | 450 | 0.3631 | 0.3888 | -0.7744 | 0.8612 | 1.1631 | -6.6075 | -25.4278 | -18.4050 | -18.0393 |
| 0.4018 | 2.3 | 460 | 0.3610 | 0.3913 | -0.7862 | 0.8619 | 1.1775 | -6.6274 | -25.4236 | -18.3975 | -18.0328 |
| 0.4105 | 2.35 | 470 | 0.3589 | 0.3936 | -0.7986 | 0.8619 | 1.1921 | -6.6479 | -25.4198 | -18.3902 | -18.0267 |
| 0.4227 | 2.4 | 480 | 0.3571 | 0.3956 | -0.8097 | 0.8619 | 1.2053 | -6.6664 | -25.4164 | -18.3839 | -18.0214 |
| 0.4584 | 2.45 | 490 | 0.3553 | 0.3975 | -0.8205 | 0.8625 | 1.2180 | -6.6844 | -25.4132 | -18.3780 | -18.0165 |
| 0.4309 | 2.5 | 500 | 0.3537 | 0.3995 | -0.8299 | 0.8619 | 1.2295 | -6.7002 | -25.4098 | -18.3728 | -18.0121 |
| 0.4185 | 2.55 | 510 | 0.3522 | 0.4015 | -0.8390 | 0.8625 | 1.2405 | -6.7153 | -25.4066 | -18.3675 | -18.0077 |
| 0.4103 | 2.6 | 520 | 0.3508 | 0.4033 | -0.8480 | 0.8625 | 1.2512 | -6.7303 | -25.4036 | -18.3622 | -18.0031 |
| 0.4511 | 2.65 | 530 | 0.3493 | 0.4047 | -0.8570 | 0.8625 | 1.2618 | -6.7454 | -25.4012 | -18.3565 | -17.9982 |
| 0.4111 | 2.7 | 540 | 0.3479 | 0.4061 | -0.8666 | 0.8625 | 1.2728 | -6.7613 | -25.3988 | -18.3507 | -17.9932 |
| 0.4192 | 2.75 | 550 | 0.3465 | 0.4074 | -0.8763 | 0.8619 | 1.2837 | -6.7774 | -25.3967 | -18.3451 | -17.9885 |
| 0.4278 | 2.8 | 560 | 0.3452 | 0.4087 | -0.8848 | 0.8619 | 1.2935 | -6.7916 | -25.3945 | -18.3397 | -17.9838 |
| 0.4001 | 2.85 | 570 | 0.3439 | 0.4102 | -0.8927 | 0.8619 | 1.3028 | -6.8048 | -25.3921 | -18.3345 | -17.9793 |
| 0.4006 | 2.9 | 580 | 0.3428 | 0.4112 | -0.9007 | 0.8619 | 1.3119 | -6.8181 | -25.3903 | -18.3294 | -17.9749 |
| 0.3664 | 2.95 | 590 | 0.3417 | 0.4124 | -0.9084 | 0.8619 | 1.3208 | -6.8309 | -25.3884 | -18.3246 | -17.9707 |
| 0.4518 | 3.0 | 600 | 0.3406 | 0.4133 | -0.9159 | 0.8619 | 1.3292 | -6.8435 | -25.3869 | -18.3200 | -17.9668 |
| 0.3931 | 3.05 | 610 | 0.3396 | 0.4140 | -0.9233 | 0.8619 | 1.3374 | -6.8558 | -25.3856 | -18.3157 | -17.9631 |
| 0.3842 | 3.1 | 620 | 0.3386 | 0.4148 | -0.9300 | 0.8619 | 1.3448 | -6.8670 | -25.3844 | -18.3116 | -17.9596 |
| 0.3876 | 3.15 | 630 | 0.3378 | 0.4155 | -0.9363 | 0.8612 | 1.3519 | -6.8775 | -25.3832 | -18.3081 | -17.9566 |
| 0.4318 | 3.2 | 640 | 0.3369 | 0.4163 | -0.9423 | 0.8612 | 1.3586 | -6.8875 | -25.3819 | -18.3046 | -17.9536 |
| 0.4309 | 3.25 | 650 | 0.3362 | 0.4169 | -0.9481 | 0.8612 | 1.3650 | -6.8971 | -25.3808 | -18.3015 | -17.9509 |
| 0.3602 | 3.3 | 660 | 0.3354 | 0.4176 | -0.9537 | 0.8619 | 1.3712 | -6.9064 | -25.3798 | -18.2985 | -17.9484 |
| 0.4113 | 3.35 | 670 | 0.3347 | 0.4182 | -0.9590 | 0.8619 | 1.3771 | -6.9152 | -25.3788 | -18.2955 | -17.9459 |
| 0.3874 | 3.4 | 680 | 0.3340 | 0.4187 | -0.9641 | 0.8612 | 1.3828 | -6.9237 | -25.3778 | -18.2924 | -17.9431 |
| 0.4358 | 3.45 | 690 | 0.3334 | 0.4192 | -0.9686 | 0.8619 | 1.3878 | -6.9312 | -25.3770 | -18.2897 | -17.9408 |
| 0.4318 | 3.5 | 700 | 0.3329 | 0.4197 | -0.9725 | 0.8625 | 1.3923 | -6.9379 | -25.3762 | -18.2873 | -17.9388 |
| 0.3959 | 3.55 | 710 | 0.3324 | 0.4203 | -0.9764 | 0.8625 | 1.3967 | -6.9442 | -25.3752 | -18.2849 | -17.9367 |
| 0.4003 | 3.6 | 720 | 0.3319 | 0.4208 | -0.9802 | 0.8625 | 1.4011 | -6.9507 | -25.3744 | -18.2827 | -17.9348 |
| 0.4106 | 3.65 | 730 | 0.3314 | 0.4212 | -0.9837 | 0.8625 | 1.4050 | -6.9565 | -25.3737 | -18.2807 | -17.9331 |
| 0.3852 | 3.7 | 740 | 0.3310 | 0.4216 | -0.9868 | 0.8625 | 1.4084 | -6.9617 | -25.3731 | -18.2790 | -17.9317 |
| 0.4174 | 3.75 | 750 | 0.3306 | 0.4218 | -0.9898 | 0.8625 | 1.4116 | -6.9665 | -25.3727 | -18.2774 | -17.9303 |
| 0.4188 | 3.8 | 760 | 0.3303 | 0.4221 | -0.9922 | 0.8631 | 1.4144 | -6.9707 | -25.3722 | -18.2760 | -17.9291 |
| 0.39 | 3.85 | 770 | 0.3300 | 0.4224 | -0.9946 | 0.8631 | 1.4170 | -6.9746 | -25.3717 | -18.2745 | -17.9278 |
| 0.3884 | 3.9 | 780 | 0.3297 | 0.4228 | -0.9969 | 0.8631 | 1.4197 | -6.9785 | -25.3711 | -18.2732 | -17.9267 |
| 0.4019 | 3.95 | 790 | 0.3294 | 0.4230 | -0.9991 | 0.8631 | 1.4221 | -6.9821 | -25.3707 | -18.2720 | -17.9257 |
| 0.3742 | 4.0 | 800 | 0.3292 | 0.4232 | -1.0009 | 0.8631 | 1.4241 | -6.9852 | -25.3704 | -18.2709 | -17.9248 |
| 0.4229 | 4.05 | 810 | 0.3289 | 0.4234 | -1.0026 | 0.8631 | 1.4259 | -6.9879 | -25.3701 | -18.2701 | -17.9240 |
| 0.4327 | 4.1 | 820 | 0.3288 | 0.4235 | -1.0040 | 0.8631 | 1.4275 | -6.9902 | -25.3699 | -18.2693 | -17.9234 |
| 0.4086 | 4.15 | 830 | 0.3286 | 0.4237 | -1.0052 | 0.8631 | 1.4289 | -6.9923 | -25.3696 | -18.2687 | -17.9228 |
| 0.3724 | 4.2 | 840 | 0.3285 | 0.4238 | -1.0063 | 0.8631 | 1.4301 | -6.9941 | -25.3694 | -18.2680 | -17.9223 |
| 0.4155 | 4.25 | 850 | 0.3283 | 0.4239 | -1.0072 | 0.8631 | 1.4311 | -6.9957 | -25.3692 | -18.2675 | -17.9219 |
| 0.378 | 4.3 | 860 | 0.3282 | 0.4240 | -1.0081 | 0.8631 | 1.4321 | -6.9972 | -25.3691 | -18.2670 | -17.9214 |
| 0.3837 | 4.35 | 870 | 0.3281 | 0.4240 | -1.0089 | 0.8631 | 1.4329 | -6.9984 | -25.3690 | -18.2667 | -17.9211 |
| 0.3666 | 4.4 | 880 | 0.3281 | 0.4241 | -1.0094 | 0.8631 | 1.4335 | -6.9992 | -25.3689 | -18.2664 | -17.9209 |
| 0.3775 | 4.45 | 890 | 0.3280 | 0.4242 | -1.0098 | 0.8625 | 1.4340 | -6.9999 | -25.3688 | -18.2662 | -17.9207 |
| 0.401 | 4.5 | 900 | 0.3280 | 0.4242 | -1.0101 | 0.8631 | 1.4343 | -7.0004 | -25.3687 | -18.2660 | -17.9206 |
| 0.3887 | 4.55 | 910 | 0.3279 | 0.4243 | -1.0104 | 0.8631 | 1.4346 | -7.0009 | -25.3686 | -18.2659 | -17.9205 |
| 0.4123 | 4.6 | 920 | 0.3279 | 0.4243 | -1.0106 | 0.8625 | 1.4349 | -7.0013 | -25.3686 | -18.2657 | -17.9204 |
| 0.415 | 4.65 | 930 | 0.3279 | 0.4243 | -1.0108 | 0.8625 | 1.4351 | -7.0016 | -25.3686 | -18.2657 | -17.9203 |
| 0.4636 | 4.7 | 940 | 0.3279 | 0.4243 | -1.0109 | 0.8625 | 1.4352 | -7.0017 | -25.3685 | -18.2656 | -17.9202 |
| 0.3967 | 4.75 | 950 | 0.3279 | 0.4243 | -1.0109 | 0.8625 | 1.4353 | -7.0019 | -25.3685 | -18.2656 | -17.9202 |
| 0.3853 | 4.8 | 960 | 0.3279 | 0.4243 | -1.0110 | 0.8625 | 1.4353 | -7.0019 | -25.3685 | -18.2655 | -17.9202 |
| 0.3831 | 4.85 | 970 | 0.3279 | 0.4243 | -1.0110 | 0.8625 | 1.4353 | -7.0019 | -25.3685 | -18.2655 | -17.9202 |
| 0.3945 | 4.9 | 980 | 0.3279 | 0.4243 | -1.0110 | 0.8625 | 1.4353 | -7.0019 | -25.3685 | -18.2655 | -17.9202 |
| 0.3882 | 4.95 | 990 | 0.3279 | 0.4243 | -1.0110 | 0.8625 | 1.4353 | -7.0019 | -25.3685 | -18.2655 | -17.9202 |
| 0.4374 | 5.0 | 1000 | 0.3279 | 0.4243 | -1.0110 | 0.8625 | 1.4353 | -7.0019 | -25.3685 | -18.2655 | -17.9202 |

### Framework versions

- Transformers 4.45.2
- Pytorch 2.2.1
- Datasets 3.0.1
- Tokenizers 0.20.1
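
Several of the numbers reported in this card are related by simple arithmetic, and a short check can make those relationships explicit. The sketch below is not part of the original training code; it only recomputes values from figures listed in the card. It assumes a single training device (not stated in the card, but consistent with the reported `total_train_batch_size` of 32), and all variable names are illustrative.

```python
# Values copied from this model card; variable names are illustrative.
train_batch_size = 8
gradient_accumulation_steps = 4
num_devices = 1  # assumption: the card does not state the device count

# Effective batch size per optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# The reward margin is the chosen reward minus the rejected reward.
rewards_chosen = 0.4243
rewards_rejected = -1.0110
rewards_margin = round(rewards_chosen - rewards_rejected, 4)

# The results table ends at step 1000, which it labels as epoch 5.0.
training_steps = 1000
final_epoch = 5.0
steps_per_epoch = round(training_steps / final_epoch)

print("total_train_batch_size:", total_train_batch_size)
print("rewards_margin:", rewards_margin)
print("steps_per_epoch:", steps_per_epoch)
```

The recomputed batch size and margin match the reported `total_train_batch_size` and `Rewards/margins` entries, and the step count per epoch agrees with the table, where step 200 is logged at epoch 1.0.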