zephyr-7b-gpo-log-i0

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6894
  • Rewards/chosen: -0.1985
  • Rewards/rejected: -0.2851
  • Rewards/accuracies: 0.6680
  • Rewards/margins: 0.0866
  • Logps/rejected: -496.7296
  • Logps/chosen: -430.4875
  • Logits/rejected: -1.9716
  • Logits/chosen: -2.1747
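
The reward figures above can be sanity-checked with simple arithmetic. In DPO-style preference trainers the margin is usually the chosen reward minus the rejected reward, and a log-sigmoid pairwise loss equals ln 2 (about 0.6931) when the margin is zero. The sketch below checks both against the numbers in this card. That this run used a loss of that form is an assumption inferred from the metric names, not something the card states.

```python
import math

# Evaluation metrics copied from the list above.
rewards_chosen = -0.1985
rewards_rejected = -0.2851
reported_margin = 0.0866
reported_loss = 0.6894

# Assumption: the margin is defined as chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected


def log_sigmoid_loss(m: float) -> float:
    """-log(sigmoid(m)): pairwise loss for a single reward margin m."""
    return math.log1p(math.exp(-m))


# Loss of a model that cannot tell chosen from rejected (margin of zero).
chance_loss = log_sigmoid_loss(0.0)

print(f"recomputed margin:     {margin:.4f}")
print(f"reported margin:       {reported_margin:.4f}")
print(f"loss at zero margin:   {chance_loss:.4f}")
print(f"reported eval loss:    {reported_loss:.4f}")
```

Under that assumption, the reported evaluation loss sits only slightly below the zero-margin value, which is consistent with the small reward margin.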

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 0.001
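
To illustrate how `learning_rate`, `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio` interact, here is a minimal sketch of a linear-warmup, half-cosine-decay schedule written in plain Python. It is not the trainer's own code. `TOTAL_STEPS` is a hypothetical value chosen for illustration, since the card does not state the total number of optimizer steps.

```python
import math

BASE_LR = 5e-06        # learning_rate from the list above
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 15_000   # hypothetical: the card does not give the total step count


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given step: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to BASE_LR.
        return BASE_LR * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 750, 1500, 8250, 15000):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

With these assumed values the rate peaks at `5e-06` once warmup ends, falls to half of that midway through the decay phase, and reaches zero at the final step.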

Training results

Training Loss Epoch Step Logits/chosen Logits/rejected Logps/chosen Logps/rejected Validation Loss Rewards/accuracies Rewards/chosen Rewards/margins Rewards/rejected
0.6931 0.01 100 -2.3483 -2.1599 -231.7733 -211.4642 0.6931 0.4845 0.0002 0.0001 0.0001
0.6931 0.01 200 -2.3487 -2.1603 -231.4387 -211.6443 0.6931 0.5800 0.0006 0.0006 -0.0000
0.6929 0.02 300 -2.3526 -2.1641 -228.5697 -209.9551 0.6930 0.5810 0.0034 0.0018 0.0017
0.6929 0.03 400 -2.3535 -2.1649 -226.9053 -210.0602 0.6929 0.5950 0.0051 0.0035 0.0016
0.6927 0.03 500 -2.3455 -2.1572 -243.4080 -230.7395 0.6927 0.6065 -0.0114 0.0077 -0.0191
0.6924 0.04 600 -2.3406 -2.1523 -240.4310 -234.4804 0.6924 0.6110 -0.0084 0.0144 -0.0229
0.6928 0.05 700 -2.3223 -2.1352 -255.1952 -253.8729 0.6922 0.6090 -0.0232 0.0191 -0.0423
0.6918 0.05 800 -2.3240 -2.1358 -289.2699 -303.1003 0.6920 0.5970 -0.0573 0.0342 -0.0915
0.6928 0.06 900 -2.2816 -2.0968 -253.8069 -254.9312 0.6919 0.6050 -0.0218 0.0215 -0.0433
0.6908 0.07 1000 -1.7918 -1.6316 -308.5073 -327.3462 0.6915 0.6020 -0.0765 0.0392 -0.1157
0.6925 0.07 1100 -1.6588 -1.5052 -304.1779 -315.8557 0.6915 0.6170 -0.0722 0.0321 -0.1042
0.6906 0.08 1200 -1.2484 -1.1079 -475.6542 -521.3765 0.6911 0.6155 -0.2436 0.0661 -0.3098
0.6901 0.09 1300 -1.7384 -1.5767 -401.8933 -423.5184 0.6909 0.6170 -0.1699 0.0420 -0.2119
0.6887 0.09 1400 -1.4353 -1.2834 -424.3940 -463.2532 0.6906 0.6420 -0.1924 0.0593 -0.2516
0.69 0.1 1500 -1.4837 -1.3358 -428.6115 -450.5626 0.6911 0.6230 -0.1966 0.0423 -0.2390
0.6896 0.1 1600 -0.9061 -0.7923 -381.1222 -427.0179 0.6905 0.6350 -0.1491 0.0663 -0.2154
0.6896 0.11 1700 -1.4801 -1.3317 -311.4821 -351.0151 0.6904 0.6255 -0.0795 0.0599 -0.1394
0.6907 0.12 1800 -1.5447 -1.3930 -338.7189 -368.5627 0.6906 0.6295 -0.1067 0.0502 -0.1570
0.6903 0.12 1900 -1.5034 -1.3577 -351.9511 -381.6029 0.6906 0.6510 -0.1199 0.0500 -0.1700
0.6907 0.13 2000 -1.3523 -1.2127 -394.4135 -441.4114 0.6902 0.6430 -0.1624 0.0674 -0.2298
0.6901 0.14 2100 -1.4995 -1.3486 -364.0427 -413.4233 0.6902 0.6455 -0.1320 0.0698 -0.2018
0.6914 0.14 2200 -1.6063 -1.4533 -326.1748 -362.9125 0.6903 0.6345 -0.0942 0.0571 -0.1513
0.6916 0.15 2300 -1.1935 -1.0524 -545.2796 -613.0294 0.6904 0.6450 -0.3133 0.0881 -0.4014
0.6897 0.16 2400 -1.3794 -1.2359 -354.2448 -401.9844 0.6901 0.6390 -0.1222 0.0681 -0.1904
0.6921 0.16 2500 -1.2731 -1.1393 -346.2455 -378.1649 0.6903 0.6485 -0.1142 0.0523 -0.1666
0.6893 0.17 2600 -0.7405 -0.6334 -444.1414 -493.0764 0.6899 0.6545 -0.2121 0.0693 -0.2815
0.6898 0.18 2700 -1.1283 -0.9989 -451.3296 -502.5096 0.6902 0.6480 -0.2193 0.0716 -0.2909
0.6905 0.18 2800 -1.2888 -1.1517 -361.0450 -403.0554 0.6902 0.6460 -0.1290 0.0624 -0.1914
0.6888 0.19 2900 -0.9720 -0.8516 -368.5533 -425.2483 0.6901 0.6520 -0.1365 0.0771 -0.2136
0.6906 0.2 3000 -0.8705 -0.7524 -415.2959 -477.5717 0.6900 0.6450 -0.1833 0.0827 -0.2660
0.6921 0.2 3100 -0.4969 -0.4021 -462.8434 -519.7990 0.6900 0.6550 -0.2308 0.0773 -0.3082
0.6867 0.21 3200 -1.0904 -0.9625 -319.7759 -371.4648 0.6899 0.6620 -0.0878 0.0721 -0.1599
0.691 0.22 3300 -0.7452 -0.6384 -352.8773 -411.2454 0.6899 0.6470 -0.1209 0.0788 -0.1996
0.6903 0.22 3400 -0.9077 -0.7891 -387.8607 -448.2271 0.6899 0.6530 -0.1559 0.0808 -0.2366
0.6899 0.23 3500 -0.7944 -0.6855 -357.3799 -413.9256 0.6898 0.6515 -0.1254 0.0769 -0.2023
0.6911 0.24 3600 -0.6122 -0.5134 -367.0983 -434.4857 0.6899 0.6615 -0.1351 0.0878 -0.2229
0.6938 0.24 3700 -0.7990 -0.6917 -355.7067 -411.4952 0.6899 0.6545 -0.1237 0.0762 -0.1999
0.6892 0.25 3800 -0.9948 -0.8768 -332.9629 -383.2249 0.6899 0.6570 -0.1010 0.0707 -0.1716
0.688 0.26 3900 -0.8941 -0.7770 -480.2304 -543.8677 0.6898 0.6495 -0.2482 0.0840 -0.3323
0.6879 0.26 4000 -1.0341 -0.9098 -458.0283 -518.2861 0.6897 0.6540 -0.2260 0.0807 -0.3067
0.6933 0.27 4100 -1.1499 -1.0199 -460.5152 -502.6912 0.6899 0.6520 -0.2285 0.0626 -0.2911
0.6908 0.27 4200 -0.8099 -0.7000 -446.6075 -499.0349 0.6899 0.6490 -0.2146 0.0728 -0.2874
0.6902 0.28 4300 -0.8444 -0.7309 -529.0878 -583.2983 0.6898 0.6585 -0.2971 0.0746 -0.3717
0.6895 0.29 4400 -0.7360 -0.6257 -519.1375 -587.6252 0.6899 0.6505 -0.2871 0.0889 -0.3760
0.6864 0.29 4500 -1.1400 -1.0095 -381.3715 -431.1979 0.6898 0.6590 -0.1494 0.0702 -0.2196
0.6903 0.3 4600 -1.1055 -0.9769 -445.2477 -493.5267 0.6898 0.6500 -0.2132 0.0687 -0.2819
0.6849 0.31 4700 -0.8268 -0.7087 -512.8202 -581.3583 0.6898 0.6545 -0.2808 0.0889 -0.3697
0.6902 0.31 4800 -0.7168 -0.6072 -480.0729 -542.0420 0.6898 0.6600 -0.2481 0.0824 -0.3304
0.6888 0.32 4900 -0.9126 -0.7924 -470.6658 -534.4230 0.6897 0.6545 -0.2387 0.0842 -0.3228
0.6915 0.33 5000 -0.9266 -0.8070 -433.9764 -492.5659 0.6899 0.6505 -0.2020 0.0790 -0.2810
0.6903 0.33 5100 -0.8068 -0.6929 -539.7931 -603.4097 0.6899 0.6540 -0.3078 0.0840 -0.3918
0.6889 0.34 5200 -0.3375 -0.2562 -600.0009 -653.9318 0.6899 0.6550 -0.3680 0.0743 -0.4423
0.6925 0.35 5300 -0.8018 -0.6898 -452.7947 -513.4368 0.6897 0.6605 -0.2208 0.0810 -0.3018
0.6883 0.35 5400 -0.9780 -0.8576 -446.1244 -496.6496 0.6898 0.6590 -0.2141 0.0709 -0.2850
0.6885 0.36 5500 -0.9162 -0.8003 -444.0844 -493.7664 0.6900 0.6550 -0.2121 0.0701 -0.2822
0.6896 0.37 5600 -0.7257 -0.6178 -536.2577 -598.6424 0.6898 0.6570 -0.3043 0.0828 -0.3870
0.6898 0.37 5700 -0.7838 -0.6697 -537.5150 -606.8261 0.6897 0.6550 -0.3055 0.0897 -0.3952
0.6894 0.38 5800 -0.3527 -0.2645 -553.5991 -630.1355 0.6898 0.6525 -0.3216 0.0969 -0.4185
0.6899 0.39 5900 -0.5848 -0.4854 -493.8012 -557.8223 0.6897 0.6535 -0.2618 0.0844 -0.3462
0.6896 0.39 6000 -0.9796 -0.8594 -429.3099 -477.4435 0.6897 0.6630 -0.1973 0.0685 -0.2658
0.6885 0.4 6100 -0.9699 -0.8472 -491.1470 -547.7917 0.6896 0.6580 -0.2591 0.0770 -0.3362
0.6905 0.41 6200 -1.2529 -1.1147 -485.2599 -540.0978 0.6896 0.6610 -0.2533 0.0752 -0.3285
0.6885 0.41 6300 -1.4056 -1.2600 -469.2736 -521.1331 0.6897 0.6635 -0.2373 0.0723 -0.3095
0.689 0.42 6400 -1.2237 -1.0844 -566.3376 -629.9547 0.6897 0.6530 -0.3343 0.0840 -0.4183
0.6913 0.43 6500 -1.3350 -1.1925 -548.8687 -604.5547 0.6897 0.6630 -0.3169 0.0761 -0.3929
0.6896 0.43 6600 -1.3852 -1.2377 -535.6201 -595.6210 0.6898 0.6570 -0.3036 0.0804 -0.3840
0.6913 0.44 6700 -1.0456 -0.9158 -599.0520 -667.5596 0.6898 0.6660 -0.3670 0.0889 -0.4559
0.6913 0.44 6800 -1.1365 -1.0039 -551.1534 -605.5635 0.6897 0.6605 -0.3191 0.0748 -0.3940
0.6876 0.45 6900 -1.0859 -0.9569 -539.8351 -594.4590 0.6897 0.6640 -0.3078 0.0750 -0.3828
0.6878 0.46 7000 -1.0649 -0.9341 -569.5941 -639.7524 0.6896 0.6655 -0.3376 0.0906 -0.4281
0.6889 0.46 7100 -1.0207 -0.8952 -503.5752 -564.2007 0.6896 0.6600 -0.2716 0.0810 -0.3526
0.6887 0.47 7200 -1.1860 -1.0530 -497.2397 -547.0663 0.6897 0.6625 -0.2652 0.0702 -0.3355
0.6905 0.48 7300 -1.0693 -0.9412 -495.2763 -555.8788 0.6896 0.6640 -0.2633 0.0810 -0.3443
0.6933 0.48 7400 -0.8035 -0.6899 -545.2488 -615.5215 0.6896 0.6585 -0.3132 0.0907 -0.4039
0.6885 0.49 7500 -0.9690 -0.8482 -479.9738 -539.5114 0.6896 0.6615 -0.2480 0.0799 -0.3279
0.6873 0.5 7600 -1.0029 -0.8792 -477.3996 -540.6624 0.6897 0.6640 -0.2454 0.0837 -0.3291
0.6896 0.5 7700 -1.0249 -0.9031 -435.6980 -492.8593 0.6897 0.6540 -0.2037 0.0776 -0.2812
0.6893 0.51 7800 -0.7930 -0.6833 -514.2276 -576.9700 0.6897 0.6675 -0.2822 0.0831 -0.3654
0.6896 0.52 7900 -0.9352 -0.8165 -467.9955 -532.0470 0.6895 0.6685 -0.2360 0.0844 -0.3204
0.6909 0.52 8000 -1.0597 -0.9345 -404.4120 -465.3136 0.6895 0.6695 -0.1724 0.0813 -0.2537
0.6908 0.53 8100 -0.7901 -0.6795 -433.8758 -497.8059 0.6895 0.6695 -0.2019 0.0843 -0.2862
0.6904 0.54 8200 -1.0770 -0.9495 -421.2912 -480.1203 0.6895 0.6695 -0.1893 0.0792 -0.2685
0.6895 0.54 8300 -0.9703 -0.8504 -410.2920 -467.8550 0.6896 0.6655 -0.1783 0.0780 -0.2562
0.6904 0.55 8400 -0.9014 -0.7865 -381.2902 -431.9937 0.6897 0.6680 -0.1493 0.0711 -0.2204
0.6878 0.56 8500 -0.5620 -0.4634 -475.0464 -545.1041 0.6895 0.6710 -0.2430 0.0905 -0.3335
0.6881 0.56 8600 -0.6348 -0.5309 -483.7722 -558.7502 0.6895 0.6720 -0.2518 0.0954 -0.3471
0.6888 0.57 8700 -0.7448 -0.6360 -477.9683 -541.1770 0.6895 0.6765 -0.2460 0.0836 -0.3296
0.6912 0.58 8800 -0.8669 -0.7510 -423.5055 -484.2249 0.6895 0.6725 -0.1915 0.0811 -0.2726
0.6905 0.58 8900 -0.6408 -0.5339 -507.0282 -584.7628 0.6895 0.6705 -0.2750 0.0981 -0.3732
0.6889 0.59 9000 -0.9425 -0.8210 -420.3015 -482.8202 0.6895 0.6720 -0.1883 0.0829 -0.2712
0.6906 0.6 9100 -1.0030 -0.8787 -430.4298 -486.5714 0.6895 0.6730 -0.1984 0.0765 -0.2750
0.6893 0.6 9200 -1.0853 -0.9546 -403.3822 -463.1019 0.6895 0.6730 -0.1714 0.0801 -0.2515
0.6902 0.61 9300 -0.8924 -0.7709 -436.1401 -502.2715 0.6895 0.6790 -0.2041 0.0865 -0.2907
0.6885 0.62 9400 -0.7878 -0.6740 -455.3779 -517.3574 0.6895 0.6705 -0.2234 0.0824 -0.3057
0.6864 0.62 9500 -0.7308 -0.6198 -440.1345 -503.9654 0.6895 0.6670 -0.2081 0.0842 -0.2924
0.6896 0.63 9600 -0.7276 -0.6168 -417.3858 -483.2873 0.6895 0.6665 -0.1854 0.0863 -0.2717
0.6884 0.63 9700 -0.6134 -0.5091 -434.1582 -500.0406 0.6895 0.6665 -0.2022 0.0863 -0.2884
0.6913 0.64 9800 -0.6631 -0.5573 -428.2690 -488.5942 0.6895 0.6675 -0.1963 0.0807 -0.2770
0.6887 0.65 9900 -0.5865 -0.4827 -436.9484 -508.5711 0.6895 0.6700 -0.2049 0.0920 -0.2970
0.6886 0.65 10000 -0.4209 -0.3262 -493.7508 -570.7292 0.6895 0.6685 -0.2617 0.0974 -0.3591
0.6858 0.66 10100 -0.5254 -0.4271 -472.5491 -537.1530 0.6895 0.6690 -0.2405 0.0850 -0.3255
0.6902 0.67 10200 -0.6713 -0.5611 -470.4224 -546.7172 0.6895 0.6640 -0.2384 0.0967 -0.3351
0.6877 0.67 10300 -0.8372 -0.7204 -434.0807 -497.1747 0.6894 0.6635 -0.2021 0.0835 -0.2856
0.6889 0.68 10400 -0.8565 -0.7377 -437.3454 -502.5919 0.6895 0.6610 -0.2053 0.0856 -0.2910
0.6885 0.69 10500 -0.8123 -0.6946 -479.5160 -547.5475 0.6895 0.6650 -0.2475 0.0884 -0.3359
0.6884 0.69 10600 -0.8156 -0.6967 -486.4954 -556.6547 0.6895 0.6640 -0.2545 0.0906 -0.3450
0.6909 0.7 10700 -0.8308 -0.7125 -463.0313 -527.8248 0.6895 0.6630 -0.2310 0.0852 -0.3162
0.6877 0.71 10800 -0.9392 -0.8151 -437.7896 -501.5677 0.6895 0.6645 -0.2058 0.0842 -0.2900
0.6921 0.71 10900 -1.0966 -0.9660 -388.6564 -443.5241 0.6895 0.6570 -0.1567 0.0753 -0.2319
0.6906 0.72 11000 -1.0157 -0.8885 -420.5682 -481.6877 0.6895 0.6630 -0.1886 0.0815 -0.2701
0.6898 0.73 11100 -1.0298 -0.9005 -430.2829 -496.5845 0.6895 0.6655 -0.1983 0.0867 -0.2850
0.6924 0.73 11200 -1.2117 -1.0739 -382.5533 -440.3696 0.6895 0.6630 -0.1505 0.0782 -0.2288
0.6875 0.74 11300 -1.1264 -0.9923 -401.6754 -464.6775 0.6895 0.6610 -0.1697 0.0834 -0.2531
0.6895 0.75 11400 -1.0544 -0.9230 -428.9089 -497.7010 0.6895 0.6640 -0.1969 0.0892 -0.2861
0.6901 0.75 11500 -0.9567 -0.8319 -427.2798 -491.3987 0.6895 0.6640 -0.1953 0.0845 -0.2798
0.6887 0.76 11600 -0.8249 -0.7049 -462.7817 -536.8845 0.6895 0.6685 -0.2308 0.0945 -0.3253
0.6883 0.77 11700 -0.7566 -0.6421 -466.8342 -534.1976 0.6895 0.6660 -0.2348 0.0878 -0.3226
0.6904 0.77 11800 -0.6848 -0.5755 -467.6891 -530.0805 0.6895 0.6680 -0.2357 0.0828 -0.3185
0.6868 0.78 11900 -0.7183 -0.6058 -469.5602 -536.6125 0.6895 0.6660 -0.2376 0.0874 -0.3250
0.6884 0.79 12000 -0.8024 -0.6854 -454.8383 -520.2646 0.6894 0.6670 -0.2228 0.0858 -0.3087
0.6878 0.79 12100 -0.8306 -0.7124 -446.4841 -511.0565 0.6894 0.6655 -0.2145 0.0850 -0.2994
0.6903 0.8 12200 -0.9147 -0.7927 -427.7274 -490.2318 0.6894 0.6675 -0.1957 0.0829 -0.2786
0.6914 0.8 12300 -0.8736 -0.7533 -441.0270 -506.0264 0.6894 0.6655 -0.2090 0.0854 -0.2944
0.6923 0.81 12400 -0.9178 -0.7957 -427.3274 -489.2210 0.6894 0.6700 -0.1953 0.0823 -0.2776
0.6892 0.82 12500 -0.9397 -0.8164 -422.7649 -484.4884 0.6894 0.6665 -0.1908 0.0821 -0.2729
0.6898 0.82 12600 -0.9114 -0.7894 -430.8014 -493.8383 0.6894 0.6665 -0.1988 0.0834 -0.2822
0.6903 0.83 12700 -0.9304 -0.8066 -433.3563 -498.3890 0.6894 0.6665 -0.2014 0.0854 -0.2868
0.6906 0.84 12800 -0.9460 -0.8214 -435.6780 -501.1642 0.6894 0.6665 -0.2037 0.0859 -0.2896
0.6903 0.84 12900 -0.9910 -0.8630 -435.8429 -502.7374 0.6894 0.6675 -0.2038 0.0873 -0.2911
0.6887 0.85 13000 -0.9750 -0.8480 -436.7468 -504.1572 0.6894 0.6690 -0.2047 0.0878 -0.2925
0.6917 0.86 13100 -1.0175 -0.8889 -425.4182 -489.5446 0.6894 0.6670 -0.1934 0.0845 -0.2779
0.6877 0.86 13200 -1.0098 -0.8810 -429.9781 -496.1291 0.6894 0.6695 -0.1980 0.0865 -0.2845
0.6887 0.87 13300 -1.0319 -0.9020 -425.5203 -491.0486 0.6894 0.6665 -0.1935 0.0859 -0.2794
0.6916 0.88 13400 -1.0431 -0.9129 -420.7966 -485.2116 0.6894 0.6710 -0.1888 0.0848 -0.2736
0.6905 0.88 13500 -1.0459 -0.9157 -419.9940 -484.3698 0.6894 0.6680 -0.1880 0.0848 -0.2728
0.691 0.89 13600 -1.0011 -0.8732 -428.5783 -494.4618 0.6894 0.6690 -0.1966 0.0863 -0.2828
0.6911 0.9 13700 -1.0116 -0.8833 -426.2202 -491.4141 0.6894 0.6700 -0.1942 0.0856 -0.2798
0.6892 0.9 13800 -0.9911 -0.8639 -431.2167 -497.5966 0.6894 0.6695 -0.1992 0.0868 -0.2860
0.6905 0.91 13900 -0.9932 -0.8657 -430.5668 -497.0990 0.6894 0.6705 -0.1986 0.0869 -0.2855
0.6884 0.92 14000 -0.9785 -0.8517 -433.5998 -500.4916 0.6894 0.6670 -0.2016 0.0873 -0.2889
0.6892 0.92 14100 -0.9989 -0.8711 -429.9120 -496.2607 0.6894 0.6695 -0.1979 0.0867 -0.2846
0.689 0.93 14200 -0.9909 -0.8633 -431.8853 -498.7849 0.6894 0.6695 -0.1999 0.0873 -0.2872
0.6911 0.94 14300 -0.9981 -0.8703 -430.6117 -496.9819 0.6894 0.6680 -0.1986 0.0868 -0.2854
0.6898 0.94 14400 -0.9977 -0.8700 -430.6717 -497.1328 0.6894 0.6675 -0.1987 0.0869 -0.2855
0.6909 0.95 14500 -0.9958 -0.8681 -431.3944 -498.0706 0.6894 0.6695 -0.1994 0.0871 -0.2865
0.6889 0.96 14600 -0.9952 -0.8676 -430.4063 -496.6932 0.6894 0.6690 -0.1984 0.0867 -0.2851
0.6902 0.96 14700 -0.9974 -0.8697 -430.0926 -496.2929 0.6894 0.6690 -0.1981 0.0866 -0.2847
0.6894 0.97 14800 -0.9956 -0.8682 -430.4017 -496.6894 0.6894 0.6675 -0.1984 0.0867 -0.2851
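
The table above is whitespace-separated, so each row can be read into named fields for plotting or comparison. The following is a small sketch. The `parse_row` helper is hypothetical and written only for this illustration, and the two sample rows are copied verbatim from the first and last lines of the table.

```python
# Column names in the order used by the table header above.
COLUMNS = [
    "Training Loss", "Epoch", "Step",
    "Logits/chosen", "Logits/rejected",
    "Logps/chosen", "Logps/rejected",
    "Validation Loss", "Rewards/accuracies",
    "Rewards/chosen", "Rewards/margins", "Rewards/rejected",
]


def parse_row(line: str) -> dict:
    """Turn one whitespace-separated table row into a column -> number mapping."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = {name: float(value) for name, value in zip(COLUMNS, values)}
    row["Step"] = int(row["Step"])  # step counts are whole numbers
    return row


# First and last logged rows, copied from the table.
first = parse_row(
    "0.6931 0.01 100 -2.3483 -2.1599 -231.7733 -211.4642 "
    "0.6931 0.4845 0.0002 0.0001 0.0001"
)
last = parse_row(
    "0.6894 0.97 14800 -0.9956 -0.8682 -430.4017 -496.6894 "
    "0.6894 0.6675 -0.1984 0.0867 -0.2851"
)

# Change in the two headline metrics between the first and last evaluation.
loss_change = last["Validation Loss"] - first["Validation Loss"]
accuracy_change = last["Rewards/accuracies"] - first["Rewards/accuracies"]
print(f"steps {first['Step']} -> {last['Step']}")
print(f"validation loss change: {loss_change:+.4f}")
print(f"reward accuracy change: {accuracy_change:+.4f}")
```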

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • PyTorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
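
When reproducing results, it can help to confirm that the local environment matches the versions listed above. The sketch below compares installed package versions against that list using only the standard library. The `find_mismatches` helper is hypothetical, and the package names are assumed to be the usual PyPI distribution names for these libraries.

```python
from importlib import metadata

# Versions listed above; the "+cu121" CUDA build tag on PyTorch is dropped for comparison.
EXPECTED = {
    "peft": "0.7.1",
    "transformers": "4.36.2",
    "torch": "2.1.2",
    "datasets": "2.14.6",
    "tokenizers": "0.15.2",
}


def find_mismatches(expected=EXPECTED, get_version=metadata.version):
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            # Strip any local build tag such as "+cu121" before comparing.
            installed = get_version(package).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in find_mismatches().items():
        print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the stated version. Other versions may work as well, but the card only reports these.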
