zephyr-7b-gpo-update3-i0

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora, trained on the HuggingFaceH4/ultrafeedback_binarized dataset and published as a PEFT adapter. It achieves the following results on the evaluation set:

  • Loss: 0.0224
  • Rewards/chosen: -0.1801
  • Rewards/rejected: -0.2679
  • Rewards/accuracies: 0.6755
  • Rewards/margins: 0.0877
  • Logps/rejected: -479.4648
  • Logps/chosen: -412.1429
  • Logits/rejected: -0.8868
  • Logits/chosen: -1.0169
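
As a quick consistency check on the metrics above, the reported margin should equal the chosen reward minus the rejected reward, up to rounding. The sketch below assumes the metric names follow the convention used by TRL's DPOTrainer (margin = chosen reward minus rejected reward, averaged over the evaluation set); the card itself does not state how the metrics are computed.

```python
# Evaluation-set values copied from the list above.
rewards_chosen = -0.1801
rewards_rejected = -0.2679
reported_margin = 0.0877

# Assumption: margin = chosen reward - rejected reward (mean over the eval set).
computed_margin = rewards_chosen - rewards_rejected

# Both inputs are rounded to 4 decimals, so allow a small tolerance.
assert abs(computed_margin - reported_margin) < 2e-4
print(f"computed margin: {computed_margin:.4f} (reported: {reported_margin:.4f})")
```

Under that assumption the two values agree to within rounding error. A positive margin with both rewards negative means the policy lowered the likelihood of both responses relative to the reference, but lowered the rejected one more.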

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
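
The batch-size and scheduler settings above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the training code: the total number of optimizer steps is not given in the card, so `TOTAL_STEPS` is a hypothetical value chosen to be roughly consistent with step 15200 at epoch 0.99, and the schedule shape (linear warmup, then a half-cosine decay to zero) is the usual meaning of `cosine` with a warmup ratio. The card also does not list the device count; 2 x 2 = 4 matches a device count of 1 under the formula below.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-06
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1

# Assumption: hypothetical total step count; the card does not state it.
TOTAL_STEPS = 15_300


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given step: linear warmup, then half-cosine decay."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 500, 2_000, 8_000, TOTAL_STEPS):
        print(step, lr_at(step))
```

The learning rate starts at zero, reaches 5e-06 at the end of warmup, and decays back toward zero by the final step.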

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.0705 0.01 100 0.0537 0.0002 0.0001 0.5070 0.0001 -211.4654 -231.7675 -2.1589 -2.3472
0.0633 0.01 200 0.0534 0.0006 0.0000 0.5790 0.0006 -211.5886 -231.3686 -2.1606 -2.3491
0.0555 0.02 300 0.0528 0.0038 0.0019 0.5865 0.0020 -209.7327 -228.1604 -2.1644 -2.3530
0.0716 0.03 400 0.0517 0.0045 0.0003 0.5985 0.0042 -211.3239 -227.4712 -2.1658 -2.3544
0.0532 0.03 500 0.0506 -0.0335 -0.0410 0.6050 0.0076 -252.6588 -265.4973 -2.1479 -2.3356
0.0353 0.04 600 0.0482 -0.0051 -0.0176 0.6040 0.0126 -229.2327 -237.0613 -2.1837 -2.3733
0.0607 0.05 700 0.0442 -0.0174 -0.0394 0.6155 0.0220 -251.0582 -249.4283 -2.1498 -2.3382
0.0373 0.05 800 0.0450 -0.0986 -0.1354 0.5900 0.0368 -346.9684 -330.5711 -2.1610 -2.3525
0.0333 0.06 900 0.0453 -0.0231 -0.0419 0.6065 0.0187 -253.4719 -255.1281 -2.1130 -2.2974
0.0469 0.07 1000 0.0408 -0.0664 -0.0994 0.6020 0.0330 -311.0526 -298.4168 -2.0108 -2.1907
0.0387 0.07 1100 0.0416 -0.1900 -0.2240 0.6030 0.0340 -435.6592 -422.0504 -1.4115 -1.5584
0.0377 0.08 1200 0.0409 -0.1076 -0.1513 0.6110 0.0437 -362.9415 -339.6366 -1.3325 -1.4831
0.0414 0.09 1300 0.0353 -0.1923 -0.2414 0.6160 0.0491 -453.0328 -424.3461 -1.2024 -1.3430
0.0363 0.09 1400 0.0352 -0.1443 -0.1836 0.625 0.0393 -395.2076 -376.2808 -1.3508 -1.4962
0.0741 0.1 1500 0.0350 -0.1363 -0.1823 0.6235 0.0460 -393.9025 -368.3273 -1.0220 -1.1484
0.0348 0.1 1600 0.0334 -0.2731 -0.3511 0.6275 0.0780 -562.7497 -505.1403 -0.8525 -0.9803
0.0251 0.11 1700 0.0318 -0.2572 -0.3298 0.6410 0.0726 -541.3788 -489.1554 -1.1495 -1.2961
0.036 0.12 1800 0.0325 -0.1508 -0.1958 0.6205 0.0451 -407.4576 -382.7708 -1.5867 -1.7516
0.0142 0.12 1900 0.0312 -0.2575 -0.3145 0.6335 0.0570 -526.0776 -489.4697 -1.2253 -1.3692
0.0176 0.13 2000 0.0282 -0.1856 -0.2730 0.6460 0.0873 -484.5845 -417.6276 -1.5396 -1.7095
0.0176 0.14 2100 0.0275 -0.1327 -0.2078 0.6505 0.0751 -419.3942 -364.7262 -1.5587 -1.7265
0.0387 0.14 2200 0.0277 -0.1042 -0.1708 0.6385 0.0666 -382.4240 -336.1856 -1.6316 -1.8005
0.0284 0.15 2300 0.0275 -0.1814 -0.2465 0.6345 0.0651 -458.0886 -413.4149 -1.7580 -1.9373
0.0351 0.16 2400 0.0296 -0.1479 -0.2087 0.6405 0.0609 -420.3434 -379.8790 -1.6926 -1.8704
0.0143 0.16 2500 0.0285 -0.1597 -0.2193 0.6545 0.0597 -430.9314 -391.6554 -1.5350 -1.6983
0.0224 0.17 2600 0.0265 -0.2066 -0.2771 0.6545 0.0706 -488.7431 -438.5660 -1.3152 -1.4686
0.0331 0.18 2700 0.0268 -0.1739 -0.2488 0.6515 0.0748 -460.3621 -405.9103 -1.5228 -1.6880
0.0387 0.18 2800 0.0276 -0.0764 -0.1367 0.6510 0.0603 -348.3400 -308.4065 -1.4048 -1.5555
0.0343 0.19 2900 0.0264 -0.2299 -0.3102 0.6535 0.0803 -521.8264 -461.8814 -1.0216 -1.1548
0.0267 0.2 3000 0.0275 -0.2473 -0.3356 0.6520 0.0883 -547.2559 -479.3535 -1.0688 -1.2088
0.0355 0.2 3100 0.0280 -0.2277 -0.2978 0.6415 0.0700 -509.3696 -459.7389 -1.2857 -1.4360
0.0291 0.21 3200 0.0259 -0.1519 -0.2282 0.6635 0.0763 -439.8501 -383.9444 -1.3484 -1.5017
0.035 0.22 3300 0.0257 -0.1210 -0.2008 0.6555 0.0798 -412.4005 -353.0179 -1.5265 -1.6883
0.0319 0.22 3400 0.0263 -0.1372 -0.2147 0.6515 0.0775 -426.3360 -369.1944 -1.4126 -1.5692
0.0257 0.23 3500 0.0256 -0.1661 -0.2429 0.6550 0.0768 -454.5053 -398.1262 -1.4163 -1.5722
0.0275 0.24 3600 0.0262 -0.1719 -0.2629 0.6575 0.0910 -474.4635 -403.8749 -1.3717 -1.5261
0.0367 0.24 3700 0.0266 -0.1726 -0.2519 0.6575 0.0793 -463.4673 -404.5643 -1.4203 -1.5758
0.0357 0.25 3800 0.0260 -0.0704 -0.1387 0.6575 0.0682 -350.2820 -302.4371 -1.5307 -1.6889
0.0249 0.26 3900 0.0256 -0.2003 -0.2911 0.6635 0.0908 -502.6758 -432.3128 -1.0767 -1.2149
0.0496 0.26 4000 0.0246 -0.1700 -0.2550 0.6640 0.0850 -466.5954 -402.0156 -1.2870 -1.4356
0.0166 0.27 4100 0.0273 -0.1833 -0.2458 0.6600 0.0625 -457.4213 -415.3354 -1.2058 -1.3468
0.0257 0.27 4200 0.0275 -0.1551 -0.2293 0.6505 0.0742 -440.8662 -387.0569 -1.3883 -1.5435
0.0381 0.28 4300 0.0256 -0.1096 -0.1865 0.6630 0.0769 -398.1021 -341.5804 -1.5158 -1.6790
0.0142 0.29 4400 0.0256 -0.1428 -0.2296 0.6605 0.0868 -441.2437 -374.8350 -1.1203 -1.2625
0.0161 0.29 4500 0.0253 -0.1292 -0.2014 0.6585 0.0722 -412.9791 -361.1862 -1.2417 -1.3864
0.0252 0.3 4600 0.0260 -0.0895 -0.1540 0.6615 0.0645 -365.6145 -321.5078 -1.4506 -1.6068
0.0265 0.31 4700 0.0262 -0.2428 -0.3365 0.6565 0.0937 -548.1023 -474.7587 -0.9481 -1.0844
0.0428 0.31 4800 0.0251 -0.1762 -0.2585 0.6590 0.0822 -470.0755 -408.2503 -0.7928 -0.9170
0.0331 0.32 4900 0.0257 -0.1637 -0.2481 0.6585 0.0844 -459.6623 -395.7015 -0.8176 -0.9423
0.0206 0.33 5000 0.0263 -0.1448 -0.2194 0.6635 0.0746 -430.9643 -376.7931 -0.7098 -0.8233
0.0158 0.33 5100 0.0256 -0.2789 -0.3617 0.6555 0.0828 -573.3056 -510.8705 -0.7416 -0.8615
0.0145 0.34 5200 0.0260 -0.1978 -0.2690 0.6660 0.0711 -480.5622 -429.8432 -0.9478 -1.0757
0.0209 0.35 5300 0.0255 -0.1522 -0.2287 0.6585 0.0766 -440.3552 -384.1584 -1.2392 -1.3869
0.0292 0.35 5400 0.0258 -0.1740 -0.2459 0.6605 0.0719 -457.4742 -405.9723 -1.2221 -1.3683
0.0104 0.36 5500 0.0258 -0.1628 -0.2414 0.6585 0.0786 -453.0058 -394.8098 -1.1724 -1.3171
0.0201 0.37 5600 0.0267 -0.3001 -0.3834 0.6595 0.0833 -595.0033 -532.1312 -1.1342 -1.2817
0.0258 0.37 5700 0.0264 -0.3214 -0.4042 0.6495 0.0827 -615.7876 -553.4460 -0.9025 -1.0350
0.0254 0.38 5800 0.0248 -0.1813 -0.2698 0.6560 0.0885 -481.4164 -413.2734 -1.2336 -1.3844
0.0237 0.39 5900 0.0247 -0.1357 -0.2169 0.6605 0.0811 -428.4645 -367.7495 -1.2361 -1.3841
0.025 0.39 6000 0.0250 -0.0936 -0.1640 0.6605 0.0704 -375.6244 -325.6407 -1.3252 -1.4747
0.0267 0.4 6100 0.0245 -0.1079 -0.1847 0.6640 0.0768 -396.3334 -339.8831 -1.1771 -1.3187
0.0157 0.41 6200 0.0244 -0.1200 -0.1970 0.6600 0.0769 -408.5906 -352.0449 -1.2099 -1.3534
0.0339 0.41 6300 0.0250 -0.1141 -0.1911 0.6645 0.0770 -402.7368 -346.1321 -1.1887 -1.3301
0.0239 0.42 6400 0.0256 -0.1095 -0.1887 0.6545 0.0792 -400.2938 -341.5355 -1.1653 -1.3054
0.0609 0.43 6500 0.0258 -0.1790 -0.2637 0.6640 0.0847 -475.3234 -411.0543 -0.7519 -0.8671
0.0274 0.43 6600 0.0252 -0.1233 -0.2002 0.6685 0.0769 -411.8316 -355.3340 -1.1117 -1.2477
0.0308 0.44 6700 0.0260 -0.2033 -0.2927 0.6580 0.0894 -504.2830 -435.3035 -0.8339 -0.9571
0.0442 0.44 6800 0.0252 -0.1567 -0.2327 0.6715 0.0760 -444.3407 -388.7112 -0.9082 -1.0316
0.0454 0.45 6900 0.0244 -0.1860 -0.2627 0.6660 0.0767 -474.3181 -417.9738 -0.8091 -0.9271
0.0229 0.46 7000 0.0241 -0.1897 -0.2739 0.6705 0.0843 -485.5567 -421.6742 -0.7967 -0.9160
0.0213 0.46 7100 0.0239 -0.2099 -0.2963 0.6675 0.0864 -507.9073 -441.9356 -0.6326 -0.7425
0.0351 0.47 7200 0.0241 -0.1826 -0.2598 0.6685 0.0772 -471.4492 -414.6008 -0.7077 -0.8202
0.0198 0.48 7300 0.0237 -0.2418 -0.3216 0.6695 0.0799 -533.2533 -473.7774 -0.6382 -0.7481
0.0267 0.48 7400 0.0238 -0.2263 -0.3121 0.6635 0.0857 -523.6796 -458.3290 -0.8072 -0.9286
0.0183 0.49 7500 0.0240 -0.2262 -0.3151 0.6685 0.0889 -526.6686 -458.1802 -0.7953 -0.9168
0.0384 0.5 7600 0.0244 -0.2211 -0.3110 0.6620 0.0900 -522.6359 -453.0678 -0.8678 -0.9928
0.0107 0.5 7700 0.0243 -0.1361 -0.2179 0.6615 0.0818 -429.5078 -368.1310 -1.1731 -1.3135
0.026 0.51 7800 0.0248 -0.2264 -0.3139 0.6630 0.0875 -525.5045 -458.3771 -0.8686 -0.9939
0.0268 0.52 7900 0.0235 -0.2119 -0.3016 0.6720 0.0897 -513.2527 -443.9242 -1.0222 -1.1573
0.0368 0.52 8000 0.0234 -0.1716 -0.2553 0.6675 0.0837 -466.9293 -403.5861 -1.0878 -1.2254
0.0293 0.53 8100 0.0230 -0.2229 -0.3118 0.6695 0.0889 -523.4254 -454.8972 -0.8559 -0.9809
0.0127 0.54 8200 0.0234 -0.1810 -0.2616 0.6660 0.0807 -473.2369 -412.9599 -1.0361 -1.1700
0.0169 0.54 8300 0.0241 -0.1442 -0.2229 0.6690 0.0787 -434.5301 -376.2298 -1.1765 -1.3181
0.0177 0.55 8400 0.0249 -0.1232 -0.1920 0.6685 0.0687 -403.5682 -355.2328 -1.1804 -1.3186
0.0277 0.56 8500 0.0232 -0.2036 -0.2918 0.6715 0.0882 -503.4426 -435.6166 -0.9559 -1.0856
0.0187 0.56 8600 0.0230 -0.1969 -0.2868 0.6700 0.0898 -498.3626 -428.9141 -0.9720 -1.1033
0.0464 0.57 8700 0.0232 -0.2151 -0.2976 0.6720 0.0826 -509.2527 -447.0790 -0.8658 -0.9893
0.0296 0.58 8800 0.0231 -0.1914 -0.2749 0.6730 0.0835 -486.5063 -423.3791 -0.9562 -1.0852
0.0416 0.58 8900 0.0230 -0.2546 -0.3499 0.6720 0.0953 -561.4706 -486.5627 -0.8593 -0.9866
0.0374 0.59 9000 0.0229 -0.1957 -0.2784 0.6695 0.0827 -490.0193 -427.6933 -0.9676 -1.0981
0.026 0.6 9100 0.0231 -0.1901 -0.2688 0.6720 0.0787 -480.4329 -422.1302 -1.0459 -1.1806
0.0247 0.6 9200 0.0236 -0.1171 -0.1918 0.6705 0.0747 -403.3864 -349.0942 -1.1933 -1.3378
0.0193 0.61 9300 0.0231 -0.2085 -0.2946 0.6705 0.0862 -506.2588 -440.4871 -0.9579 -1.0926
0.028 0.62 9400 0.0232 -0.1847 -0.2630 0.6750 0.0783 -474.6612 -416.7447 -0.9186 -1.0483
0.0119 0.62 9500 0.0235 -0.2603 -0.3495 0.6660 0.0892 -561.1232 -492.2703 -0.6150 -0.7300
0.0178 0.63 9600 0.0232 -0.2461 -0.3329 0.6655 0.0868 -544.4890 -478.0711 -0.6486 -0.7644
0.0355 0.63 9700 0.0232 -0.2619 -0.3441 0.6650 0.0822 -555.6818 -493.8837 -0.7045 -0.8232
0.0238 0.64 9800 0.0234 -0.2640 -0.3436 0.6690 0.0797 -555.2313 -495.9717 -0.7577 -0.8786
0.0315 0.65 9900 0.0231 -0.2402 -0.3324 0.6670 0.0922 -543.9803 -472.1986 -0.8464 -0.9754
0.0267 0.65 10000 0.0233 -0.2333 -0.3282 0.6645 0.0949 -539.8396 -465.3473 -0.8768 -1.0084
0.018 0.66 10100 0.0235 -0.1871 -0.2697 0.6665 0.0826 -481.2975 -419.0774 -0.9507 -1.0827
0.0183 0.67 10200 0.0233 -0.2143 -0.3107 0.6660 0.0964 -522.2762 -446.3001 -1.0028 -1.1422
0.0162 0.67 10300 0.0229 -0.1964 -0.2831 0.6675 0.0867 -494.7217 -428.4237 -0.9919 -1.1283
0.0134 0.68 10400 0.0231 -0.2075 -0.2984 0.6660 0.0909 -510.0122 -439.4990 -0.9949 -1.1326
0.0195 0.69 10500 0.0230 -0.2028 -0.2909 0.6665 0.0881 -502.5017 -434.7631 -0.9652 -1.1005
0.0151 0.69 10600 0.0232 -0.2275 -0.3201 0.6685 0.0927 -531.7596 -459.4988 -0.8827 -1.0146
0.0207 0.7 10700 0.0229 -0.2101 -0.2965 0.6745 0.0863 -508.0856 -442.1295 -0.8176 -0.9439
0.0343 0.71 10800 0.0229 -0.1772 -0.2624 0.6725 0.0852 -474.0302 -409.1922 -0.9335 -1.0660
0.0277 0.71 10900 0.0232 -0.1832 -0.2641 0.6715 0.0809 -475.7294 -415.1988 -0.8820 -1.0102
0.0468 0.72 11000 0.0232 -0.1684 -0.2502 0.6710 0.0818 -461.7660 -400.4062 -0.9471 -1.0790
0.0205 0.73 11100 0.0231 -0.1485 -0.2324 0.6715 0.0838 -443.9662 -380.5112 -1.0276 -1.1645
0.0208 0.73 11200 0.0232 -0.1421 -0.2241 0.6665 0.0820 -435.7383 -374.1264 -1.0866 -1.2266
0.0203 0.74 11300 0.0228 -0.1865 -0.2734 0.6695 0.0869 -485.0168 -418.5069 -0.9368 -1.0693
0.0322 0.75 11400 0.0232 -0.1914 -0.2833 0.6705 0.0919 -494.9005 -423.4102 -1.0057 -1.1440
0.0208 0.75 11500 0.0230 -0.1844 -0.2674 0.6710 0.0830 -479.0218 -416.4353 -0.8679 -0.9952
0.0289 0.76 11600 0.0229 -0.2138 -0.3059 0.6670 0.0921 -517.5433 -445.8511 -0.7842 -0.9087
0.0196 0.77 11700 0.0229 -0.2163 -0.3027 0.6690 0.0864 -514.3256 -448.2766 -0.6985 -0.8165
0.0164 0.77 11800 0.0228 -0.2281 -0.3127 0.6700 0.0846 -524.3269 -460.1056 -0.6315 -0.7455
0.0204 0.78 11900 0.0228 -0.2507 -0.3406 0.6695 0.0899 -552.1954 -482.6768 -0.6060 -0.7203
0.0332 0.79 12000 0.0228 -0.2229 -0.3094 0.6685 0.0865 -521.0510 -454.8977 -0.6991 -0.8177
0.0127 0.79 12100 0.0227 -0.2028 -0.2871 0.6675 0.0843 -498.7013 -434.7568 -0.7612 -0.8830
0.0325 0.8 12200 0.0228 -0.1688 -0.2506 0.6710 0.0819 -462.2358 -400.7754 -0.8779 -1.0058
0.0312 0.8 12300 0.0226 -0.1790 -0.2638 0.6690 0.0849 -475.4503 -410.9585 -0.8499 -0.9771
0.0288 0.81 12400 0.0226 -0.1852 -0.2705 0.6705 0.0853 -482.1120 -417.2077 -0.8575 -0.9853
0.0124 0.82 12500 0.0227 -0.1829 -0.2670 0.6700 0.0841 -478.6212 -414.9066 -0.8720 -1.0003
0.0164 0.82 12600 0.0226 -0.1860 -0.2705 0.6740 0.0845 -482.1584 -418.0470 -0.8740 -1.0031
0.0123 0.83 12700 0.0226 -0.1777 -0.2626 0.6725 0.0850 -474.2336 -409.6584 -0.8919 -1.0220
0.0172 0.84 12800 0.0226 -0.1748 -0.2600 0.6720 0.0852 -471.6224 -406.8354 -0.8885 -1.0182
0.0077 0.84 12900 0.0225 -0.1771 -0.2640 0.6735 0.0869 -475.6176 -409.0995 -0.9263 -1.0589
0.0102 0.85 13000 0.0225 -0.1702 -0.2566 0.6725 0.0864 -468.2498 -402.1976 -0.9231 -1.0553
0.0352 0.86 13100 0.0226 -0.1723 -0.2576 0.6735 0.0853 -469.2229 -404.3332 -0.9195 -1.0515
0.017 0.86 13200 0.0225 -0.1818 -0.2697 0.6740 0.0879 -481.2682 -413.8024 -0.8943 -1.0253
0.0207 0.87 13300 0.0225 -0.1720 -0.2583 0.6725 0.0863 -469.9547 -404.0227 -0.9057 -1.0369
0.0315 0.88 13400 0.0225 -0.1693 -0.2546 0.6735 0.0853 -466.2376 -401.3037 -0.9093 -1.0403
0.0148 0.88 13500 0.0225 -0.1702 -0.2556 0.6715 0.0855 -467.2293 -402.1566 -0.9070 -1.0379
0.0191 0.89 13600 0.0225 -0.1710 -0.2578 0.6745 0.0868 -469.4186 -402.9745 -0.9059 -1.0370
0.0221 0.9 13700 0.0224 -0.1684 -0.2544 0.6745 0.0861 -466.0537 -400.3587 -0.9192 -1.0510
0.0299 0.9 13800 0.0224 -0.1708 -0.2578 0.6730 0.0871 -469.4453 -402.7551 -0.9125 -1.0439
0.0219 0.91 13900 0.0224 -0.1743 -0.2623 0.6730 0.0880 -473.8788 -406.2876 -0.9065 -1.0379
0.024 0.92 14000 0.0224 -0.1787 -0.2671 0.6755 0.0885 -478.7598 -410.6616 -0.8850 -1.0154
0.0228 0.92 14100 0.0225 -0.1771 -0.2650 0.6740 0.0879 -476.6039 -409.0930 -0.8919 -1.0223
0.0146 0.93 14200 0.0224 -0.1803 -0.2687 0.6770 0.0884 -480.3093 -412.2579 -0.8844 -1.0147
0.0164 0.94 14300 0.0225 -0.1792 -0.2672 0.6755 0.0880 -478.8005 -411.2285 -0.8855 -1.0157
0.0248 0.94 14400 0.0224 -0.1808 -0.2691 0.6745 0.0883 -480.7047 -412.7735 -0.8846 -1.0148
0.0118 0.95 14500 0.0224 -0.1814 -0.2697 0.6725 0.0884 -481.3487 -413.3884 -0.8831 -1.0131
0.0346 0.96 14600 0.0224 -0.1805 -0.2683 0.6750 0.0879 -479.9362 -412.4734 -0.8849 -1.0152
0.0182 0.96 14700 0.0224 -0.1800 -0.2678 0.6740 0.0877 -479.3696 -412.0334 -0.8840 -1.0140
0.0084 0.97 14800 0.0224 -0.1805 -0.2684 0.6745 0.0878 -480.0011 -412.5492 -0.8846 -1.0147
0.0249 0.97 14900 0.0224 -0.1807 -0.2685 0.6765 0.0879 -480.1522 -412.6696 -0.8850 -1.0151
0.0184 0.98 15000 0.0224 -0.1804 -0.2682 0.6755 0.0878 -479.8432 -412.4375 -0.8854 -1.0154
0.0345 0.99 15100 0.0224 -0.1801 -0.2679 0.6735 0.0877 -479.4683 -412.1548 -0.8840 -1.0139
0.0244 0.99 15200 0.0224 -0.1803 -0.2680 0.6750 0.0877 -479.6048 -412.2724 -0.8862 -1.0163
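
The table above is whitespace-delimited, with twelve values per row in the order given by the header. A minimal sketch for loading it into Python follows; the snake_case column names and the helper functions are illustrative choices rather than part of the card, and only three rows are reproduced here as sample input.

```python
# Column order taken from the table header; the snake_case names are illustrative.
COLUMNS = [
    "training_loss", "epoch", "step", "validation_loss",
    "rewards_chosen", "rewards_rejected", "rewards_accuracies", "rewards_margins",
    "logps_rejected", "logps_chosen", "logits_rejected", "logits_chosen",
]

# Three rows copied verbatim from the table (steps 100, 8000, and 15200).
SAMPLE_ROWS = """\
0.0705 0.01 100 0.0537 0.0002 0.0001 0.5070 0.0001 -211.4654 -231.7675 -2.1589 -2.3472
0.0368 0.52 8000 0.0234 -0.1716 -0.2553 0.6675 0.0837 -466.9293 -403.5861 -1.0878 -1.2254
0.0244 0.99 15200 0.0224 -0.1803 -0.2680 0.6750 0.0877 -479.6048 -412.2724 -0.8862 -1.0163
"""


def parse_row(line: str) -> dict:
    """Turn one whitespace-delimited table row into a dict keyed by column name."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    row = {name: float(value) for name, value in zip(COLUMNS, values)}
    row["step"] = int(row["step"])  # steps are whole numbers
    return row


def best_by_validation_loss(rows: list) -> dict:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r["validation_loss"])


if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE_ROWS.strip().splitlines()]
    best = best_by_validation_loss(rows)
    print(best["step"], best["validation_loss"])
```

Pasting the full table into `SAMPLE_ROWS` makes it straightforward to plot validation loss or reward accuracy against step.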

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
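
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch, not an official tool: the mapping from the names in the list to PyPI distribution names (for example, Pytorch to `torch`) is an assumption, and `find_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "peft": "0.7.1",
    "transformers": "4.36.2",
    "torch": "2.1.2+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.15.2",
}


def find_mismatches(expected=EXPECTED, get_version=metadata.version) -> dict:
    """Map each package whose installed version differs to (wanted, installed).

    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in find_mismatches().items():
        print(f"{package}: card lists {wanted}, found {installed}")
```

An exact match is not necessarily required to load the adapter; the check only reports where the environment differs from the one recorded in the card.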