zephyr-7b-gpo-log1-i0

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6897
  • Rewards/chosen: 0.0141
  • Rewards/rejected: -0.0702
  • Rewards/accuracies: 0.6370
  • Rewards/margins: 0.0842
  • Logps/rejected: -218.6293
  • Logps/chosen: -230.5992
  • Logits/rejected: -2.1363
  • Logits/chosen: -2.3248
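
As a quick consistency check on the figures above, the sketch below recomputes the reward margin from the reported chosen and rejected rewards and compares the evaluation loss with ln 2. It assumes the objective is a logistic (DPO-style) pairwise loss, which the model name suggests but this card does not state. Under that assumption a policy with zero margin scores -log(sigmoid(0)) = ln 2, about 0.6931. The function name `logistic_pair_loss` is illustrative only.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = 0.0141
rewards_rejected = -0.0702
reported_margin = 0.0842
eval_loss = 0.6897

# The margin is the gap between chosen and rejected rewards. The reported
# value is rounded, so the last digit of the recomputed gap may differ.
margin = rewards_chosen - rewards_rejected


def logistic_pair_loss(m: float) -> float:
    """Assumed loss form: -log(sigmoid(m)), written as log(1 + exp(-m))."""
    return math.log1p(math.exp(-m))


# Loss of a policy that cannot tell chosen from rejected (margin 0).
baseline_loss = logistic_pair_loss(0.0)

print(f"recomputed margin: {margin:.4f} (reported {reported_margin:.4f})")
print(f"loss at zero margin: {baseline_loss:.4f}")
print(f"improvement over that baseline: {baseline_loss - eval_loss:.4f}")
```

Read this way, the evaluation loss sits only slightly below the zero-margin baseline, which matches the small reward margin in the list.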

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
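
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup using learning_rate 5e-06 and lr_scheduler_warmup_ratio 0.1. It is a plain-Python approximation of the usual warmup-then-cosine shape, not the training code used for this model. `TOTAL_STEPS` is an assumption: 15,200 is the last step logged in the training results and stands in for the true step count, which this card does not give.

```python
import math


def cosine_lr_with_warmup(step, total_steps, peak_lr=5e-6, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup from 0, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: stand-in for the real number of optimizer steps.
TOTAL_STEPS = 15_200

for step in (0, 760, 1_520, 8_360, 15_200):
    print(step, f"{cosine_lr_with_warmup(step, TOTAL_STEPS):.2e}")
```

With these settings the rate peaks after roughly the first tenth of training and then decays smoothly toward zero by the end of the single epoch.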

Training results

Training Loss  Epoch  Step  Validation Loss  Rewards/chosen  Rewards/rejected  Rewards/accuracies  Rewards/margins  Logps/rejected  Logps/chosen  Logits/rejected  Logits/chosen
0.6932 0.01 100 0.6931 0.0024 0.0017 0.4950 0.0007 -211.4439 -231.7646 -2.1604 -2.3488
0.6927 0.01 200 0.6928 0.0053 -0.0004 0.5835 0.0057 -211.6526 -231.4798 -2.1608 -2.3492
0.6917 0.02 300 0.6925 0.0335 0.0177 0.5830 0.0159 -209.8460 -228.6509 -2.1649 -2.3535
0.6916 0.03 400 0.6920 0.0466 0.0223 0.6020 0.0244 -209.3866 -227.3408 -2.1660 -2.3548
0.6917 0.03 500 0.6916 0.0638 0.0219 0.6060 0.0419 -209.4261 -225.6272 -2.1616 -2.3499
0.6919 0.04 600 0.6913 0.0498 0.0026 0.5970 0.0472 -211.3561 -227.0246 -2.1675 -2.3568
0.6909 0.05 700 0.6913 0.0561 0.0106 0.6145 0.0455 -210.5544 -226.3928 -2.1615 -2.3501
0.6913 0.05 800 0.6913 -0.1047 -0.1559 0.5970 0.0512 -227.2016 -242.4708 -2.1428 -2.3307
0.6921 0.06 900 0.6909 -0.0526 -0.1012 0.6060 0.0486 -221.7336 -237.2677 -2.1466 -2.3343
0.6903 0.07 1000 0.6908 -0.0008 -0.0563 0.6185 0.0555 -217.2371 -232.0825 -2.1575 -2.3453
0.6922 0.07 1100 0.6911 -0.0015 -0.0779 0.6275 0.0764 -219.4024 -232.1565 -2.1294 -2.3151
0.6906 0.08 1200 0.6907 -0.0276 -0.0979 0.6375 0.0703 -221.4021 -234.7645 -2.1398 -2.3272
0.6886 0.09 1300 0.6907 0.0146 -0.0428 0.6105 0.0574 -215.8946 -230.5475 -2.1613 -2.3501
0.6887 0.09 1400 0.6909 0.0072 -0.0587 0.6130 0.0660 -217.4851 -231.2815 -2.1350 -2.3205
0.6887 0.1 1500 0.6907 -0.0114 -0.0845 0.6305 0.0731 -220.0597 -233.1405 -2.1365 -2.3217
0.6904 0.1 1600 0.6906 0.0443 -0.0289 0.6260 0.0732 -214.5052 -227.5776 -2.1414 -2.3270
0.6893 0.11 1700 0.6904 0.0333 -0.0409 0.6215 0.0742 -215.7022 -228.6733 -2.1548 -2.3421
0.6904 0.12 1800 0.6909 0.0409 -0.0143 0.6160 0.0552 -213.0369 -227.9110 -2.1467 -2.3331
0.6908 0.12 1900 0.6906 0.0455 -0.0171 0.6290 0.0626 -213.3265 -227.4577 -2.1587 -2.3461
0.6907 0.13 2000 0.6904 -0.0093 -0.0898 0.6400 0.0805 -220.5949 -232.9343 -2.1672 -2.3558
0.6904 0.14 2100 0.6905 0.0245 -0.0431 0.6380 0.0676 -215.9218 -229.5578 -2.1837 -2.3738
0.6916 0.14 2200 0.6904 -0.0211 -0.1023 0.6260 0.0812 -221.8438 -234.1163 -2.1669 -2.3566
0.6913 0.15 2300 0.6907 -0.0303 -0.1156 0.6170 0.0852 -223.1697 -235.0393 -2.1698 -2.3594
0.6899 0.16 2400 0.6904 0.0312 -0.0385 0.6225 0.0697 -215.4613 -228.8855 -2.1472 -2.3345
0.6924 0.16 2500 0.6905 0.0577 -0.0074 0.625 0.0651 -212.3521 -226.2342 -2.1658 -2.3546
0.6893 0.17 2600 0.6903 0.0520 -0.0205 0.6320 0.0725 -213.6627 -226.8027 -2.1570 -2.3453
0.6901 0.18 2700 0.6906 0.0038 -0.0622 0.6325 0.0660 -217.8366 -231.6274 -2.1382 -2.3249
0.6909 0.18 2800 0.6903 0.0333 -0.0363 0.6315 0.0696 -215.2451 -228.6795 -2.1165 -2.3020
0.6893 0.19 2900 0.6902 0.0110 -0.0612 0.6380 0.0722 -217.7327 -230.9010 -2.1110 -2.2960
0.6925 0.2 3000 0.6903 0.0154 -0.0656 0.6245 0.0811 -218.1745 -230.4610 -2.1312 -2.3182
0.692 0.2 3100 0.6903 -0.0346 -0.1194 0.6440 0.0849 -223.5567 -235.4630 -2.1298 -2.3160
0.687 0.21 3200 0.6903 -0.0146 -0.0904 0.6210 0.0757 -220.6501 -233.4682 -2.1344 -2.3212
0.6908 0.22 3300 0.6902 -0.0061 -0.0903 0.6420 0.0842 -220.6434 -232.6119 -2.1233 -2.3094
0.6908 0.22 3400 0.6904 -0.0103 -0.0884 0.6345 0.0781 -220.4491 -233.0300 -2.1210 -2.3068
0.6901 0.23 3500 0.6903 0.0193 -0.0626 0.6355 0.0819 -217.8700 -230.0756 -2.1193 -2.3047
0.6913 0.24 3600 0.6902 0.0148 -0.0690 0.6360 0.0838 -218.5164 -230.5288 -2.1189 -2.3041
0.694 0.24 3700 0.6904 -0.0287 -0.1025 0.6390 0.0738 -221.8667 -234.8788 -2.0983 -2.2820
0.6891 0.25 3800 0.6902 0.0450 -0.0237 0.6320 0.0687 -213.9806 -227.5013 -2.0923 -2.2758
0.6877 0.26 3900 0.6902 0.0220 -0.0570 0.6245 0.0791 -217.3152 -229.8009 -2.1089 -2.2936
0.6884 0.26 4000 0.6901 -0.0013 -0.0808 0.6360 0.0795 -219.6905 -232.1315 -2.1064 -2.2913
0.693 0.27 4100 0.6904 -0.0133 -0.0759 0.6280 0.0626 -219.1985 -233.3333 -2.1177 -2.3035
0.691 0.27 4200 0.6904 -0.0025 -0.0715 0.6360 0.0690 -218.7613 -232.2541 -2.1112 -2.2963
0.6904 0.28 4300 0.6901 -0.0338 -0.1195 0.6345 0.0858 -223.5635 -235.3810 -2.1015 -2.2866
0.6903 0.29 4400 0.6902 -0.0454 -0.1194 0.6275 0.0740 -223.5494 -236.5452 -2.1077 -2.2929
0.6864 0.29 4500 0.6901 -0.0231 -0.1063 0.6325 0.0833 -222.2449 -234.3118 -2.1211 -2.3074
0.6904 0.3 4600 0.6902 0.0062 -0.0640 0.6310 0.0702 -218.0117 -231.3809 -2.1215 -2.3078
0.6854 0.31 4700 0.6903 -0.0355 -0.1276 0.6355 0.0921 -224.3721 -235.5581 -2.1311 -2.3193
0.6918 0.31 4800 0.6902 -0.0179 -0.0916 0.6385 0.0737 -220.7675 -233.7953 -2.1200 -2.3064
0.6886 0.32 4900 0.6902 -0.0208 -0.1097 0.6425 0.0889 -222.5813 -234.0859 -2.0991 -2.2843
0.6923 0.33 5000 0.6901 -0.0066 -0.0881 0.6270 0.0815 -220.4222 -232.6694 -2.1010 -2.2864
0.6914 0.33 5100 0.6902 -0.0049 -0.0898 0.6365 0.0849 -220.5913 -232.4988 -2.1187 -2.3049
0.6895 0.34 5200 0.6902 -0.0224 -0.0983 0.6295 0.0759 -221.4422 -234.2488 -2.1360 -2.3237
0.6928 0.35 5300 0.6903 -0.0338 -0.1157 0.6300 0.0819 -223.1770 -235.3836 -2.1243 -2.3110
0.689 0.35 5400 0.6902 0.0233 -0.0513 0.6335 0.0746 -216.7387 -229.6749 -2.1113 -2.2966
0.6884 0.36 5500 0.6904 -0.0049 -0.0776 0.6230 0.0727 -219.3675 -232.4934 -2.1054 -2.2905
0.6901 0.37 5600 0.6903 -0.0024 -0.0762 0.6340 0.0738 -219.2327 -232.2428 -2.1021 -2.2871
0.6906 0.37 5700 0.6901 0.0148 -0.0702 0.6345 0.0849 -218.6294 -230.5282 -2.0973 -2.2823
0.69 0.38 5800 0.6902 -0.0196 -0.1110 0.6365 0.0914 -222.7126 -233.9667 -2.1048 -2.2907
0.6907 0.39 5900 0.6901 0.0021 -0.0814 0.6385 0.0835 -219.7548 -231.7942 -2.0946 -2.2797
0.6901 0.39 6000 0.6901 0.0056 -0.0656 0.6295 0.0713 -218.1741 -231.4416 -2.1236 -2.3110
0.6889 0.4 6100 0.6901 0.0339 -0.0376 0.6215 0.0716 -215.3745 -228.6116 -2.1316 -2.3196
0.691 0.41 6200 0.6900 0.0231 -0.0575 0.6285 0.0806 -217.3578 -229.6931 -2.1264 -2.3146
0.6871 0.41 6300 0.6900 0.0432 -0.0379 0.6370 0.0810 -215.3970 -227.6890 -2.1200 -2.3069
0.6892 0.42 6400 0.6901 0.0295 -0.0619 0.6310 0.0914 -217.7995 -229.0562 -2.1320 -2.3205
0.6918 0.43 6500 0.6901 0.0240 -0.0559 0.6370 0.0799 -217.2022 -229.6073 -2.1407 -2.3293
0.6899 0.43 6600 0.6901 0.0346 -0.0427 0.6355 0.0773 -215.8845 -228.5490 -2.1480 -2.3373
0.6914 0.44 6700 0.6901 0.0006 -0.0896 0.6385 0.0902 -220.5701 -231.9431 -2.1399 -2.3289
0.6921 0.44 6800 0.6900 -0.0122 -0.0949 0.6345 0.0826 -221.0977 -233.2272 -2.1373 -2.3262
0.6881 0.45 6900 0.6900 0.0001 -0.0807 0.6310 0.0808 -219.6810 -231.9954 -2.1336 -2.3221
0.688 0.46 7000 0.6900 -0.0035 -0.0895 0.6255 0.0860 -220.5654 -232.3555 -2.1330 -2.3214
0.6893 0.46 7100 0.6900 0.0038 -0.0786 0.6310 0.0824 -219.4742 -231.6270 -2.1255 -2.3129
0.6888 0.47 7200 0.6900 0.0146 -0.0599 0.6220 0.0745 -217.6021 -230.5473 -2.1376 -2.3262
0.6907 0.48 7300 0.6899 -0.0074 -0.0859 0.6290 0.0785 -220.2062 -232.7456 -2.1270 -2.3148
0.6931 0.48 7400 0.6900 0.0088 -0.0681 0.6285 0.0770 -218.4249 -231.1209 -2.1238 -2.3113
0.6895 0.49 7500 0.6899 0.0001 -0.0788 0.6280 0.0789 -219.4958 -231.9997 -2.1007 -2.2861
0.6874 0.5 7600 0.6900 -0.0044 -0.0909 0.6300 0.0865 -220.7033 -232.4485 -2.1033 -2.2888
0.6898 0.5 7700 0.6899 0.0018 -0.0817 0.6355 0.0835 -219.7780 -231.8252 -2.0977 -2.2827
0.6885 0.51 7800 0.6900 -0.0331 -0.1186 0.6485 0.0855 -223.4754 -235.3170 -2.0865 -2.2713
0.6905 0.52 7900 0.6899 -0.0476 -0.1257 0.6425 0.0781 -224.1827 -236.7635 -2.0852 -2.2699
0.6911 0.52 8000 0.6899 -0.0329 -0.1140 0.6345 0.0811 -223.0114 -235.2987 -2.0814 -2.2658
0.6915 0.53 8100 0.6899 -0.0158 -0.0964 0.6365 0.0807 -221.2535 -233.5811 -2.0877 -2.2729
0.6907 0.54 8200 0.6899 -0.0250 -0.1063 0.6355 0.0814 -222.2466 -234.5026 -2.0843 -2.2691
0.6893 0.54 8300 0.6900 -0.0020 -0.0780 0.6345 0.0760 -219.4079 -232.2015 -2.0923 -2.2778
0.6904 0.55 8400 0.6900 0.0123 -0.0553 0.6295 0.0676 -217.1386 -230.7717 -2.0953 -2.2805
0.6885 0.56 8500 0.6898 0.0006 -0.0852 0.6455 0.0858 -220.1317 -231.9455 -2.0963 -2.2819
0.6889 0.56 8600 0.6898 -0.0030 -0.0879 0.6410 0.0849 -220.4034 -232.3074 -2.1033 -2.2895
0.6895 0.57 8700 0.6898 0.0116 -0.0737 0.6430 0.0853 -218.9868 -230.8494 -2.1105 -2.2970
0.6913 0.58 8800 0.6898 0.0296 -0.0519 0.6465 0.0816 -216.8063 -229.0427 -2.1172 -2.3044
0.6906 0.58 8900 0.6898 0.0039 -0.0875 0.6485 0.0914 -220.3614 -231.6156 -2.1173 -2.3050
0.6888 0.59 9000 0.6898 0.0111 -0.0739 0.6400 0.0851 -219.0050 -230.8923 -2.1196 -2.3073
0.6905 0.6 9100 0.6899 0.0201 -0.0529 0.6325 0.0730 -216.9018 -229.9912 -2.1251 -2.3129
0.6887 0.6 9200 0.6898 0.0207 -0.0583 0.6355 0.0790 -217.4442 -229.9347 -2.1397 -2.3283
0.6899 0.61 9300 0.6898 0.0062 -0.0796 0.6375 0.0858 -219.5693 -231.3830 -2.1441 -2.3333
0.6884 0.62 9400 0.6899 -0.0285 -0.1089 0.6335 0.0804 -222.5007 -234.8580 -2.1432 -2.3321
0.6871 0.62 9500 0.6898 -0.0095 -0.0917 0.6365 0.0822 -220.7840 -232.9599 -2.1435 -2.3324
0.6905 0.63 9600 0.6899 0.0203 -0.0661 0.6385 0.0864 -218.2251 -229.9762 -2.1520 -2.3417
0.6895 0.63 9700 0.6898 0.0048 -0.0783 0.6440 0.0831 -219.4395 -231.5201 -2.1527 -2.3423
0.6915 0.64 9800 0.6898 -0.0028 -0.0828 0.6420 0.0800 -219.8873 -232.2814 -2.1416 -2.3302
0.6894 0.65 9900 0.6898 -0.0006 -0.0874 0.6435 0.0867 -220.3488 -232.0690 -2.1391 -2.3274
0.6897 0.65 10000 0.6899 -0.0191 -0.1066 0.6475 0.0875 -222.2716 -233.9115 -2.1345 -2.3227
0.6859 0.66 10100 0.6899 -0.0225 -0.1068 0.6475 0.0843 -222.2938 -234.2563 -2.1291 -2.3167
0.6904 0.67 10200 0.6898 0.0002 -0.0901 0.6475 0.0903 -220.6184 -231.9806 -2.1274 -2.3151
0.6876 0.67 10300 0.6898 0.0014 -0.0829 0.6435 0.0843 -219.8981 -231.8635 -2.1301 -2.3181
0.6888 0.68 10400 0.6898 0.0178 -0.0690 0.6385 0.0868 -218.5098 -230.2225 -2.1290 -2.3170
0.6893 0.69 10500 0.6898 0.0209 -0.0629 0.6395 0.0838 -217.9021 -229.9178 -2.1322 -2.3205
0.6893 0.69 10600 0.6898 0.0157 -0.0686 0.6430 0.0844 -218.4735 -230.4310 -2.1292 -2.3171
0.6907 0.7 10700 0.6898 0.0165 -0.0682 0.6430 0.0847 -218.4280 -230.3552 -2.1293 -2.3170
0.6877 0.71 10800 0.6898 0.0264 -0.0554 0.6435 0.0818 -217.1490 -229.3606 -2.1293 -2.3171
0.6924 0.71 10900 0.6898 0.0120 -0.0670 0.6385 0.0790 -218.3147 -230.8059 -2.1238 -2.3111
0.691 0.72 11000 0.6898 0.0266 -0.0537 0.6395 0.0803 -216.9807 -229.3445 -2.1251 -2.3125
0.6903 0.73 11100 0.6898 0.0312 -0.0491 0.6360 0.0803 -216.5214 -228.8819 -2.1258 -2.3132
0.6918 0.73 11200 0.6898 0.0305 -0.0499 0.6375 0.0804 -216.6021 -228.9509 -2.1260 -2.3134
0.6879 0.74 11300 0.6898 0.0205 -0.0612 0.6380 0.0818 -217.7365 -229.9544 -2.1278 -2.3155
0.6896 0.75 11400 0.6898 0.0170 -0.0694 0.6355 0.0864 -218.5536 -230.3058 -2.1292 -2.3172
0.6904 0.75 11500 0.6898 0.0200 -0.0610 0.6295 0.0811 -217.7165 -230.0003 -2.1303 -2.3183
0.6891 0.76 11600 0.6898 0.0093 -0.0783 0.6370 0.0877 -219.4468 -231.0702 -2.1269 -2.3147
0.6883 0.77 11700 0.6898 0.0024 -0.0805 0.6355 0.0828 -219.6586 -231.7671 -2.1296 -2.3175
0.69 0.77 11800 0.6898 -0.0053 -0.0871 0.6410 0.0818 -220.3198 -232.5302 -2.1311 -2.3192
0.6871 0.78 11900 0.6898 -0.0076 -0.0914 0.6410 0.0838 -220.7492 -232.7632 -2.1300 -2.3180
0.6887 0.79 12000 0.6898 -0.0020 -0.0869 0.6420 0.0849 -220.3020 -232.2003 -2.1329 -2.3212
0.6881 0.79 12100 0.6898 0.0007 -0.0815 0.6385 0.0822 -219.7614 -231.9368 -2.1346 -2.3230
0.6905 0.8 12200 0.6898 0.0116 -0.0698 0.6340 0.0814 -218.5900 -230.8437 -2.1335 -2.3217
0.6915 0.8 12300 0.6898 0.0068 -0.0793 0.6365 0.0861 -219.5374 -231.3238 -2.1342 -2.3226
0.6927 0.81 12400 0.6898 0.0117 -0.0703 0.6350 0.0820 -218.6442 -230.8355 -2.1361 -2.3246
0.6897 0.82 12500 0.6898 0.0095 -0.0713 0.6325 0.0807 -218.7409 -231.0591 -2.1371 -2.3257
0.6905 0.82 12600 0.6898 0.0061 -0.0744 0.6365 0.0805 -219.0518 -231.3977 -2.1376 -2.3263
0.6905 0.83 12700 0.6898 0.0062 -0.0754 0.6335 0.0815 -219.1471 -231.3857 -2.1376 -2.3263
0.6907 0.84 12800 0.6898 0.0129 -0.0688 0.6360 0.0817 -218.4943 -230.7170 -2.1390 -2.3279
0.6911 0.84 12900 0.6897 0.0182 -0.0653 0.6335 0.0835 -218.1457 -230.1887 -2.1372 -2.3259
0.6886 0.85 13000 0.6897 0.0149 -0.0707 0.6365 0.0856 -218.6831 -230.5150 -2.1390 -2.3278
0.6914 0.86 13100 0.6897 0.0135 -0.0701 0.6355 0.0836 -218.6235 -230.6533 -2.1373 -2.3260
0.6887 0.86 13200 0.6897 0.0112 -0.0734 0.6370 0.0846 -218.9507 -230.8813 -2.1367 -2.3253
0.6891 0.87 13300 0.6897 0.0125 -0.0733 0.6405 0.0858 -218.9421 -230.7573 -2.1360 -2.3246
0.6913 0.88 13400 0.6897 0.0152 -0.0698 0.6305 0.0850 -218.5887 -230.4858 -2.1379 -2.3267
0.6912 0.88 13500 0.6897 0.0194 -0.0641 0.6360 0.0836 -218.0252 -230.0619 -2.1378 -2.3265
0.6905 0.89 13600 0.6897 0.0163 -0.0690 0.6380 0.0853 -218.5100 -230.3711 -2.1382 -2.3269
0.6913 0.9 13700 0.6897 0.0172 -0.0673 0.6360 0.0846 -218.3449 -230.2803 -2.1379 -2.3266
0.69 0.9 13800 0.6897 0.0175 -0.0677 0.6390 0.0851 -218.3797 -230.2597 -2.1379 -2.3266
0.6902 0.91 13900 0.6897 0.0181 -0.0668 0.6400 0.0849 -218.2959 -230.1951 -2.1371 -2.3257
0.6883 0.92 14000 0.6897 0.0142 -0.0709 0.6380 0.0851 -218.7007 -230.5817 -2.1376 -2.3262
0.6898 0.92 14100 0.6897 0.0158 -0.0685 0.6375 0.0844 -218.4662 -230.4218 -2.1366 -2.3252
0.6894 0.93 14200 0.6897 0.0149 -0.0698 0.6375 0.0847 -218.5941 -230.5171 -2.1369 -2.3255
0.6912 0.94 14300 0.6897 0.0145 -0.0702 0.6400 0.0847 -218.6314 -230.5508 -2.1365 -2.3251
0.6893 0.94 14400 0.6897 0.0139 -0.0710 0.6410 0.0848 -218.7085 -230.6183 -2.1361 -2.3247
0.6914 0.95 14500 0.6897 0.0139 -0.0710 0.6370 0.0848 -218.7070 -230.6179 -2.1364 -2.3250
0.6897 0.96 14600 0.6897 0.0138 -0.0707 0.6355 0.0844 -218.6777 -230.6268 -2.1363 -2.3249
0.691 0.96 14700 0.6897 0.0138 -0.0705 0.6365 0.0843 -218.6600 -230.6252 -2.1362 -2.3248
0.6897 0.97 14800 0.6897 0.0139 -0.0705 0.6340 0.0844 -218.6653 -230.6136 -2.1364 -2.3250
0.6892 0.97 14900 0.6897 0.0138 -0.0703 0.6380 0.0841 -218.6449 -230.6241 -2.1365 -2.3250
0.6925 0.98 15000 0.6897 0.0142 -0.0701 0.6385 0.0843 -218.6228 -230.5896 -2.1369 -2.3255
0.6882 0.99 15100 0.6897 0.0141 -0.0701 0.6390 0.0843 -218.6257 -230.5937 -2.1369 -2.3255
0.6896 0.99 15200 0.6897 0.0141 -0.0701 0.6365 0.0842 -218.6245 -230.5999 -2.1366 -2.3251
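
The rows above are whitespace-separated with twelve values each, so they can be loaded with a few lines of Python. The sketch below parses three rows copied from the table (first, middle, last), checks that Rewards/margins matches Rewards/chosen minus Rewards/rejected, and picks the row with the lowest validation loss. The dictionary keys in `COLUMNS` are hypothetical names chosen for this example, not identifiers from the training code.

```python
# Hypothetical key names, one per table column, in table order.
COLUMNS = [
    "train_loss", "epoch", "step", "val_loss",
    "rewards_chosen", "rewards_rejected", "rewards_accuracy", "rewards_margin",
    "logps_rejected", "logps_chosen", "logits_rejected", "logits_chosen",
]

# Three rows copied verbatim from the table above (steps 100, 7600, 15200).
ROWS = """\
0.6932 0.01 100 0.6931 0.0024 0.0017 0.4950 0.0007 -211.4439 -231.7646 -2.1604 -2.3488
0.6874 0.5 7600 0.6900 -0.0044 -0.0909 0.6300 0.0865 -220.7033 -232.4485 -2.1033 -2.2888
0.6896 0.99 15200 0.6897 0.0141 -0.0701 0.6365 0.0842 -218.6245 -230.5999 -2.1366 -2.3251
"""


def parse_rows(text):
    """Turn whitespace-separated table rows into a list of dicts."""
    records = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        if len(values) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
        record = dict(zip(COLUMNS, values))
        record["step"] = int(record["step"])
        records.append(record)
    return records


records = parse_rows(ROWS)

# Row with the lowest validation loss among the parsed rows.
best = min(records, key=lambda r: r["val_loss"])

for r in records:
    gap = r["rewards_chosen"] - r["rewards_rejected"]
    print(r["step"], r["val_loss"], f"gap={gap:.4f}", f"margin={r['rewards_margin']:.4f}")
print("lowest validation loss at step", best["step"])
```

Pasting the full table into `ROWS` should work the same way, since every row has the same twelve fields.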

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
