
# zephyr-7b-gpo-v0-i1

This model is a fine-tuned version of DUAL-GPO/zephyr-7b-gpo-update3-i0 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

- Loss: 0.1128
- Rewards/chosen: -0.3200
- Rewards/rejected: -0.3706
- Rewards/accuracies: 0.4955
- Rewards/margins: 0.0506
- Logps/rejected: -621.5818
- Logps/chosen: -585.8446
- Logits/rejected: -1.9142
- Logits/chosen: -2.0965

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 2
- total_train_batch_size: 12
- total_eval_batch_size: 6
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
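The two "total" batch sizes above are derived, not independent, settings: with 3 devices and 2 gradient-accumulation steps, each optimizer step consumes 2 × 2 × 3 = 12 training examples, while evaluation (which does not accumulate gradients) batches 2 × 3 = 6. A minimal arithmetic check:

```python
# Effective batch sizes implied by the hyperparameters listed above.
train_batch_size = 2           # per device
eval_batch_size = 2            # per device
num_devices = 3
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 12
print(total_eval_batch_size)   # 6
```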

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3416 | 0.02 | 100 | 0.0447 | -0.0994 | -0.1161 | 0.5883 | 0.0167 | -367.1221 | -365.3260 | -1.7202 | -1.8827 |
| 0.2571 | 0.05 | 200 | 0.0858 | -0.1849 | -0.2159 | 0.4790 | 0.0310 | -466.8627 | -450.7509 | -1.8599 | -2.0364 |
| 0.2771 | 0.07 | 300 | 0.0910 | -0.2419 | -0.2769 | 0.4775 | 0.0350 | -527.8735 | -507.7906 | -1.9087 | -2.0909 |
| 0.2561 | 0.10 | 400 | 0.1127 | -0.4661 | -0.5086 | 0.4895 | 0.0425 | -759.5652 | -731.9658 | -1.9571 | -2.1511 |
| 0.2604 | 0.12 | 500 | 0.0826 | -0.3221 | -0.3613 | 0.4835 | 0.0393 | -612.2919 | -587.9281 | -1.8643 | -2.0449 |
| 0.2778 | 0.14 | 600 | 0.1033 | -0.2940 | -0.3303 | 0.4760 | 0.0363 | -581.3212 | -559.9218 | -1.8588 | -2.0387 |
| 0.2631 | 0.17 | 700 | 0.1084 | -0.3587 | -0.4024 | 0.4865 | 0.0437 | -653.3798 | -624.5897 | -1.8458 | -2.0252 |
| 0.2264 | 0.19 | 800 | 0.1158 | -0.2355 | -0.2734 | 0.4731 | 0.0378 | -524.3303 | -501.3899 | -1.8726 | -2.0501 |
| 0.2593 | 0.22 | 900 | 0.1048 | -0.2730 | -0.3214 | 0.4865 | 0.0485 | -572.4186 | -538.8648 | -1.7883 | -1.9593 |
| 0.2248 | 0.24 | 1000 | 0.1122 | -0.2753 | -0.3216 | 0.4760 | 0.0463 | -572.5806 | -541.1548 | -1.8308 | -2.0088 |
| 0.2345 | 0.26 | 1100 | 0.1249 | -0.2594 | -0.2977 | 0.4581 | 0.0382 | -548.6310 | -525.3046 | -1.8628 | -2.0406 |
| 0.2000 | 0.29 | 1200 | 0.1212 | -0.3796 | -0.4250 | 0.4925 | 0.0454 | -675.9450 | -645.4562 | -1.8382 | -2.0177 |
| 0.2246 | 0.31 | 1300 | 0.1102 | -0.2548 | -0.3030 | 0.4850 | 0.0482 | -553.9783 | -520.6531 | -1.9584 | -2.1449 |
| 0.2481 | 0.34 | 1400 | 0.1082 | -0.2988 | -0.3545 | 0.4955 | 0.0557 | -605.4994 | -564.6545 | -1.8877 | -2.0708 |
| 0.2320 | 0.36 | 1500 | 0.1053 | -0.2421 | -0.2907 | 0.4910 | 0.0486 | -541.7161 | -508.0170 | -1.9404 | -2.1256 |
| 0.2351 | 0.38 | 1600 | 0.1098 | -0.3383 | -0.3864 | 0.4775 | 0.0481 | -637.3510 | -604.1564 | -1.8506 | -2.0290 |
| 0.2622 | 0.41 | 1700 | 0.1196 | -0.2614 | -0.3121 | 0.4820 | 0.0507 | -563.0452 | -527.2568 | -1.9197 | -2.1016 |
| 0.2043 | 0.43 | 1800 | 0.1257 | -0.2798 | -0.3252 | 0.4820 | 0.0454 | -576.1965 | -545.7018 | -1.9177 | -2.0980 |
| 0.2205 | 0.46 | 1900 | 0.1154 | -0.4037 | -0.4629 | 0.4850 | 0.0592 | -713.9170 | -669.5957 | -1.8198 | -1.9972 |
| 0.2156 | 0.48 | 2000 | 0.1103 | -0.2727 | -0.3161 | 0.4865 | 0.0434 | -567.0794 | -538.5911 | -1.9234 | -2.1044 |
| 0.2308 | 0.50 | 2100 | 0.1163 | -0.4322 | -0.4852 | 0.4925 | 0.0531 | -736.1898 | -698.0287 | -1.8013 | -1.9761 |
| 0.2204 | 0.53 | 2200 | 0.1083 | -0.3224 | -0.3712 | 0.4940 | 0.0488 | -622.1750 | -588.3229 | -1.8487 | -2.0260 |
| 0.2303 | 0.55 | 2300 | 0.1192 | -0.3117 | -0.3667 | 0.4940 | 0.0551 | -617.7075 | -577.5367 | -1.8679 | -2.0473 |
| 0.2310 | 0.58 | 2400 | 0.1068 | -0.3476 | -0.4008 | 0.5000 | 0.0532 | -651.7600 | -613.4935 | -1.8167 | -1.9926 |
| 0.2252 | 0.60 | 2500 | 0.1240 | -0.3568 | -0.4154 | 0.4940 | 0.0586 | -666.3873 | -622.7224 | -1.9124 | -2.0972 |
| 0.2445 | 0.62 | 2600 | 0.1240 | -0.3426 | -0.4003 | 0.4805 | 0.0576 | -651.2365 | -608.5200 | -1.9230 | -2.1073 |
| 0.2212 | 0.65 | 2700 | 0.1103 | -0.2894 | -0.3362 | 0.4925 | 0.0468 | -587.1506 | -555.2968 | -1.9049 | -2.0860 |
| 0.2301 | 0.67 | 2800 | 0.1073 | -0.2754 | -0.3278 | 0.5105 | 0.0524 | -578.7745 | -541.2313 | -1.9024 | -2.0838 |
| 0.2099 | 0.70 | 2900 | 0.1191 | -0.3108 | -0.3657 | 0.5015 | 0.0549 | -616.7156 | -576.6858 | -1.9182 | -2.1014 |
| 0.2072 | 0.72 | 3000 | 0.1120 | -0.3062 | -0.3563 | 0.4910 | 0.0500 | -607.2319 | -572.1099 | -1.9258 | -2.1090 |
| 0.2186 | 0.74 | 3100 | 0.1155 | -0.2960 | -0.3474 | 0.4985 | 0.0514 | -598.4005 | -561.9234 | -1.9031 | -2.0849 |
| 0.2743 | 0.77 | 3200 | 0.1121 | -0.2815 | -0.3314 | 0.4955 | 0.0499 | -582.3980 | -547.4086 | -1.9332 | -2.1170 |
| 0.1989 | 0.79 | 3300 | 0.1116 | -0.3235 | -0.3744 | 0.4850 | 0.0509 | -625.3889 | -589.4213 | -1.8977 | -2.0789 |
| 0.2258 | 0.82 | 3400 | 0.1093 | -0.3091 | -0.3603 | 0.4970 | 0.0512 | -611.2418 | -574.9766 | -1.9164 | -2.0989 |
| 0.2524 | 0.84 | 3500 | 0.1142 | -0.3383 | -0.3897 | 0.4910 | 0.0514 | -640.6893 | -604.2028 | -1.9130 | -2.0956 |
| 0.2202 | 0.86 | 3600 | 0.1173 | -0.3412 | -0.3925 | 0.4835 | 0.0513 | -643.4937 | -607.1244 | -1.9146 | -2.0973 |
| 0.2365 | 0.89 | 3700 | 0.1178 | -0.3273 | -0.3787 | 0.4850 | 0.0514 | -629.6786 | -593.2114 | -1.9279 | -2.1117 |
| 0.1894 | 0.91 | 3800 | 0.1152 | -0.3184 | -0.3694 | 0.4925 | 0.0509 | -620.3304 | -584.3237 | -1.9252 | -2.1088 |
| 0.2372 | 0.94 | 3900 | 0.1130 | -0.3155 | -0.3658 | 0.4940 | 0.0503 | -616.7926 | -581.3542 | -1.9194 | -2.1021 |
| 0.2029 | 0.96 | 4000 | 0.1133 | -0.3208 | -0.3715 | 0.4925 | 0.0507 | -622.4911 | -586.6887 | -1.9141 | -2.0964 |
| 0.2438 | 0.98 | 4100 | 0.1129 | -0.3199 | -0.3707 | 0.4940 | 0.0508 | -621.6636 | -585.7551 | -1.9140 | -2.0965 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
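Since this repository is a PEFT adapter rather than a full model, it has to be loaded on top of its base model. The sketch below is illustrative and not part of the original card: it assumes the adapter repository ID shown above, a PEFT installation matching the version listed here, and network access to the Hugging Face Hub (it is not runnable offline).

```python
# Illustrative sketch (not from the original card): loading the adapter on top
# of its base model with PEFT. Requires network access to the Hub and enough
# memory for the 7B base model.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "DUAL-GPO/zephyr-7b-gpo-v0-i1"

# AutoPeftModelForCausalLM reads the adapter's adapter_config.json, fetches the
# base model it points to, and attaches the adapter weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```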
