zephyr-dpo-qlora-uf-5e-6

This model is a DPO fine-tune of alignment-handbook/zephyr-7b-sft-full, trained as a QLoRA adapter on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4890
  • Rewards/chosen: -2.8977
  • Rewards/rejected: -4.0719
  • Rewards/accuracies: 0.7798
  • Rewards/margins: 1.1742
  • Rewards/margins Max: 3.6864
  • Rewards/margins Min: -0.9274
  • Rewards/margins Std: 1.5325
  • Logps/rejected: -669.3330
  • Logps/chosen: -574.2586
  • Logits/rejected: -1.7368
  • Logits/chosen: -1.7961
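
These reward columns follow the usual DPO bookkeeping. As a point of reference (this is an assumption about the training code, not something stated in this card), a standard DPO trainer logs the rewards as β-scaled log-probability ratios between the policy and the frozen reference model:

```latex
% Assumed DPO reward convention (not recorded explicitly in this card)
\begin{align*}
\text{Rewards/chosen}     &= \beta\left[\log \pi_\theta(y_{\text{chosen}} \mid x) - \log \pi_{\text{ref}}(y_{\text{chosen}} \mid x)\right] \\
\text{Rewards/rejected}   &= \beta\left[\log \pi_\theta(y_{\text{rejected}} \mid x) - \log \pi_{\text{ref}}(y_{\text{rejected}} \mid x)\right] \\
\text{Rewards/margins}    &= \text{Rewards/chosen} - \text{Rewards/rejected} \\
\text{Rewards/accuracies} &= \Pr\left[\text{Rewards/chosen} > \text{Rewards/rejected}\right]
\end{align*}
```

Under the same convention, Logps/chosen and Logps/rejected are the policy's summed log-probabilities of the chosen and rejected completions.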

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
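
For orientation, here is a minimal sketch of how the listed hyperparameters would map onto Hugging Face TrainingArguments. This is an illustration, not the actual training script (which follows the alignment-handbook DPO recipe); in particular, the DPO beta and the QLoRA/LoRA settings are not recorded in this card.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# total_train_batch_size 16 = 4 devices x per_device_train_batch_size 4;
# total_eval_batch_size 32 = 4 devices x per_device_eval_batch_size 8.
training_args = TrainingArguments(
    output_dir="zephyr-dpo-qlora-uf-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,             # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,          # and epsilon=1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```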

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.6893 | 0.03 | 100 | 0.6897 | 0.0026 | -0.0055 | 0.7202 | 0.0082 | 0.0362 | -0.0170 | 0.0176 | -262.6957 | -284.2244 | -2.7822 | -2.8200 |
| 0.6681 | 0.05 | 200 | 0.6689 | 0.0162 | -0.0429 | 0.7222 | 0.0591 | 0.2404 | -0.1128 | 0.1163 | -266.4325 | -282.8687 | -2.7520 | -2.7906 |
| 0.64 | 0.08 | 300 | 0.6293 | -0.3380 | -0.5276 | 0.7044 | 0.1896 | 0.7935 | -0.3661 | 0.3880 | -314.9071 | -318.2889 | -2.7294 | -2.7644 |
| 0.6335 | 0.1 | 400 | 0.6076 | -0.3780 | -0.6803 | 0.7143 | 0.3023 | 1.2436 | -0.5587 | 0.5973 | -330.1778 | -322.2904 | -2.7035 | -2.7413 |
| 0.5664 | 0.13 | 500 | 0.5693 | -1.0517 | -1.6202 | 0.7222 | 0.5685 | 2.1499 | -0.8056 | 0.9738 | -424.1662 | -389.6617 | -2.3570 | -2.3930 |
| 0.5428 | 0.16 | 600 | 0.5504 | -1.1351 | -1.8251 | 0.7460 | 0.6900 | 2.5221 | -0.8419 | 1.1085 | -444.6526 | -397.9947 | -2.3087 | -2.3340 |
| 0.5696 | 0.18 | 700 | 0.5407 | -1.6072 | -2.2945 | 0.7302 | 0.6873 | 2.3968 | -0.8008 | 1.0591 | -491.5914 | -445.2077 | -2.0233 | -2.0544 |
| 0.4864 | 0.21 | 800 | 0.5377 | -1.4823 | -2.3816 | 0.7381 | 0.8993 | 2.9869 | -0.9704 | 1.3291 | -500.2979 | -432.7151 | -2.1126 | -2.1435 |
| 0.542 | 0.24 | 900 | 0.5399 | -1.9887 | -2.8948 | 0.7302 | 0.9061 | 3.1667 | -0.9490 | 1.3690 | -551.6262 | -483.3614 | -2.1744 | -2.2024 |
| 0.5518 | 0.26 | 1000 | 0.5300 | -1.9427 | -2.8559 | 0.7540 | 0.9131 | 3.1137 | -0.9029 | 1.3265 | -547.7310 | -478.7619 | -2.1380 | -2.1708 |
| 0.5538 | 0.29 | 1100 | 0.5361 | -1.1129 | -1.9809 | 0.7520 | 0.8681 | 3.0506 | -0.8555 | 1.2919 | -460.2347 | -395.7733 | -2.1859 | -2.2234 |
| 0.5482 | 0.31 | 1200 | 0.5345 | -1.2650 | -2.1623 | 0.7798 | 0.8973 | 3.0598 | -0.8739 | 1.2932 | -478.3762 | -410.9884 | -2.0283 | -2.0696 |
| 0.5325 | 0.34 | 1300 | 0.5237 | -1.3489 | -2.2549 | 0.7540 | 0.9060 | 2.9285 | -0.9000 | 1.2688 | -487.6328 | -419.3813 | -2.0319 | -2.0646 |
| 0.5647 | 0.37 | 1400 | 0.5171 | -1.8056 | -2.7729 | 0.7738 | 0.9673 | 3.0310 | -0.9191 | 1.3055 | -539.4321 | -465.0507 | -2.0499 | -2.0808 |
| 0.5458 | 0.39 | 1500 | 0.5139 | -1.4005 | -2.3080 | 0.7659 | 0.9074 | 2.8815 | -0.9358 | 1.2687 | -492.9399 | -424.5414 | -2.1490 | -2.1788 |
| 0.4935 | 0.42 | 1600 | 0.5159 | -1.4135 | -2.4191 | 0.7619 | 1.0056 | 3.1947 | -0.8547 | 1.3594 | -504.0516 | -425.8337 | -2.0721 | -2.1058 |
| 0.4832 | 0.44 | 1700 | 0.5182 | -1.5594 | -2.6076 | 0.7579 | 1.0482 | 3.3861 | -0.8998 | 1.4429 | -522.9042 | -440.4306 | -2.1434 | -2.1797 |
| 0.5158 | 0.47 | 1800 | 0.5181 | -1.7427 | -2.8825 | 0.7639 | 1.1398 | 3.5508 | -0.9741 | 1.5177 | -550.3890 | -458.7530 | -1.9600 | -2.0015 |
| 0.451 | 0.5 | 1900 | 0.5090 | -1.5156 | -2.5725 | 0.7579 | 1.0569 | 3.3790 | -0.8482 | 1.4174 | -519.3948 | -436.0498 | -1.8888 | -1.9342 |
| 0.4879 | 0.52 | 2000 | 0.5003 | -1.8435 | -2.8625 | 0.7718 | 1.0190 | 3.2173 | -0.9040 | 1.3683 | -548.3914 | -468.8387 | -1.8468 | -1.8969 |
| 0.4879 | 0.55 | 2100 | 0.5044 | -1.6709 | -2.7719 | 0.7579 | 1.1010 | 3.5672 | -0.8763 | 1.4852 | -539.3310 | -451.5732 | -1.9027 | -1.9476 |
| 0.4949 | 0.58 | 2200 | 0.4964 | -3.2082 | -4.4391 | 0.7778 | 1.2309 | 3.8910 | -1.0365 | 1.6390 | -706.0513 | -605.3098 | -1.7221 | -1.7794 |
| 0.5796 | 0.6 | 2300 | 0.4990 | -2.6972 | -3.7097 | 0.7897 | 1.0125 | 3.2200 | -0.8781 | 1.3552 | -633.1115 | -554.2051 | -1.7896 | -1.8422 |
| 0.5492 | 0.63 | 2400 | 0.4969 | -3.4670 | -4.5017 | 0.7778 | 1.0347 | 3.3130 | -0.9050 | 1.3962 | -712.3122 | -631.1838 | -1.6170 | -1.6768 |
| 0.4667 | 0.65 | 2500 | 0.5004 | -3.5869 | -4.8937 | 0.7817 | 1.3068 | 4.1402 | -1.0666 | 1.7418 | -751.5126 | -643.1785 | -1.5865 | -1.6490 |
| 0.5777 | 0.68 | 2600 | 0.4974 | -2.4014 | -3.5339 | 0.7619 | 1.1325 | 3.5063 | -0.9035 | 1.4860 | -615.5330 | -524.6262 | -1.7399 | -1.7949 |
| 0.5021 | 0.71 | 2700 | 0.4927 | -2.6594 | -3.8176 | 0.7798 | 1.1583 | 3.6119 | -0.9273 | 1.5118 | -643.9045 | -550.4240 | -1.7427 | -1.7988 |
| 0.5332 | 0.73 | 2800 | 0.4905 | -3.2417 | -4.4343 | 0.7817 | 1.1926 | 3.7159 | -0.9639 | 1.5556 | -705.5735 | -608.6549 | -1.6555 | -1.7144 |
| 0.5514 | 0.76 | 2900 | 0.4934 | -3.7499 | -5.0405 | 0.7798 | 1.2906 | 3.9723 | -1.0907 | 1.6887 | -766.1927 | -659.4749 | -1.6687 | -1.7302 |
| 0.4162 | 0.79 | 3000 | 0.4917 | -3.2815 | -4.4510 | 0.7698 | 1.1694 | 3.6486 | -0.9447 | 1.5323 | -707.2395 | -612.6413 | -1.6605 | -1.7208 |
| 0.5252 | 0.81 | 3100 | 0.4897 | -3.1223 | -4.3214 | 0.7857 | 1.1991 | 3.7431 | -0.9577 | 1.5632 | -694.2787 | -596.7130 | -1.6937 | -1.7536 |
| 0.4626 | 0.84 | 3200 | 0.4892 | -3.0544 | -4.1957 | 0.7798 | 1.1413 | 3.5819 | -0.9046 | 1.4895 | -681.7123 | -589.9283 | -1.7159 | -1.7744 |
| 0.5186 | 0.86 | 3300 | 0.4896 | -2.9688 | -4.1127 | 0.7738 | 1.1440 | 3.5867 | -0.9061 | 1.4963 | -673.4175 | -581.3629 | -1.7207 | -1.7796 |
| 0.4699 | 0.89 | 3400 | 0.4892 | -2.8679 | -4.0085 | 0.7758 | 1.1406 | 3.5840 | -0.8920 | 1.4895 | -662.9918 | -571.2766 | -1.7332 | -1.7916 |
| 0.4332 | 0.92 | 3500 | 0.4890 | -2.8539 | -4.0222 | 0.7817 | 1.1684 | 3.6683 | -0.9166 | 1.5238 | -664.3640 | -569.8725 | -1.7403 | -1.7991 |
| 0.5292 | 0.94 | 3600 | 0.4888 | -2.9244 | -4.1012 | 0.7758 | 1.1768 | 3.6946 | -0.9285 | 1.5356 | -672.2607 | -576.9283 | -1.7327 | -1.7920 |
| 0.5462 | 0.97 | 3700 | 0.4889 | -2.8929 | -4.0659 | 0.7758 | 1.1730 | 3.6816 | -0.9250 | 1.5309 | -668.7320 | -573.7759 | -1.7393 | -1.7981 |
| 0.4859 | 0.99 | 3800 | 0.4889 | -2.8993 | -4.0739 | 0.7778 | 1.1746 | 3.6856 | -0.9285 | 1.5334 | -669.5308 | -574.4193 | -1.7408 | -1.7997 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
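
Since this repository contains a QLoRA (PEFT) adapter rather than full model weights, a minimal loading sketch compatible with the framework versions above might look as follows. The 4-bit quantization settings are an assumption typical of QLoRA inference, not something recorded in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpo-qlora-uf-5e-6"

# Assumption: load the base SFT model in 4-bit, as in a typical QLoRA setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the DPO-trained LoRA adapter on top of the SFT base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```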