
zephyr-7b-dpo-lora

This model is a LoRA adapter for alignment-handbook/zephyr-7b-sft-full, fine-tuned with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (the reward metrics are explained briefly after the list):

  • Loss: 0.5894
  • Rewards/chosen: -0.2738
  • Rewards/rejected: -0.6020
  • Rewards/accuracies: 0.7035
  • Rewards/margins: 0.3282
  • Logps/rejected: -321.6407
  • Logps/chosen: -310.1199
  • Logits/rejected: -2.7529
  • Logits/chosen: -2.7746
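
DPO logs the implicit reward r(x, y) = β · log(π_θ(y|x) / π_ref(y|x)) for the chosen and the rejected completion of each pair; the margin is their difference, and the accuracy is the fraction of pairs where the chosen reward exceeds the rejected one. The β used during training is not recorded in this card. As a quick check against the numbers above:

```latex
% Margin is the difference of the implicit DPO rewards on the final evaluation:
\text{Rewards/margins} = \text{Rewards/chosen} - \text{Rewards/rejected}
                       = -0.2738 - (-0.6020) = 0.3282
```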

Model description

This repository contains LoRA adapter weights (not full model weights) for alignment-handbook/zephyr-7b-sft-full, obtained by preference-tuning the base model with DPO on binarized UltraFeedback preference pairs.
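
A minimal sketch of loading the adapter on top of the base model with PEFT and Transformers. The adapter repository id is taken from this card; the dtype and device settings are illustrative and should be adjusted to your hardware:

```python
# Hedged sketch: load the LoRA adapter on top of the SFT base model.
# The adapter repo id is taken from this card; torch_dtype and device_map
# are illustrative (device_map="auto" also requires accelerate).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "SeniorKabanocci/zephyr-7b-dpo-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Zephyr checkpoints ship a chat template; apply it before generation.
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```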

Intended uses & limitations

More information needed

Training and evaluation data

The adapter was trained and evaluated on HuggingFaceH4/ultrafeedback_binarized, a preference dataset that pairs each prompt with a chosen and a rejected completion.
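
As a hedged illustration, the preference pairs can be inspected with the datasets library; the split and column names below follow the public HuggingFaceH4/ultrafeedback_binarized card and should be verified against the dataset viewer:

```python
# Illustrative only: inspect the preference pairs used for DPO training.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = ds[0]
print(example["prompt"])        # user prompt
print(example["chosen"][-1])    # preferred assistant turn (message dict)
print(example["rejected"][-1])  # dispreferred assistant turn (message dict)
```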

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a hedged sketch of the corresponding training setup follows the list:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
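
The sketch below shows how these hyperparameters could map onto a TRL DPOTrainer with a PEFT LoRA configuration. It is not the exact training script: the LoRA rank/alpha/dropout/target modules and the DPO beta are not recorded in this card and are placeholders, and it assumes a recent TRL release in which DPOConfig carries beta and DPOTrainer accepts processing_class (older releases used a tokenizer argument and a separate beta argument).

```python
# Hedged sketch only: reconstructs the hyperparameters above with TRL + PEFT.
# LoRA settings and the DPO beta are placeholders, not values from this card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Depending on the TRL version, these raw splits may need light preprocessing
# into TRL's expected prompt/chosen/rejected format (the alignment-handbook
# recipes do this step explicitly).
train_ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
eval_ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="test_prefs")

peft_config = LoraConfig(  # placeholder adapter settings, not from this card
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"
)

args = DPOConfig(
    output_dir="zephyr-7b-dpo-lora",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # train_batch_size above
    per_device_eval_batch_size=8,    # eval_batch_size above
    gradient_accumulation_steps=2,   # gives total_train_batch_size 16 on one GPU
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # logged above as Adam, betas=(0.9, 0.999), eps=1e-8
    beta=0.1,                        # placeholder; beta is not recorded in this card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```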

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6929 | 0.0262 | 100 | 0.6930 | -0.0001 | -0.0004 | 0.5250 | 0.0003 | -261.4788 | -282.7496 | -2.8388 | -2.8661 |
| 0.6923 | 0.0523 | 200 | 0.6923 | 0.0008 | -0.0009 | 0.6050 | 0.0017 | -261.5316 | -282.6624 | -2.8380 | -2.8653 |
| 0.6898 | 0.0785 | 300 | 0.6903 | 0.0035 | -0.0024 | 0.6640 | 0.0058 | -261.6760 | -282.3918 | -2.8350 | -2.8623 |
| 0.6872 | 0.1047 | 400 | 0.6862 | 0.0165 | 0.0021 | 0.6670 | 0.0144 | -261.2256 | -281.0900 | -2.8308 | -2.8577 |
| 0.6783 | 0.1309 | 500 | 0.6804 | 0.0209 | -0.0059 | 0.6835 | 0.0267 | -262.0230 | -280.6481 | -2.8215 | -2.8486 |
| 0.6729 | 0.1570 | 600 | 0.6733 | 0.0154 | -0.0272 | 0.6840 | 0.0426 | -264.1608 | -281.1958 | -2.8138 | -2.8410 |
| 0.6665 | 0.1832 | 700 | 0.6638 | -0.0035 | -0.0689 | 0.6755 | 0.0654 | -268.3266 | -283.0863 | -2.8060 | -2.8327 |
| 0.6427 | 0.2094 | 800 | 0.6546 | -0.0214 | -0.1104 | 0.6815 | 0.0889 | -272.4747 | -284.8825 | -2.8020 | -2.8283 |
| 0.6428 | 0.2355 | 900 | 0.6458 | -0.0247 | -0.1383 | 0.6770 | 0.1136 | -275.2685 | -285.2050 | -2.7942 | -2.8199 |
| 0.6381 | 0.2617 | 1000 | 0.6358 | -0.0638 | -0.2074 | 0.6785 | 0.1436 | -282.1761 | -289.1206 | -2.7887 | -2.8138 |
| 0.6488 | 0.2879 | 1100 | 0.6284 | -0.1378 | -0.3055 | 0.6790 | 0.1677 | -291.9890 | -296.5138 | -2.7826 | -2.8071 |
| 0.6427 | 0.3141 | 1200 | 0.6223 | -0.1104 | -0.2986 | 0.6835 | 0.1882 | -291.3028 | -293.7785 | -2.7931 | -2.8165 |
| 0.6131 | 0.3402 | 1300 | 0.6172 | -0.1466 | -0.3514 | 0.6865 | 0.2049 | -296.5806 | -297.3945 | -2.7951 | -2.8180 |
| 0.6326 | 0.3664 | 1400 | 0.6155 | -0.1752 | -0.3896 | 0.6860 | 0.2144 | -300.3966 | -300.2597 | -2.7920 | -2.8147 |
| 0.6128 | 0.3926 | 1500 | 0.6180 | -0.0630 | -0.2687 | 0.6890 | 0.2057 | -288.3090 | -289.0369 | -2.7980 | -2.8198 |
| 0.6223 | 0.4187 | 1600 | 0.6088 | -0.1688 | -0.4097 | 0.6945 | 0.2409 | -302.4074 | -299.6220 | -2.7926 | -2.8148 |
| 0.6338 | 0.4449 | 1700 | 0.6061 | -0.2152 | -0.4665 | 0.6925 | 0.2513 | -308.0869 | -304.2535 | -2.7961 | -2.8181 |
| 0.585 | 0.4711 | 1800 | 0.6050 | -0.1327 | -0.3850 | 0.6915 | 0.2523 | -299.9368 | -296.0054 | -2.7949 | -2.8174 |
| 0.577 | 0.4973 | 1900 | 0.6013 | -0.2170 | -0.4883 | 0.6965 | 0.2713 | -310.2670 | -304.4333 | -2.7954 | -2.8176 |
| 0.5945 | 0.5234 | 2000 | 0.5992 | -0.2107 | -0.4899 | 0.6995 | 0.2793 | -310.4293 | -303.8028 | -2.7903 | -2.8122 |
| 0.5913 | 0.5496 | 2100 | 0.5981 | -0.2373 | -0.5251 | 0.7025 | 0.2879 | -313.9529 | -306.4641 | -2.7863 | -2.8085 |
| 0.5816 | 0.5758 | 2200 | 0.5989 | -0.2688 | -0.5570 | 0.6970 | 0.2883 | -317.1411 | -309.6146 | -2.7849 | -2.8070 |
| 0.5824 | 0.6019 | 2300 | 0.5961 | -0.2227 | -0.5189 | 0.6955 | 0.2961 | -313.3233 | -305.0098 | -2.7821 | -2.8037 |
| 0.602 | 0.6281 | 2400 | 0.5969 | -0.2683 | -0.5669 | 0.6990 | 0.2986 | -318.1251 | -309.5652 | -2.7744 | -2.7961 |
| 0.5792 | 0.6543 | 2500 | 0.5963 | -0.2102 | -0.5041 | 0.6975 | 0.2938 | -311.8429 | -303.7615 | -2.7763 | -2.7980 |
| 0.6028 | 0.6805 | 2600 | 0.5974 | -0.1896 | -0.4790 | 0.6920 | 0.2895 | -309.3417 | -301.6964 | -2.7717 | -2.7933 |
| 0.5854 | 0.7066 | 2700 | 0.5930 | -0.2517 | -0.5615 | 0.7020 | 0.3098 | -317.5864 | -307.9027 | -2.7676 | -2.7892 |
| 0.5994 | 0.7328 | 2800 | 0.5920 | -0.2607 | -0.5775 | 0.7045 | 0.3167 | -319.1838 | -308.8107 | -2.7636 | -2.7851 |
| 0.5837 | 0.7590 | 2900 | 0.5913 | -0.2540 | -0.5721 | 0.7055 | 0.3181 | -318.6511 | -308.1379 | -2.7619 | -2.7834 |
| 0.5858 | 0.7851 | 3000 | 0.5910 | -0.2625 | -0.5835 | 0.7055 | 0.3210 | -319.7853 | -308.9898 | -2.7605 | -2.7819 |
| 0.5685 | 0.8113 | 3100 | 0.5914 | -0.2383 | -0.5571 | 0.7040 | 0.3188 | -317.1507 | -306.5707 | -2.7558 | -2.7777 |
| 0.5753 | 0.8375 | 3200 | 0.5903 | -0.2623 | -0.5868 | 0.7020 | 0.3246 | -320.1224 | -308.9666 | -2.7567 | -2.7783 |
| 0.5769 | 0.8636 | 3300 | 0.5900 | -0.2673 | -0.5934 | 0.7030 | 0.3260 | -320.7757 | -309.4716 | -2.7555 | -2.7771 |
| 0.5608 | 0.8898 | 3400 | 0.5896 | -0.2716 | -0.5988 | 0.7020 | 0.3273 | -321.3196 | -309.8930 | -2.7520 | -2.7739 |
| 0.6008 | 0.9160 | 3500 | 0.5895 | -0.2716 | -0.5994 | 0.7035 | 0.3277 | -321.3745 | -309.9000 | -2.7539 | -2.7755 |
| 0.585 | 0.9422 | 3600 | 0.5895 | -0.2722 | -0.6000 | 0.7020 | 0.3279 | -321.4418 | -309.9531 | -2.7549 | -2.7764 |
| 0.567 | 0.9683 | 3700 | 0.5893 | -0.2738 | -0.6022 | 0.7015 | 0.3284 | -321.6555 | -310.1171 | -2.7539 | -2.7755 |
| 0.5834 | 0.9945 | 3800 | 0.5893 | -0.2740 | -0.6023 | 0.7025 | 0.3283 | -321.6666 | -310.1333 | -2.7525 | -2.7742 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.0
  • Pytorch 2.2.0
  • Datasets 2.16.1
  • Tokenizers 0.19.1