
zephyr-dpop-qlora-uf-5e-7-real

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a short loading sketch follows the list):

  • Loss: 0.6811
  • Positive Losses: 0.1281
  • Dpo Losses: 0.6626
  • Rewards/chosen: 0.1641
  • Rewards/rejected: 0.0983
  • Rewards/accuracies: 0.7262
  • Rewards/margins: 0.0657
  • Rewards/margins Max: 0.2505
  • Rewards/margins Min: -0.1023
  • Rewards/margins Std: 0.1168
  • Logps/rejected: -252.3152
  • Logps/chosen: -268.0898
  • Logits/rejected: -2.7475
  • Logits/chosen: -2.7856
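
Since this repository ships a QLoRA adapter rather than full model weights, it is loaded on top of the base SFT model. The sketch below uses the model IDs from this card; the dtype, `device_map`, and generation settings are assumptions, not part of the original card:

```python
# Minimal sketch: load the QLoRA adapter from this repo on top of the base SFT model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpop-qlora-uf-5e-7-real"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"  # dtype/device are assumptions
)
model = PeftModel.from_pretrained(base, adapter_id)

# Chat-style prompt using the base model's chat template.
messages = [{"role": "user", "content": "Summarize the idea behind DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```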

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
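
As a rough illustration, these settings map onto a `transformers.TrainingArguments` as sketched below. The actual training script (an alignment-handbook-style DPO/QLoRA run with the additional positive-loss term) is not included in this card, so the output directory and optimizer name are assumptions:

```python
# Sketch only: the listed hyperparameters expressed as a TrainingArguments object.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-5e-7-real",  # assumed output path
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 4 GPUs -> total train batch size 16
    per_device_eval_batch_size=8,    # x 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",             # Adam-style optimizer; betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```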

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6943 | 0.03 | 100 | 0.6936 | 0.0070 | 0.6929 | 0.0051 | 0.0047 | 0.5556 | 0.0004 | 0.0054 | -0.0049 | 0.0034 | -261.6790 | -283.9844 | -2.7826 | -2.8207 |
| 0.6937 | 0.05 | 200 | 0.6932 | 0.0058 | 0.6925 | 0.0087 | 0.0073 | 0.5913 | 0.0014 | 0.0083 | -0.0053 | 0.0045 | -261.4165 | -283.6240 | -2.7846 | -2.8225 |
| 0.6932 | 0.08 | 300 | 0.6918 | 0.0111 | 0.6908 | 0.0176 | 0.0128 | 0.6706 | 0.0048 | 0.0213 | -0.0101 | 0.0104 | -260.8730 | -282.7396 | -2.7791 | -2.8173 |
| 0.6923 | 0.1 | 400 | 0.6902 | 0.0155 | 0.6883 | 0.0367 | 0.0269 | 0.6786 | 0.0099 | 0.0426 | -0.0200 | 0.0206 | -259.4627 | -280.8207 | -2.7795 | -2.8175 |
| 0.6931 | 0.13 | 500 | 0.6883 | 0.0265 | 0.6845 | 0.0593 | 0.0416 | 0.6865 | 0.0177 | 0.0763 | -0.0341 | 0.0362 | -257.9933 | -278.5678 | -2.7736 | -2.8118 |
| 0.6831 | 0.16 | 600 | 0.6870 | 0.0395 | 0.6813 | 0.0836 | 0.0590 | 0.6964 | 0.0245 | 0.1063 | -0.0471 | 0.0502 | -256.2458 | -276.1382 | -2.7761 | -2.8139 |
| 0.6843 | 0.18 | 700 | 0.6863 | 0.0531 | 0.6787 | 0.0901 | 0.0600 | 0.7083 | 0.0301 | 0.1278 | -0.0553 | 0.0599 | -256.1454 | -275.4836 | -2.7667 | -2.8047 |
| 0.678 | 0.21 | 800 | 0.6882 | 0.0907 | 0.6756 | 0.0978 | 0.0610 | 0.7004 | 0.0368 | 0.1540 | -0.0663 | 0.0722 | -256.0468 | -274.7102 | -2.7649 | -2.8027 |
| 0.6788 | 0.24 | 900 | 0.6861 | 0.0828 | 0.6741 | 0.1163 | 0.0761 | 0.7123 | 0.0401 | 0.1679 | -0.0693 | 0.0777 | -254.5357 | -272.8672 | -2.7642 | -2.8025 |
| 0.6883 | 0.26 | 1000 | 0.6859 | 0.0910 | 0.6726 | 0.1215 | 0.0781 | 0.7143 | 0.0434 | 0.1772 | -0.0735 | 0.0821 | -254.3346 | -272.3451 | -2.7648 | -2.8032 |
| 0.692 | 0.29 | 1100 | 0.6851 | 0.0917 | 0.6716 | 0.1258 | 0.0803 | 0.7024 | 0.0455 | 0.1845 | -0.0761 | 0.0853 | -254.1160 | -271.9105 | -2.7703 | -2.8088 |
| 0.6781 | 0.31 | 1200 | 0.6848 | 0.0888 | 0.6704 | 0.1337 | 0.0855 | 0.7163 | 0.0481 | 0.1933 | -0.0787 | 0.0893 | -253.5947 | -271.1252 | -2.7672 | -2.8056 |
| 0.6977 | 0.34 | 1300 | 0.6844 | 0.0955 | 0.6697 | 0.1365 | 0.0866 | 0.7222 | 0.0498 | 0.1984 | -0.0814 | 0.0917 | -253.4859 | -270.8478 | -2.7666 | -2.8049 |
| 0.6773 | 0.37 | 1400 | 0.6852 | 0.1091 | 0.6683 | 0.1360 | 0.0832 | 0.7163 | 0.0529 | 0.2084 | -0.0866 | 0.0967 | -253.8343 | -270.8923 | -2.7626 | -2.8007 |
| 0.6802 | 0.39 | 1500 | 0.6854 | 0.1243 | 0.6673 | 0.1396 | 0.0845 | 0.7202 | 0.0550 | 0.2155 | -0.0895 | 0.1001 | -253.6978 | -270.5392 | -2.7549 | -2.7934 |
| 0.6816 | 0.42 | 1600 | 0.6848 | 0.1226 | 0.6669 | 0.1427 | 0.0866 | 0.7262 | 0.0561 | 0.2196 | -0.0916 | 0.1025 | -253.4888 | -270.2238 | -2.7574 | -2.7953 |
| 0.6737 | 0.44 | 1700 | 0.6863 | 0.1428 | 0.6654 | 0.1435 | 0.0840 | 0.7202 | 0.0595 | 0.2302 | -0.0957 | 0.1073 | -253.7508 | -270.1495 | -2.7550 | -2.7931 |
| 0.6913 | 0.47 | 1800 | 0.6822 | 0.1097 | 0.6662 | 0.1546 | 0.0971 | 0.7202 | 0.0576 | 0.2258 | -0.0916 | 0.1046 | -252.4411 | -269.0311 | -2.7541 | -2.7922 |
| 0.691 | 0.5 | 1900 | 0.6836 | 0.1337 | 0.6649 | 0.1512 | 0.0907 | 0.7222 | 0.0605 | 0.2345 | -0.0960 | 0.1092 | -253.0802 | -269.3756 | -2.7463 | -2.7846 |
| 0.6743 | 0.52 | 2000 | 0.6820 | 0.1170 | 0.6653 | 0.1553 | 0.0956 | 0.7183 | 0.0597 | 0.2328 | -0.0959 | 0.1085 | -252.5889 | -268.9686 | -2.7460 | -2.7845 |
| 0.6787 | 0.55 | 2100 | 0.6826 | 0.1255 | 0.6646 | 0.1546 | 0.0933 | 0.7183 | 0.0613 | 0.2373 | -0.0970 | 0.1105 | -252.8206 | -269.0393 | -2.7445 | -2.7832 |
| 0.6738 | 0.58 | 2200 | 0.6816 | 0.1157 | 0.6645 | 0.1584 | 0.0968 | 0.7183 | 0.0615 | 0.2383 | -0.0969 | 0.1108 | -252.4646 | -268.6587 | -2.7418 | -2.7803 |
| 0.675 | 0.6 | 2300 | 0.6816 | 0.1210 | 0.6642 | 0.1590 | 0.0969 | 0.7242 | 0.0621 | 0.2404 | -0.0974 | 0.1118 | -252.4595 | -268.5912 | -2.7450 | -2.7834 |
| 0.6821 | 0.63 | 2400 | 0.6832 | 0.1411 | 0.6633 | 0.1563 | 0.0921 | 0.7202 | 0.0642 | 0.2465 | -0.1010 | 0.1148 | -252.9347 | -268.8607 | -2.7466 | -2.7849 |
| 0.6881 | 0.65 | 2500 | 0.6830 | 0.1426 | 0.6631 | 0.1570 | 0.0922 | 0.7222 | 0.0648 | 0.2474 | -0.1022 | 0.1156 | -252.9272 | -268.7935 | -2.7492 | -2.7874 |
| 0.6871 | 0.68 | 2600 | 0.6808 | 0.1158 | 0.6637 | 0.1633 | 0.1001 | 0.7242 | 0.0632 | 0.2447 | -0.0991 | 0.1134 | -252.1409 | -268.1626 | -2.7451 | -2.7836 |
| 0.683 | 0.71 | 2700 | 0.6799 | 0.1090 | 0.6640 | 0.1648 | 0.1022 | 0.7242 | 0.0627 | 0.2422 | -0.0980 | 0.1124 | -251.9336 | -268.0138 | -2.7438 | -2.7825 |
| 0.6785 | 0.73 | 2800 | 0.6809 | 0.1194 | 0.6634 | 0.1626 | 0.0986 | 0.7143 | 0.0640 | 0.2456 | -0.1001 | 0.1142 | -252.2893 | -268.2341 | -2.7442 | -2.7829 |
| 0.6804 | 0.76 | 2900 | 0.6822 | 0.1346 | 0.6629 | 0.1608 | 0.0956 | 0.7202 | 0.0652 | 0.2495 | -0.1023 | 0.1162 | -252.5925 | -268.4146 | -2.7461 | -2.7847 |
| 0.6741 | 0.79 | 3000 | 0.6808 | 0.1180 | 0.6631 | 0.1638 | 0.0991 | 0.7202 | 0.0648 | 0.2480 | -0.1005 | 0.1153 | -252.2409 | -268.1100 | -2.7461 | -2.7845 |
| 0.6856 | 0.81 | 3100 | 0.6812 | 0.1276 | 0.6628 | 0.1627 | 0.0973 | 0.7222 | 0.0654 | 0.2498 | -0.1020 | 0.1164 | -252.4184 | -268.2234 | -2.7438 | -2.7823 |
| 0.6678 | 0.84 | 3200 | 0.6809 | 0.1244 | 0.6627 | 0.1636 | 0.0981 | 0.7222 | 0.0655 | 0.2500 | -0.1016 | 0.1161 | -252.3354 | -268.1345 | -2.7472 | -2.7854 |
| 0.6786 | 0.86 | 3300 | 0.6811 | 0.1267 | 0.6627 | 0.1639 | 0.0983 | 0.7222 | 0.0656 | 0.2502 | -0.1019 | 0.1165 | -252.3217 | -268.1092 | -2.7425 | -2.7811 |
| 0.675 | 0.89 | 3400 | 0.6808 | 0.1222 | 0.6627 | 0.1646 | 0.0991 | 0.7202 | 0.0655 | 0.2497 | -0.1011 | 0.1161 | -252.2420 | -268.0397 | -2.7448 | -2.7833 |
| 0.6743 | 0.92 | 3500 | 0.6805 | 0.1215 | 0.6627 | 0.1645 | 0.0990 | 0.7242 | 0.0655 | 0.2502 | -0.1016 | 0.1164 | -252.2541 | -268.0490 | -2.7474 | -2.7856 |
| 0.6778 | 0.94 | 3600 | 0.6810 | 0.1279 | 0.6626 | 0.1643 | 0.0985 | 0.7183 | 0.0658 | 0.2507 | -0.1017 | 0.1167 | -252.3022 | -268.0681 | -2.7470 | -2.7853 |
| 0.6788 | 0.97 | 3700 | 0.6811 | 0.1286 | 0.6627 | 0.1640 | 0.0983 | 0.7222 | 0.0657 | 0.2507 | -0.1021 | 0.1167 | -252.3195 | -268.0980 | -2.7427 | -2.7813 |
| 0.6668 | 0.99 | 3800 | 0.6811 | 0.1287 | 0.6627 | 0.1640 | 0.0983 | 0.7202 | 0.0657 | 0.2509 | -0.1025 | 0.1169 | -252.3186 | -268.0970 | -2.7445 | -2.7830 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
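
To check whether a local environment roughly matches these versions, a small sketch (all of these packages expose `__version__`):

```python
# Sketch: print installed versions to compare against the list above.
import datasets, peft, tokenizers, torch, transformers

for name, mod in [("PEFT", peft), ("Transformers", transformers), ("PyTorch", torch),
                  ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```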