eurus-dpop-qlora-uf-5e-7-real

This model is a fine-tuned version of openbmb/Eurus-7b-sft on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6799
  • Positive Losses: 0.0815
  • Dpo Losses: 0.6692
  • Rewards/chosen: 0.1294
  • Rewards/rejected: 0.0785
  • Rewards/accuracies: 0.7123
  • Rewards/margins: 0.0509
  • Rewards/margins Max: 0.1970
  • Rewards/margins Min: -0.0800
  • Rewards/margins Std: 0.0922
  • Logps/rejected: -254.9750
  • Logps/chosen: -262.8372
  • Logits/rejected: -2.1976
  • Logits/chosen: -2.3056

Model description

More information needed

Intended uses & limitations

More information needed
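
The card itself gives no usage details, but since the repo is a PEFT (QLoRA) adapter for openbmb/Eurus-7b-sft rather than a full model, it has to be loaded on top of the base model. A minimal loading sketch, assuming the adapter lives at just1nseo/eurus-dpop-qlora-uf-5e-7-real (the repo id mentioned on this page) and a transformers/peft stack like the one listed under "Framework versions":

```python
# Minimal sketch: attach the QLoRA adapter to the base SFT model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "openbmb/Eurus-7b-sft"
adapter_id = "just1nseo/eurus-dpop-qlora-uf-5e-7-real"  # this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # apply the DPO adapter

prompt = "Explain the difference between DPO and supervised fine-tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

bfloat16 and device_map="auto" are illustrative defaults, not values from the card; for QLoRA-style 4-bit inference you could instead pass a BitsAndBytesConfig to the base-model call.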

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
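
For reference, a sketch of how these listed values map onto a transformers TrainingArguments object as used by trl-style DPO training scripts. This is an illustration of the values above, not the run's actual launch configuration; the multi-GPU figures (4 devices, total batch sizes 16/32) come from the distributed launcher rather than the arguments object.

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# The Adam betas/epsilon shown above are the optimizer defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eurus-dpop-qlora-uf-5e-7-real",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 4 GPUs -> total train batch 16
    per_device_eval_batch_size=8,    # x 4 GPUs -> total eval batch 32
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: common for QLoRA DPO runs; not stated on the card
)
```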

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.696 | 0.03 | 100 | 0.6947 | 0.0123 | 0.6932 | 0.0033 | 0.0035 | 0.4722 | -0.0001 | 0.0066 | -0.0062 | 0.0043 | -262.4742 | -275.4397 | -2.2340 | -2.3447 |
| 0.694 | 0.05 | 200 | 0.6936 | 0.0097 | 0.6930 | 0.0061 | 0.0057 | 0.5139 | 0.0004 | 0.0095 | -0.0082 | 0.0059 | -262.2527 | -275.1669 | -2.2395 | -2.3497 |
| 0.6925 | 0.08 | 300 | 0.6925 | 0.0066 | 0.6923 | 0.0105 | 0.0088 | 0.5595 | 0.0017 | 0.0156 | -0.0103 | 0.0085 | -261.9362 | -274.7206 | -2.2341 | -2.3445 |
| 0.6922 | 0.1 | 400 | 0.6917 | 0.0064 | 0.6914 | 0.0163 | 0.0127 | 0.5913 | 0.0036 | 0.0248 | -0.0145 | 0.0128 | -261.5553 | -274.1491 | -2.2231 | -2.3340 |
| 0.6911 | 0.13 | 500 | 0.6910 | 0.0140 | 0.6902 | 0.0234 | 0.0173 | 0.5913 | 0.0061 | 0.0387 | -0.0210 | 0.0192 | -261.0899 | -273.4387 | -2.2290 | -2.3389 |
| 0.6871 | 0.16 | 600 | 0.6908 | 0.0225 | 0.6889 | 0.0296 | 0.0210 | 0.6111 | 0.0086 | 0.0521 | -0.0264 | 0.0252 | -260.7244 | -272.8181 | -2.2202 | -2.3304 |
| 0.689 | 0.18 | 700 | 0.6901 | 0.0274 | 0.6876 | 0.0385 | 0.0270 | 0.6230 | 0.0115 | 0.0661 | -0.0346 | 0.0323 | -260.1226 | -271.9265 | -2.2190 | -2.3290 |
| 0.6859 | 0.21 | 800 | 0.6883 | 0.0168 | 0.6861 | 0.0488 | 0.0343 | 0.6349 | 0.0146 | 0.0765 | -0.0371 | 0.0366 | -259.3933 | -270.8916 | -2.2215 | -2.3312 |
| 0.691 | 0.24 | 900 | 0.6865 | 0.0123 | 0.6837 | 0.0758 | 0.0562 | 0.6746 | 0.0196 | 0.0939 | -0.0442 | 0.0451 | -257.2037 | -268.1935 | -2.2186 | -2.3282 |
| 0.6913 | 0.26 | 1000 | 0.6848 | 0.0284 | 0.6794 | 0.1034 | 0.0746 | 0.6845 | 0.0288 | 0.1288 | -0.0575 | 0.0616 | -255.3635 | -265.4328 | -2.2139 | -2.3237 |
| 0.6909 | 0.29 | 1100 | 0.6846 | 0.0441 | 0.6772 | 0.1070 | 0.0736 | 0.7004 | 0.0335 | 0.1416 | -0.0622 | 0.0676 | -255.4643 | -265.0727 | -2.2119 | -2.3218 |
| 0.68 | 0.31 | 1200 | 0.6871 | 0.0941 | 0.6751 | 0.1084 | 0.0703 | 0.7103 | 0.0380 | 0.1613 | -0.0698 | 0.0765 | -255.7855 | -264.9369 | -2.2163 | -2.3256 |
| 0.695 | 0.34 | 1300 | 0.6841 | 0.0711 | 0.6749 | 0.1134 | 0.0750 | 0.7024 | 0.0385 | 0.1622 | -0.0703 | 0.0770 | -255.3223 | -264.4297 | -2.2192 | -2.3278 |
| 0.6805 | 0.37 | 1400 | 0.6853 | 0.0984 | 0.6733 | 0.1125 | 0.0705 | 0.7103 | 0.0419 | 0.1735 | -0.0741 | 0.0818 | -255.7660 | -264.5257 | -2.2104 | -2.3196 |
| 0.6848 | 0.39 | 1500 | 0.6857 | 0.1050 | 0.6727 | 0.1154 | 0.0720 | 0.7024 | 0.0434 | 0.1814 | -0.0771 | 0.0853 | -255.6182 | -264.2326 | -2.2035 | -2.3127 |
| 0.6808 | 0.42 | 1600 | 0.6826 | 0.0796 | 0.6730 | 0.1207 | 0.0780 | 0.7143 | 0.0427 | 0.1763 | -0.0745 | 0.0829 | -255.0231 | -263.7046 | -2.2129 | -2.3212 |
| 0.6732 | 0.44 | 1700 | 0.6846 | 0.1029 | 0.6717 | 0.1181 | 0.0728 | 0.7044 | 0.0454 | 0.1851 | -0.0765 | 0.0868 | -255.5430 | -263.9595 | -2.2054 | -2.3143 |
| 0.6995 | 0.47 | 1800 | 0.6804 | 0.0699 | 0.6722 | 0.1233 | 0.0790 | 0.7083 | 0.0443 | 0.1778 | -0.0745 | 0.0833 | -254.9202 | -263.4485 | -2.2043 | -2.3136 |
| 0.6824 | 0.5 | 1900 | 0.6816 | 0.0861 | 0.6713 | 0.1223 | 0.0760 | 0.7063 | 0.0464 | 0.1866 | -0.0767 | 0.0872 | -255.2250 | -263.5396 | -2.2077 | -2.3163 |
| 0.6805 | 0.52 | 2000 | 0.6786 | 0.0523 | 0.6722 | 0.1291 | 0.0848 | 0.7123 | 0.0443 | 0.1758 | -0.0732 | 0.0822 | -254.3396 | -262.8673 | -2.2056 | -2.3146 |
| 0.6827 | 0.55 | 2100 | 0.6802 | 0.0708 | 0.6710 | 0.1253 | 0.0784 | 0.7123 | 0.0469 | 0.1860 | -0.0766 | 0.0868 | -254.9799 | -263.2408 | -2.2092 | -2.3177 |
| 0.6746 | 0.58 | 2200 | 0.6794 | 0.0680 | 0.6712 | 0.1273 | 0.0810 | 0.7103 | 0.0464 | 0.1843 | -0.0764 | 0.0864 | -254.7231 | -263.0406 | -2.2069 | -2.3152 |
| 0.6785 | 0.6 | 2300 | 0.6800 | 0.0733 | 0.6707 | 0.1264 | 0.0789 | 0.7044 | 0.0475 | 0.1884 | -0.0774 | 0.0880 | -254.9275 | -263.1323 | -2.2015 | -2.3102 |
| 0.6814 | 0.63 | 2400 | 0.6801 | 0.0765 | 0.6701 | 0.1270 | 0.0782 | 0.7103 | 0.0488 | 0.1911 | -0.0775 | 0.0891 | -254.9968 | -263.0727 | -2.1975 | -2.3061 |
| 0.6871 | 0.65 | 2500 | 0.6804 | 0.0817 | 0.6696 | 0.1257 | 0.0759 | 0.7103 | 0.0498 | 0.1935 | -0.0785 | 0.0904 | -255.2312 | -263.2015 | -2.2030 | -2.3112 |
| 0.6898 | 0.68 | 2600 | 0.6807 | 0.0878 | 0.6694 | 0.1256 | 0.0752 | 0.7143 | 0.0504 | 0.1974 | -0.0786 | 0.0916 | -255.3051 | -263.2164 | -2.2012 | -2.3092 |
| 0.6743 | 0.71 | 2700 | 0.6807 | 0.0904 | 0.6693 | 0.1264 | 0.0758 | 0.7123 | 0.0506 | 0.1972 | -0.0789 | 0.0918 | -255.2393 | -263.1350 | -2.1994 | -2.3075 |
| 0.6738 | 0.73 | 2800 | 0.6800 | 0.0806 | 0.6698 | 0.1281 | 0.0785 | 0.7083 | 0.0496 | 0.1937 | -0.0789 | 0.0908 | -254.9662 | -262.9601 | -2.1961 | -2.3042 |
| 0.6842 | 0.76 | 2900 | 0.6801 | 0.0851 | 0.6693 | 0.1275 | 0.0769 | 0.7183 | 0.0506 | 0.1963 | -0.0794 | 0.0917 | -255.1314 | -263.0265 | -2.1946 | -2.3029 |
| 0.6734 | 0.79 | 3000 | 0.6801 | 0.0854 | 0.6692 | 0.1279 | 0.0771 | 0.7163 | 0.0508 | 0.1971 | -0.0798 | 0.0921 | -255.1094 | -262.9807 | -2.1911 | -2.2995 |
| 0.6772 | 0.81 | 3100 | 0.6800 | 0.0866 | 0.6692 | 0.1284 | 0.0776 | 0.7123 | 0.0508 | 0.1970 | -0.0799 | 0.0922 | -255.0642 | -262.9377 | -2.1975 | -2.3054 |
| 0.6748 | 0.84 | 3200 | 0.6797 | 0.0796 | 0.6693 | 0.1293 | 0.0786 | 0.7163 | 0.0507 | 0.1960 | -0.0798 | 0.0918 | -254.9594 | -262.8471 | -2.1967 | -2.3047 |
| 0.6821 | 0.86 | 3300 | 0.6797 | 0.0824 | 0.6692 | 0.1290 | 0.0782 | 0.7103 | 0.0507 | 0.1960 | -0.0802 | 0.0920 | -254.9983 | -262.8787 | -2.2014 | -2.3090 |
| 0.6759 | 0.89 | 3400 | 0.6797 | 0.0812 | 0.6692 | 0.1290 | 0.0781 | 0.7143 | 0.0509 | 0.1971 | -0.0804 | 0.0922 | -255.0086 | -262.8699 | -2.1973 | -2.3053 |
| 0.678 | 0.92 | 3500 | 0.6798 | 0.0823 | 0.6692 | 0.1288 | 0.0779 | 0.7143 | 0.0509 | 0.1971 | -0.0803 | 0.0923 | -255.0269 | -262.8900 | -2.1943 | -2.3025 |
| 0.6798 | 0.94 | 3600 | 0.6795 | 0.0808 | 0.6693 | 0.1292 | 0.0786 | 0.7143 | 0.0506 | 0.1957 | -0.0803 | 0.0919 | -254.9599 | -262.8558 | -2.1965 | -2.3045 |
| 0.6848 | 0.97 | 3700 | 0.6797 | 0.0806 | 0.6692 | 0.1292 | 0.0784 | 0.7123 | 0.0509 | 0.1967 | -0.0801 | 0.0921 | -254.9834 | -262.8504 | -2.2017 | -2.3093 |
| 0.6734 | 0.99 | 3800 | 0.6797 | 0.0822 | 0.6692 | 0.1295 | 0.0787 | 0.7123 | 0.0508 | 0.1967 | -0.0800 | 0.0920 | -254.9542 | -262.8235 | -2.1947 | -2.3028 |
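
The separate "Dpo Losses" and "Positive Losses" columns, together with the "dpop" in the model name, suggest a DPO-Positive (DPOP)-style objective, which penalizes the policy whenever it assigns lower log-probability to the chosen response than the reference model does. A minimal sketch under that assumption; beta and lam are illustrative placeholders, since the card does not state the values used for this run:

```python
# Sketch of a DPO-Positive (DPOP)-style loss, assuming the run follows the
# formulation in Pal et al., "Smaug: Fixing Failure Modes of Preference
# Optimisation with DPO-Positive". beta and lam are NOT the run's values.
import torch
import torch.nn.functional as F

def dpop_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              beta=0.01, lam=5.0):
    # Standard DPO log-ratio margin between chosen and rejected responses.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    margin = chosen_ratio - rejected_ratio

    # "Positive" penalty: nonzero only when the policy assigns *less*
    # probability to the chosen response than the reference model does.
    positive_penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0)

    # DPOP folds the penalty into the sigmoid argument.
    loss = -F.logsigmoid(beta * (margin - lam * positive_penalty))
    return loss.mean()
```

Under this reading, "Dpo Losses" would track the plain DPO term and "Positive Losses" the penalty term, which would explain why the two are logged separately in the table above.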

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2