eurus-dpop-qlora-uf-5e-6

This model is a fine-tuned version of openbmb/Eurus-7b-sft on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6852
  • Positive Losses: 0.2597
  • Dpo Losses: 0.6519
  • Rewards/chosen: 0.1684
  • Rewards/rejected: 0.0765
  • Rewards/accuracies: 0.6920
  • Rewards/margins: 0.0918
  • Rewards/margins Max: 0.3602
  • Rewards/margins Min: -0.1231
  • Rewards/margins Std: 0.1622
  • Logps/rejected: -249.8821
  • Logps/chosen: -258.0372
  • Logits/rejected: -2.0804
  • Logits/chosen: -2.1890
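
For context, the reward columns above follow the convention used in TRL-style DPO training: rewards are β-scaled log-probability ratios between the policy and the frozen reference model, and accuracy is the fraction of pairs where the chosen reward beats the rejected one. A minimal sketch of how such metrics are typically derived (the β value is a placeholder, not taken from this card):

```python
import torch

def dpo_reward_metrics(policy_chosen_logps: torch.Tensor,
                       policy_rejected_logps: torch.Tensor,
                       ref_chosen_logps: torch.Tensor,
                       ref_rejected_logps: torch.Tensor,
                       beta: float = 0.1):  # placeholder beta, not stated in this card
    # Implicit DPO rewards: beta-scaled log-prob ratios vs. the reference model.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    accuracy = (margins > 0).float().mean()
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/accuracies": accuracy.item(),
        "rewards/margins": margins.mean().item(),
    }
```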

Model description

More information needed

Intended uses & limitations

More information needed
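
Absent author-provided guidance, the sketch below shows one way to load this QLoRA adapter for inference with PEFT. The prompt format is an assumption; check the base model's chat template before relying on it.

```python
# Minimal inference sketch (assumptions: a bfloat16-capable GPU and the
# Mistral-style [INST] prompt format, which is NOT confirmed by this card).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "just1nseo/eurus-dpop-qlora-uf-5e-6",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/Eurus-7b-sft")

prompt = "[INST] Summarize the idea behind QLoRA in two sentences. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```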

Training and evaluation data

More information needed
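
The dataset named above is public; a minimal loading sketch (the `train_prefs`/`test_prefs` split names are the ones published with the dataset):

```python
from datasets import load_dataset

# Each preference split holds prompt / chosen / rejected triples.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train_prefs = ds["train_prefs"]
test_prefs = ds["test_prefs"]
print(train_prefs[0]["prompt"])
```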

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
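
The "Positive Losses" and "Dpo Losses" metrics reported above, together with "dpop" in the model name, suggest a DPO-Positive (DPOP) objective, which augments standard DPO with a penalty that fires when the policy's log-probability of the chosen response falls below the reference model's. A minimal sketch, assuming the DPOP formulation of Pal et al. (2024); the `beta` and `lam` values are placeholders, not taken from this card:

```python
import torch
import torch.nn.functional as F

def dpop_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              beta=0.1, lam=5.0):  # placeholder hyperparameters
    # Standard DPO margin: difference of policy/reference log-ratios.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # DPOP penalty: positive only when the policy has drifted below the
    # reference model on the *chosen* response.
    positive_penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)
    logits = chosen_ratio - rejected_ratio - lam * positive_penalty
    return -F.logsigmoid(beta * logits).mean()
```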

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6922 | 0.03 | 100 | 0.6926 | 0.0083 | 0.6919 | 0.0146 | 0.0121 | 0.5580 | 0.0025 | 0.0231 | -0.0143 | 0.0122 | -256.3296 | -273.4202 | -2.1882 | -2.3107 |
| 0.6862 | 0.05 | 200 | 0.6914 | 0.0342 | 0.6870 | 0.0376 | 0.0250 | 0.6200 | 0.0126 | 0.0688 | -0.0317 | 0.0331 | -255.0376 | -271.1152 | -2.1839 | -2.3059 |
| 0.681 | 0.08 | 300 | 0.6809 | 0.0240 | 0.6765 | 0.1253 | 0.0902 | 0.6710 | 0.0351 | 0.1626 | -0.0702 | 0.0758 | -248.5213 | -262.3472 | -2.1769 | -2.2970 |
| 0.6892 | 0.1 | 400 | 0.6881 | 0.1749 | 0.6677 | 0.1278 | 0.0732 | 0.7060 | 0.0546 | 0.2226 | -0.0895 | 0.1040 | -250.2170 | -262.1005 | -2.1580 | -2.2763 |
| 0.6956 | 0.13 | 500 | 0.6840 | 0.1131 | 0.6699 | 0.1486 | 0.0984 | 0.6690 | 0.0502 | 0.2239 | -0.0957 | 0.1064 | -247.6977 | -260.0196 | -2.1347 | -2.2525 |
| 0.67 | 0.16 | 600 | 0.6926 | 0.2091 | 0.6661 | 0.1339 | 0.0754 | 0.6790 | 0.0585 | 0.2458 | -0.0958 | 0.1144 | -249.9958 | -261.4865 | -2.1202 | -2.2361 |
| 0.7018 | 0.18 | 700 | 0.6815 | 0.1178 | 0.6676 | 0.1464 | 0.0913 | 0.6780 | 0.0551 | 0.2389 | -0.0968 | 0.1123 | -248.4023 | -260.2365 | -2.0958 | -2.2115 |
| 0.6931 | 0.21 | 800 | 0.6847 | 0.1683 | 0.6645 | 0.1501 | 0.0878 | 0.6740 | 0.0623 | 0.2623 | -0.1069 | 0.1239 | -248.7592 | -259.8651 | -2.1055 | -2.2248 |
| 0.6888 | 0.24 | 900 | 0.6810 | 0.1241 | 0.6654 | 0.1596 | 0.0992 | 0.6780 | 0.0603 | 0.2561 | -0.1093 | 0.1225 | -247.6149 | -258.9195 | -2.1013 | -2.2217 |
| 0.7023 | 0.26 | 1000 | 0.6791 | 0.0820 | 0.6657 | 0.1671 | 0.1072 | 0.6720 | 0.0598 | 0.2606 | -0.1092 | 0.1239 | -246.8143 | -258.1693 | -2.0914 | -2.2100 |
| 0.708 | 0.29 | 1100 | 0.6869 | 0.1814 | 0.6621 | 0.1602 | 0.0921 | 0.6700 | 0.0681 | 0.2881 | -0.1134 | 0.1346 | -248.3287 | -258.8581 | -2.0874 | -2.2039 |
| 0.6712 | 0.31 | 1200 | 0.6822 | 0.1551 | 0.6607 | 0.1611 | 0.0902 | 0.6800 | 0.0709 | 0.2911 | -0.1094 | 0.1346 | -248.5190 | -258.7667 | -2.0853 | -2.2009 |
| 0.698 | 0.34 | 1300 | 0.6876 | 0.2474 | 0.6588 | 0.1472 | 0.0718 | 0.6880 | 0.0754 | 0.3020 | -0.1116 | 0.1402 | -250.3554 | -260.1554 | -2.0728 | -2.1889 |
| 0.716 | 0.37 | 1400 | 0.6857 | 0.2405 | 0.6588 | 0.1596 | 0.0836 | 0.6890 | 0.0760 | 0.3191 | -0.1277 | 0.1482 | -249.1757 | -258.9159 | -2.0564 | -2.1732 |
| 0.6933 | 0.39 | 1500 | 0.7002 | 0.4045 | 0.6552 | 0.1513 | 0.0672 | 0.6920 | 0.0841 | 0.3434 | -0.1200 | 0.1550 | -250.8132 | -259.7437 | -2.0728 | -2.1886 |
| 0.6995 | 0.42 | 1600 | 0.6874 | 0.2627 | 0.6582 | 0.1571 | 0.0800 | 0.6780 | 0.0771 | 0.3194 | -0.1159 | 0.1457 | -249.5378 | -259.1630 | -2.0550 | -2.1698 |
| 0.6798 | 0.44 | 1700 | 0.6861 | 0.2256 | 0.6568 | 0.1575 | 0.0775 | 0.6980 | 0.0800 | 0.3193 | -0.1164 | 0.1455 | -249.7892 | -259.1237 | -2.0514 | -2.1684 |
| 0.7415 | 0.47 | 1800 | 0.6838 | 0.2001 | 0.6564 | 0.1593 | 0.0782 | 0.6900 | 0.0810 | 0.3232 | -0.1166 | 0.1473 | -249.7147 | -258.9513 | -2.0880 | -2.2013 |
| 0.7042 | 0.5 | 1900 | 0.6825 | 0.1645 | 0.6587 | 0.1628 | 0.0871 | 0.6730 | 0.0757 | 0.3152 | -0.1123 | 0.1438 | -248.8304 | -258.5951 | -2.0881 | -2.1992 |
| 0.6659 | 0.52 | 2000 | 0.6815 | 0.1761 | 0.6563 | 0.1670 | 0.0856 | 0.6850 | 0.0814 | 0.3314 | -0.1188 | 0.1510 | -248.9763 | -258.1778 | -2.0997 | -2.2118 |
| 0.6858 | 0.55 | 2100 | 0.6873 | 0.2454 | 0.6554 | 0.1582 | 0.0749 | 0.6820 | 0.0833 | 0.3331 | -0.1142 | 0.1509 | -250.0458 | -259.0565 | -2.0862 | -2.1969 |
| 0.6863 | 0.58 | 2200 | 0.6908 | 0.2874 | 0.6538 | 0.1534 | 0.0666 | 0.6910 | 0.0869 | 0.3416 | -0.1143 | 0.1533 | -250.8793 | -259.5337 | -2.0919 | -2.2024 |
| 0.6953 | 0.6 | 2300 | 0.6842 | 0.2129 | 0.6559 | 0.1633 | 0.0812 | 0.6900 | 0.0821 | 0.3281 | -0.1138 | 0.1483 | -249.4186 | -258.5474 | -2.0943 | -2.2042 |
| 0.6677 | 0.63 | 2400 | 0.6871 | 0.2649 | 0.6529 | 0.1612 | 0.0720 | 0.6990 | 0.0892 | 0.3470 | -0.1224 | 0.1573 | -250.3335 | -258.7551 | -2.0844 | -2.1939 |
| 0.6945 | 0.65 | 2500 | 0.6879 | 0.2797 | 0.6528 | 0.1626 | 0.0730 | 0.6960 | 0.0896 | 0.3520 | -0.1223 | 0.1588 | -250.2403 | -258.6179 | -2.0869 | -2.1960 |
| 0.6906 | 0.68 | 2600 | 0.6810 | 0.1960 | 0.6565 | 0.1697 | 0.0886 | 0.6900 | 0.0811 | 0.3315 | -0.1197 | 0.1510 | -248.6812 | -257.9061 | -2.0893 | -2.1981 |
| 0.6707 | 0.71 | 2700 | 0.6816 | 0.1838 | 0.6573 | 0.1703 | 0.0908 | 0.6760 | 0.0795 | 0.3292 | -0.1212 | 0.1508 | -248.4567 | -257.8461 | -2.0809 | -2.1890 |
| 0.6709 | 0.73 | 2800 | 0.6803 | 0.1860 | 0.6557 | 0.1724 | 0.0893 | 0.6720 | 0.0832 | 0.3403 | -0.1252 | 0.1559 | -248.6116 | -257.6317 | -2.0954 | -2.2026 |
| 0.6962 | 0.76 | 2900 | 0.6873 | 0.2708 | 0.6524 | 0.1646 | 0.0737 | 0.6980 | 0.0908 | 0.3584 | -0.1252 | 0.1623 | -250.1649 | -258.4195 | -2.0941 | -2.2015 |
| 0.6646 | 0.79 | 3000 | 0.6844 | 0.2383 | 0.6532 | 0.1681 | 0.0794 | 0.6940 | 0.0888 | 0.3512 | -0.1231 | 0.1591 | -249.6004 | -258.0627 | -2.0832 | -2.1915 |
| 0.6842 | 0.81 | 3100 | 0.6822 | 0.2196 | 0.6536 | 0.1726 | 0.0846 | 0.6910 | 0.0880 | 0.3508 | -0.1251 | 0.1596 | -249.0746 | -257.6134 | -2.0905 | -2.1987 |
| 0.6639 | 0.84 | 3200 | 0.6821 | 0.2215 | 0.6536 | 0.1722 | 0.0842 | 0.6830 | 0.0880 | 0.3500 | -0.1241 | 0.1590 | -249.1206 | -257.6602 | -2.0805 | -2.1887 |
| 0.6728 | 0.86 | 3300 | 0.6830 | 0.2295 | 0.6530 | 0.1706 | 0.0814 | 0.6840 | 0.0892 | 0.3529 | -0.1232 | 0.1597 | -249.3954 | -257.8120 | -2.0848 | -2.1928 |
| 0.6765 | 0.89 | 3400 | 0.6847 | 0.2500 | 0.6523 | 0.1693 | 0.0784 | 0.6880 | 0.0909 | 0.3575 | -0.1232 | 0.1612 | -249.6923 | -257.9415 | -2.0844 | -2.1926 |
| 0.7087 | 0.92 | 3500 | 0.6872 | 0.2816 | 0.6514 | 0.1664 | 0.0732 | 0.6950 | 0.0932 | 0.3639 | -0.1235 | 0.1634 | -250.2141 | -258.2385 | -2.0867 | -2.1945 |
| 0.6705 | 0.94 | 3600 | 0.6855 | 0.2617 | 0.6519 | 0.1681 | 0.0762 | 0.6960 | 0.0919 | 0.3604 | -0.1229 | 0.1621 | -249.9158 | -258.0692 | -2.0888 | -2.1965 |
| 0.6877 | 0.97 | 3700 | 0.6852 | 0.2585 | 0.6519 | 0.1685 | 0.0766 | 0.6910 | 0.0919 | 0.3599 | -0.1228 | 0.1620 | -249.8785 | -258.0270 | -2.0891 | -2.1968 |
| 0.6891 | 0.99 | 3800 | 0.6852 | 0.2586 | 0.6519 | 0.1684 | 0.0765 | 0.6950 | 0.0919 | 0.3605 | -0.1227 | 0.1621 | -249.8860 | -258.0364 | -2.0835 | -2.1919 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2