eurus-dpop-qlora-uf-ours-uffull-5e-6

This model is a fine-tuned version of openbmb/Eurus-7b-sft on the generation/UF and generation/UFfull datasets. It achieves the following results on the evaluation set (a hedged note on how to read these metrics follows the list):

  • Loss: 0.6837
  • Positive Losses: 0.3336
  • Dpo Losses: 0.6519
  • Rewards/chosen: 0.1618
  • Rewards/rejected: 0.0697
  • Rewards/accuracies: 0.6960
  • Rewards/margins: 0.0921
  • Rewards/margins Max: 0.3623
  • Rewards/margins Min: -0.1184
  • Rewards/margins Std: 0.1602
  • Logps/rejected: -254.2729
  • Logps/chosen: -255.4841
  • Logits/rejected: -2.1811
  • Logits/chosen: -2.2772
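
The card does not define these metric names. A hedged reading, following the standard DPO conventions (and, for "Positive Losses", the DPO-Positive objective that the "dpop" in the model name suggests), is sketched below: the implicit reward of a completion is the β-scaled log-probability ratio between the policy and the reference model, margins are chosen-minus-rejected rewards, and accuracy is the fraction of pairs with a positive margin.

```latex
% Hedged reading of the metrics above; the card itself does not define them.
% Implicit DPO reward of a completion y for prompt x:
r_\theta(x, y) = \beta \big( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \big)
% Margin and accuracy over preference pairs (y_w chosen, y_l rejected):
\text{margins} = r_\theta(x, y_w) - r_\theta(x, y_l), \qquad
\text{accuracies} = \Pr\big[\, r_\theta(x, y_w) > r_\theta(x, y_l) \,\big]
% DPO-Positive penalty term (an assumption, inferred from "dpop" in the name),
% which discourages the chosen log-probability from falling below the reference:
\mathcal{L}_{\text{pos}} = \max\big(0,\; \log \pi_{\mathrm{ref}}(y_w \mid x) - \log \pi_\theta(y_w \mid x)\big)
```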

Model description

More information needed

Intended uses & limitations

More information needed
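
While the card leaves this section empty, this is a PEFT (QLoRA) adapter, so it is applied on top of the openbmb/Eurus-7b-sft base model rather than loaded standalone. A minimal loading sketch follows; the adapter repo id is taken from this page, and the chat-template call assumes the base tokenizer ships one.

```python
# Minimal usage sketch (not from the original card): load the QLoRA adapter
# on top of the base model with PEFT. Repo ids are taken from this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "openbmb/Eurus-7b-sft", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "just1nseo/eurus-dpop-qlora-uf-ours-uffull-5e-6")
tokenizer = AutoTokenizer.from_pretrained("openbmb/Eurus-7b-sft")

# Assumes the base tokenizer defines a chat template; otherwise format the
# prompt by hand following the base model's card.
messages = [{"role": "user", "content": "Explain QLoRA in one paragraph."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```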

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of how they map onto a training config follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
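
As an illustration only (the actual training script is not included in this card, though the setup is consistent with TRL/alignment-handbook-style DPO training), the list above maps onto Hugging Face TrainingArguments roughly as follows:

```python
# Hedged mapping of the listed hyperparameters onto TrainingArguments.
# The run used 2 GPUs, so the effective train batch size is
# 4 (per device) x 2 (devices) x 2 (grad accumulation) = 16, matching the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eurus-dpop-qlora-uf-ours-uffull-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,    # Adam betas and epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,         # assumption: mixed precision is typical for QLoRA runs
)
```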

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6927 | 0.02 | 100 | 0.6925 | 0.0046 | 0.6922 | 0.0145 | 0.0127 | 0.5480 | 0.0018 | 0.0204 | -0.0140 | 0.0112 | -259.9735 | -270.2098 | -2.1753 | -2.2851 |
| 0.6928 | 0.05 | 200 | 0.6926 | 0.0501 | 0.6881 | 0.0395 | 0.0291 | 0.5960 | 0.0104 | 0.0706 | -0.0363 | 0.0351 | -258.3304 | -267.7088 | -2.1615 | -2.2708 |
| 0.6854 | 0.07 | 300 | 0.6829 | 0.0095 | 0.6820 | 0.1239 | 0.1004 | 0.6485 | 0.0235 | 0.1213 | -0.0594 | 0.0594 | -251.2040 | -259.2754 | -2.1597 | -2.2680 |
| 0.7146 | 0.1 | 400 | 0.6961 | 0.2328 | 0.6708 | 0.1096 | 0.0619 | 0.6845 | 0.0477 | 0.1987 | -0.0845 | 0.0939 | -255.0527 | -260.7050 | -2.1479 | -2.2540 |
| 0.7008 | 0.12 | 500 | 0.6895 | 0.2048 | 0.6680 | 0.1200 | 0.0660 | 0.6815 | 0.0540 | 0.2196 | -0.0893 | 0.1035 | -254.6419 | -259.6605 | -2.1377 | -2.2443 |
| 0.6931 | 0.14 | 600 | 0.6821 | 0.1306 | 0.6672 | 0.1310 | 0.0755 | 0.6920 | 0.0555 | 0.2219 | -0.0883 | 0.1038 | -253.6935 | -258.5599 | -2.1359 | -2.2425 |
| 0.6684 | 0.17 | 700 | 0.6805 | 0.1322 | 0.6669 | 0.1446 | 0.0879 | 0.6850 | 0.0567 | 0.2378 | -0.0948 | 0.1112 | -252.4529 | -257.2010 | -2.1330 | -2.2403 |
| 0.6892 | 0.19 | 800 | 0.6876 | 0.2539 | 0.6631 | 0.1390 | 0.0738 | 0.6900 | 0.0653 | 0.2606 | -0.1012 | 0.1216 | -253.8653 | -257.7604 | -2.1681 | -2.2734 |
| 0.6887 | 0.22 | 900 | 0.6834 | 0.2059 | 0.6632 | 0.1491 | 0.0839 | 0.6825 | 0.0652 | 0.2685 | -0.1056 | 0.1243 | -252.8537 | -256.7542 | -2.1617 | -2.2645 |
| 0.6785 | 0.24 | 1000 | 0.6885 | 0.2658 | 0.6623 | 0.1477 | 0.0803 | 0.6890 | 0.0674 | 0.2739 | -0.1111 | 0.1279 | -253.2097 | -256.8882 | -2.1750 | -2.2765 |
| 0.7053 | 0.26 | 1100 | 0.6771 | 0.1115 | 0.6671 | 0.1649 | 0.1075 | 0.6645 | 0.0574 | 0.2616 | -0.1163 | 0.1261 | -250.4924 | -255.1730 | -2.1717 | -2.2734 |
| 0.6642 | 0.29 | 1200 | 0.7051 | 0.4704 | 0.6577 | 0.1311 | 0.0532 | 0.6960 | 0.0778 | 0.3075 | -0.1132 | 0.1409 | -255.9189 | -258.5546 | -2.2010 | -2.3000 |
| 0.7044 | 0.31 | 1300 | 0.6949 | 0.3566 | 0.6595 | 0.1433 | 0.0695 | 0.6955 | 0.0737 | 0.2961 | -0.1130 | 0.1359 | -254.2908 | -257.3372 | -2.2117 | -2.3129 |
| 0.6837 | 0.34 | 1400 | 0.6878 | 0.3032 | 0.6602 | 0.1478 | 0.0755 | 0.7030 | 0.0723 | 0.2937 | -0.1114 | 0.1341 | -253.6898 | -256.8842 | -2.1897 | -2.2887 |
| 0.7168 | 0.36 | 1500 | 0.6919 | 0.3620 | 0.6580 | 0.1429 | 0.0657 | 0.7010 | 0.0773 | 0.3068 | -0.1198 | 0.1417 | -254.6766 | -257.3681 | -2.1618 | -2.2599 |
| 0.6956 | 0.38 | 1600 | 0.6867 | 0.3031 | 0.6593 | 0.1451 | 0.0710 | 0.7010 | 0.0741 | 0.2933 | -0.1161 | 0.1363 | -254.1435 | -257.1503 | -2.1631 | -2.2605 |
| 0.6737 | 0.41 | 1700 | 0.6772 | 0.1621 | 0.6641 | 0.1706 | 0.1066 | 0.6875 | 0.0640 | 0.2794 | -0.1154 | 0.1311 | -250.5799 | -254.6045 | -2.1643 | -2.2628 |
| 0.6826 | 0.43 | 1800 | 0.6796 | 0.2140 | 0.6617 | 0.1639 | 0.0948 | 0.6980 | 0.0691 | 0.2894 | -0.1164 | 0.1344 | -251.7600 | -255.2692 | -2.1697 | -2.2688 |
| 0.7015 | 0.45 | 1900 | 0.6877 | 0.3326 | 0.6575 | 0.1498 | 0.0714 | 0.7045 | 0.0783 | 0.3065 | -0.1114 | 0.1391 | -254.1016 | -256.6878 | -2.1641 | -2.2624 |
| 0.6828 | 0.48 | 2000 | 0.6894 | 0.3704 | 0.6549 | 0.1504 | 0.0661 | 0.7035 | 0.0844 | 0.3245 | -0.1181 | 0.1476 | -254.6347 | -256.6188 | -2.1769 | -2.2757 |
| 0.6658 | 0.5 | 2100 | 0.6846 | 0.3040 | 0.6561 | 0.1566 | 0.0749 | 0.7000 | 0.0818 | 0.3202 | -0.1168 | 0.1455 | -253.7529 | -255.9983 | -2.1652 | -2.2649 |
| 0.6767 | 0.53 | 2200 | 0.6849 | 0.3118 | 0.6572 | 0.1553 | 0.0760 | 0.6965 | 0.0794 | 0.3173 | -0.1198 | 0.1465 | -253.6463 | -256.1299 | -2.1629 | -2.2619 |
| 0.696 | 0.55 | 2300 | 0.6847 | 0.3145 | 0.6563 | 0.1533 | 0.0716 | 0.6985 | 0.0817 | 0.3263 | -0.1227 | 0.1504 | -254.0864 | -256.3344 | -2.1556 | -2.2545 |
| 0.6564 | 0.57 | 2400 | 0.6804 | 0.2738 | 0.6559 | 0.1592 | 0.0770 | 0.6950 | 0.0822 | 0.3235 | -0.1154 | 0.1471 | -253.5391 | -255.7443 | -2.1666 | -2.2650 |
| 0.6717 | 0.6 | 2500 | 0.6916 | 0.4126 | 0.6519 | 0.1507 | 0.0589 | 0.7025 | 0.0918 | 0.3565 | -0.1218 | 0.1596 | -255.3540 | -256.5908 | -2.1731 | -2.2721 |
| 0.7191 | 0.62 | 2600 | 0.6875 | 0.3646 | 0.6521 | 0.1553 | 0.0637 | 0.6995 | 0.0916 | 0.3583 | -0.1217 | 0.1599 | -254.8743 | -256.1363 | -2.1882 | -2.2857 |
| 0.6916 | 0.65 | 2700 | 0.6833 | 0.3114 | 0.6534 | 0.1588 | 0.0703 | 0.6905 | 0.0885 | 0.3483 | -0.1184 | 0.1559 | -254.2137 | -255.7853 | -2.1821 | -2.2801 |
| 0.7428 | 0.67 | 2800 | 0.6791 | 0.2390 | 0.6559 | 0.1679 | 0.0853 | 0.6905 | 0.0826 | 0.3337 | -0.1153 | 0.1493 | -252.7114 | -254.8721 | -2.1918 | -2.2884 |
| 0.6581 | 0.69 | 2900 | 0.6784 | 0.2405 | 0.6550 | 0.1690 | 0.0841 | 0.6955 | 0.0848 | 0.3421 | -0.1187 | 0.1536 | -252.8289 | -254.7648 | -2.1957 | -2.2932 |
| 0.6691 | 0.72 | 3000 | 0.6819 | 0.2928 | 0.6533 | 0.1651 | 0.0763 | 0.6930 | 0.0888 | 0.3514 | -0.1185 | 0.1566 | -253.6139 | -255.1557 | -2.1800 | -2.2776 |
| 0.6585 | 0.74 | 3100 | 0.6880 | 0.3758 | 0.6514 | 0.1589 | 0.0655 | 0.6980 | 0.0934 | 0.3651 | -0.1195 | 0.1618 | -254.6915 | -255.7693 | -2.1771 | -2.2747 |
| 0.6689 | 0.77 | 3200 | 0.6863 | 0.3558 | 0.6515 | 0.1606 | 0.0676 | 0.6980 | 0.0930 | 0.3645 | -0.1198 | 0.1617 | -254.4865 | -255.6035 | -2.1869 | -2.2837 |
| 0.6608 | 0.79 | 3300 | 0.6889 | 0.3924 | 0.6507 | 0.1575 | 0.0626 | 0.6965 | 0.0948 | 0.3683 | -0.1195 | 0.1630 | -254.9793 | -255.9175 | -2.1791 | -2.2756 |
| 0.7082 | 0.81 | 3400 | 0.6822 | 0.3032 | 0.6528 | 0.1632 | 0.0733 | 0.6910 | 0.0899 | 0.3556 | -0.1170 | 0.1576 | -253.9138 | -255.3429 | -2.1835 | -2.2794 |
| 0.647 | 0.84 | 3500 | 0.6828 | 0.3155 | 0.6524 | 0.1628 | 0.0719 | 0.6965 | 0.0910 | 0.3582 | -0.1174 | 0.1586 | -254.0551 | -255.3804 | -2.1825 | -2.2785 |
| 0.7095 | 0.86 | 3600 | 0.6850 | 0.3475 | 0.6516 | 0.1604 | 0.0677 | 0.6975 | 0.0927 | 0.3628 | -0.1186 | 0.1605 | -254.4760 | -255.6236 | -2.1845 | -2.2802 |
| 0.6504 | 0.89 | 3700 | 0.6835 | 0.3264 | 0.6521 | 0.1622 | 0.0707 | 0.6965 | 0.0916 | 0.3609 | -0.1177 | 0.1595 | -254.1773 | -255.4386 | -2.1823 | -2.2783 |
| 0.6851 | 0.91 | 3800 | 0.6839 | 0.3360 | 0.6517 | 0.1613 | 0.0688 | 0.6955 | 0.0925 | 0.3629 | -0.1186 | 0.1605 | -254.3600 | -255.5343 | -2.1854 | -2.2812 |
| 0.6898 | 0.93 | 3900 | 0.6840 | 0.3369 | 0.6518 | 0.1614 | 0.0690 | 0.6950 | 0.0924 | 0.3633 | -0.1186 | 0.1606 | -254.3441 | -255.5256 | -2.1876 | -2.2832 |
| 0.6931 | 0.96 | 4000 | 0.6838 | 0.3337 | 0.6519 | 0.1618 | 0.0697 | 0.6960 | 0.0921 | 0.3623 | -0.1186 | 0.1603 | -254.2723 | -255.4866 | -2.1902 | -2.2856 |
| 0.6552 | 0.98 | 4100 | 0.6837 | 0.3337 | 0.6519 | 0.1619 | 0.0696 | 0.6960 | 0.0922 | 0.3631 | -0.1185 | 0.1604 | -254.2784 | -255.4766 | -2.1871 | -2.2827 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2