
eurus-dpop-qlora-uffull-5e-6

This model is a QLoRA (PEFT) adapter for openbmb/Eurus-7b-sft, fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the results):

  • Loss: 0.6839
  • Positive Losses: 0.1624
  • Dpo Losses: 0.6521
  • Rewards/chosen: 0.1700
  • Rewards/rejected: 0.0789
  • Rewards/accuracies: 0.6925
  • Rewards/margins: 0.0911
  • Rewards/margins Max: 0.3628
  • Rewards/margins Min: -0.1215
  • Rewards/margins Std: 0.1623
  • Logps/rejected: -254.9311
  • Logps/chosen: -258.7703
  • Logits/rejected: -2.1337
  • Logits/chosen: -2.2374
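
This repository contains a PEFT/QLoRA adapter rather than merged full-model weights, so inference requires loading the base model and attaching the adapter. The sketch below is a minimal, untested example; the 4-bit loading via bitsandbytes, the sampling settings, and the prompt format are assumptions, not part of this card.

```python
# Minimal inference sketch (untested). Assumes this repo is a PEFT/QLoRA
# adapter on top of openbmb/Eurus-7b-sft and that bitsandbytes is available
# for 4-bit loading; adjust quantization and dtype to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "openbmb/Eurus-7b-sft"
adapter_id = "just1nseo/eurus-dpop-qlora-uffull-5e-6"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Eurus-7b-sft expects an instruction-style prompt; following the base
# model's own chat template is recommended over a raw string.
prompt = "[INST] Explain direct preference optimization in two sentences. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If merged weights are preferred, the adapter can also be folded into a full-precision copy of the base model with PEFT's merge_and_unload() after loading without quantization.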

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative TrainingArguments sketch follows this list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
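
A rough translation of these settings into Hugging Face TrainingArguments is sketched below. This is illustrative only: the actual training script is not included in this card, the per-device/num-GPU split is inferred from the totals above, and the mixed-precision choice is an assumption.

```python
# Illustrative sketch of the reported hyperparameters as transformers
# TrainingArguments. Not the author's script; DPO-specific settings such as
# beta or the loss variant are not reported on this card and are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eurus-dpop-qlora-uffull-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # 4 per device x 4 GPUs = 16 total
    per_device_eval_batch_size=8,    # 8 per device x 4 GPUs = 32 total
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999), eps=1e-8
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # assumption: precision is not stated on the card
)
```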

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6938 | 0.03 | 100 | 0.6936 | 0.0203 | 0.6916 | 0.0132 | 0.0102 | 0.5694 | 0.0031 | 0.0248 | -0.0143 | 0.0127 | -261.8033 | -274.4498 | -2.2307 | -2.3411 |
| 0.6869 | 0.05 | 200 | 0.6912 | 0.0334 | 0.6868 | 0.0359 | 0.0229 | 0.6647 | 0.0131 | 0.0631 | -0.0315 | 0.0308 | -260.5341 | -272.1819 | -2.2272 | -2.3372 |
| 0.6821 | 0.08 | 300 | 0.6796 | 0.0182 | 0.6766 | 0.1300 | 0.0953 | 0.6984 | 0.0347 | 0.1428 | -0.0630 | 0.0675 | -253.2899 | -262.7724 | -2.2143 | -2.3237 |
| 0.6907 | 0.1 | 400 | 0.6964 | 0.2547 | 0.6648 | 0.1211 | 0.0603 | 0.7341 | 0.0607 | 0.2318 | -0.0924 | 0.1082 | -256.7861 | -263.6666 | -2.1866 | -2.2945 |
| 0.7003 | 0.13 | 500 | 0.6790 | 0.0658 | 0.6721 | 0.1587 | 0.1133 | 0.6687 | 0.0453 | 0.2067 | -0.0980 | 0.1022 | -251.4865 | -259.9060 | -2.1776 | -2.2857 |
| 0.6682 | 0.16 | 600 | 0.6923 | 0.2034 | 0.6656 | 0.1367 | 0.0773 | 0.7083 | 0.0594 | 0.2414 | -0.0936 | 0.1136 | -255.0890 | -262.0998 | -2.1644 | -2.2719 |
| 0.7018 | 0.18 | 700 | 0.6777 | 0.0831 | 0.6687 | 0.1567 | 0.1039 | 0.6825 | 0.0528 | 0.2334 | -0.1010 | 0.1123 | -252.4310 | -260.1074 | -2.1460 | -2.2536 |
| 0.6821 | 0.21 | 800 | 0.6810 | 0.1235 | 0.6652 | 0.1589 | 0.0981 | 0.6746 | 0.0608 | 0.2608 | -0.1113 | 0.1239 | -253.0055 | -259.8834 | -2.1533 | -2.2635 |
| 0.688 | 0.24 | 900 | 0.6811 | 0.1393 | 0.6641 | 0.1595 | 0.0965 | 0.6726 | 0.0630 | 0.2600 | -0.1050 | 0.1223 | -253.1665 | -259.8242 | -2.1458 | -2.2566 |
| 0.7002 | 0.26 | 1000 | 0.6804 | 0.1012 | 0.6637 | 0.1633 | 0.0992 | 0.6806 | 0.0641 | 0.2711 | -0.1092 | 0.1279 | -252.9000 | -259.4406 | -2.1385 | -2.2488 |
| 0.7049 | 0.29 | 1100 | 0.6867 | 0.1705 | 0.6610 | 0.1610 | 0.0903 | 0.6726 | 0.0706 | 0.2988 | -0.1184 | 0.1405 | -253.7870 | -259.6789 | -2.1422 | -2.2507 |
| 0.6695 | 0.31 | 1200 | 0.6800 | 0.0980 | 0.6609 | 0.1651 | 0.0948 | 0.6786 | 0.0704 | 0.2944 | -0.1063 | 0.1348 | -253.3438 | -259.2615 | -2.1404 | -2.2482 |
| 0.7072 | 0.34 | 1300 | 0.6938 | 0.2466 | 0.6560 | 0.1434 | 0.0618 | 0.6984 | 0.0816 | 0.3217 | -0.1107 | 0.1468 | -256.6372 | -261.4336 | -2.1224 | -2.2308 |
| 0.711 | 0.37 | 1400 | 0.6854 | 0.1363 | 0.6586 | 0.1596 | 0.0834 | 0.6905 | 0.0762 | 0.3200 | -0.1195 | 0.1464 | -254.4820 | -259.8182 | -2.1037 | -2.2129 |
| 0.6953 | 0.39 | 1500 | 0.7056 | 0.3813 | 0.6537 | 0.1517 | 0.0641 | 0.6925 | 0.0876 | 0.3532 | -0.1193 | 0.1604 | -256.4078 | -260.6039 | -2.1212 | -2.2299 |
| 0.7031 | 0.42 | 1600 | 0.6876 | 0.1880 | 0.6579 | 0.1582 | 0.0805 | 0.6825 | 0.0777 | 0.3172 | -0.1122 | 0.1462 | -254.7677 | -259.9510 | -2.1057 | -2.2131 |
| 0.6709 | 0.44 | 1700 | 0.6840 | 0.1184 | 0.6572 | 0.1612 | 0.0823 | 0.6944 | 0.0789 | 0.3142 | -0.1098 | 0.1436 | -254.5866 | -259.6552 | -2.1066 | -2.2162 |
| 0.743 | 0.47 | 1800 | 0.6830 | 0.1184 | 0.6560 | 0.1611 | 0.0794 | 0.6984 | 0.0817 | 0.3243 | -0.1116 | 0.1469 | -254.8840 | -259.6642 | -2.1226 | -2.2307 |
| 0.7089 | 0.5 | 1900 | 0.6843 | 0.1366 | 0.6564 | 0.1601 | 0.0792 | 0.6984 | 0.0809 | 0.3272 | -0.1109 | 0.1472 | -254.8997 | -259.7663 | -2.1338 | -2.2399 |
| 0.667 | 0.52 | 2000 | 0.6773 | 0.0774 | 0.6586 | 0.1724 | 0.0961 | 0.6706 | 0.0762 | 0.3214 | -0.1184 | 0.1476 | -253.2098 | -258.5389 | -2.1473 | -2.2545 |
| 0.6863 | 0.55 | 2100 | 0.6862 | 0.1428 | 0.6553 | 0.1594 | 0.0759 | 0.6925 | 0.0835 | 0.3379 | -0.1163 | 0.1521 | -255.2269 | -259.8331 | -2.1381 | -2.2444 |
| 0.6773 | 0.58 | 2200 | 0.6892 | 0.1893 | 0.6538 | 0.1550 | 0.0682 | 0.7063 | 0.0868 | 0.3450 | -0.1155 | 0.1544 | -256.0006 | -260.2700 | -2.1445 | -2.2499 |
| 0.6967 | 0.6 | 2300 | 0.6864 | 0.1734 | 0.6542 | 0.1602 | 0.0742 | 0.6964 | 0.0859 | 0.3410 | -0.1156 | 0.1532 | -255.3992 | -259.7585 | -2.1438 | -2.2482 |
| 0.6674 | 0.63 | 2400 | 0.6830 | 0.1322 | 0.6544 | 0.1659 | 0.0802 | 0.6964 | 0.0857 | 0.3406 | -0.1206 | 0.1543 | -254.7984 | -259.1836 | -2.1348 | -2.2399 |
| 0.6892 | 0.65 | 2500 | 0.6823 | 0.1307 | 0.6547 | 0.1710 | 0.0859 | 0.6845 | 0.0851 | 0.3442 | -0.1212 | 0.1557 | -254.2301 | -258.6753 | -2.1352 | -2.2405 |
| 0.6957 | 0.68 | 2600 | 0.6826 | 0.1279 | 0.6549 | 0.1678 | 0.0832 | 0.6984 | 0.0846 | 0.3400 | -0.1172 | 0.1528 | -254.5000 | -258.9983 | -2.1391 | -2.2437 |
| 0.6711 | 0.71 | 2700 | 0.6817 | 0.1148 | 0.6560 | 0.1716 | 0.0895 | 0.6905 | 0.0821 | 0.3346 | -0.1178 | 0.1514 | -253.8746 | -258.6187 | -2.1325 | -2.2367 |
| 0.6669 | 0.73 | 2800 | 0.6796 | 0.1074 | 0.6558 | 0.1738 | 0.0910 | 0.6925 | 0.0829 | 0.3401 | -0.1230 | 0.1551 | -253.7232 | -258.3900 | -2.1375 | -2.2415 |
| 0.6942 | 0.76 | 2900 | 0.6827 | 0.1376 | 0.6534 | 0.1698 | 0.0814 | 0.6984 | 0.0883 | 0.3535 | -0.1229 | 0.1597 | -254.6791 | -258.7988 | -2.1379 | -2.2420 |
| 0.6631 | 0.79 | 3000 | 0.6843 | 0.1520 | 0.6526 | 0.1679 | 0.0779 | 0.6925 | 0.0900 | 0.3581 | -0.1203 | 0.1605 | -255.0340 | -258.9839 | -2.1417 | -2.2452 |
| 0.6838 | 0.81 | 3100 | 0.6811 | 0.1294 | 0.6538 | 0.1745 | 0.0870 | 0.6905 | 0.0875 | 0.3554 | -0.1214 | 0.1595 | -254.1182 | -258.3243 | -2.1340 | -2.2382 |
| 0.6598 | 0.84 | 3200 | 0.6810 | 0.1359 | 0.6536 | 0.1735 | 0.0857 | 0.6944 | 0.0879 | 0.3550 | -0.1213 | 0.1595 | -254.2526 | -258.4206 | -2.1345 | -2.2381 |
| 0.6749 | 0.86 | 3300 | 0.6815 | 0.1399 | 0.6533 | 0.1727 | 0.0843 | 0.6925 | 0.0884 | 0.3554 | -0.1208 | 0.1594 | -254.3860 | -258.4992 | -2.1310 | -2.2349 |
| 0.675 | 0.89 | 3400 | 0.6825 | 0.1506 | 0.6527 | 0.1720 | 0.0823 | 0.6984 | 0.0897 | 0.3590 | -0.1217 | 0.1610 | -254.5927 | -258.5737 | -2.1318 | -2.2357 |
| 0.7038 | 0.92 | 3500 | 0.6850 | 0.1690 | 0.6516 | 0.1687 | 0.0764 | 0.6925 | 0.0923 | 0.3654 | -0.1209 | 0.1632 | -255.1800 | -258.9013 | -2.1341 | -2.2379 |
| 0.6684 | 0.94 | 3600 | 0.6840 | 0.1637 | 0.6521 | 0.1697 | 0.0786 | 0.6944 | 0.0912 | 0.3632 | -0.1212 | 0.1625 | -254.9620 | -258.8000 | -2.1292 | -2.2332 |
| 0.6882 | 0.97 | 3700 | 0.6840 | 0.1614 | 0.6521 | 0.1701 | 0.0789 | 0.6944 | 0.0912 | 0.3629 | -0.1210 | 0.1624 | -254.9257 | -258.7601 | -2.1350 | -2.2388 |
| 0.687 | 0.99 | 3800 | 0.6839 | 0.1611 | 0.6520 | 0.1701 | 0.0788 | 0.6984 | 0.0913 | 0.3629 | -0.1212 | 0.1624 | -254.9432 | -258.7651 | -2.1329 | -2.2368 |
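
For reading the table: in DPO-style logging, Rewards/margins is the difference between Rewards/chosen and Rewards/rejected, and Rewards/accuracies is the fraction of evaluation pairs whose margin is positive. This interpretation follows common TRL-style trainers and is an assumption rather than a statement from the training script; it is consistent with the final checkpoint, where 0.1701 - 0.0788 = 0.0913 matches the logged margin.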

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2