---
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: openbmb/Eurus-7b-sft
datasets:
  - generation/UF
  - generation/UFfull
model-index:
  - name: eurus-dpo-qlora-uf-ours-uffull-5e-7
    results: []
---

eurus-dpo-qlora-uf-ours-uffull-5e-7

This model is a QLoRA (PEFT) adapter for openbmb/Eurus-7b-sft, trained with DPO on the generation/UF and generation/UFfull datasets. It achieves the following results on the evaluation set:

  • Loss: 0.5784
  • Rewards/chosen: -0.8137
  • Rewards/rejected: -1.4080
  • Rewards/accuracies: 0.6890
  • Rewards/margins: 0.5943
  • Rewards/margins Max: 2.7059
  • Rewards/margins Min: -0.7893
  • Rewards/margins Std: 1.1535
  • Logps/rejected: -402.0422
  • Logps/chosen: -353.0313
  • Logits/rejected: -2.0261
  • Logits/chosen: -2.1316
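
Because this repository contains a PEFT (QLoRA) adapter rather than full model weights, it is loaded on top of the base model. Below is a minimal loading sketch; the adapter repository id is an assumption, so substitute the actual path to this adapter.

```python
# Minimal sketch: attach this QLoRA adapter to the base model and generate.
# ADAPTER_ID is an assumption; replace it with the actual path or repo id of this adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "openbmb/Eurus-7b-sft"
ADAPTER_ID = "just1nseo/eurus-dpo-qlora-uf-ours-uffull-5e-7"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # DPO-trained LoRA weights

prompt = "Explain the difference between DPO and PPO in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```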

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
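
As a rough reference, these settings map onto a trl DPOTrainer run along the lines of the sketch below (trl 0.7-era API). This is an assumed reconstruction, not the actual training script: the DPO beta, the LoRA configuration, the precision, and the dataset loading are not reported in this card, so the corresponding values are placeholders.

```python
# Assumed reconstruction of the training setup (not the actual script).
# beta, the LoRA settings, and the toy dataset below are placeholders;
# only the hyperparameters listed above are taken from this card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

BASE_ID = "openbmb/Eurus-7b-sft"
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(BASE_ID)

# Placeholder preference data; the generation/UF and generation/UFfull mixes
# used for this run are not documented in this card.
prefs = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 5."],
})

peft_config = LoraConfig(  # placeholder LoRA settings, not reported in the card
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="eurus-dpo-qlora-uf-ours-uffull-5e-7",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # with a PEFT adapter, the frozen base model serves as the reference
    beta=0.01,        # placeholder; the beta used for this run is not reported
    args=args,
    train_dataset=prefs,
    eval_dataset=prefs,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```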

Training results

Columns, in order: Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen
0.6928 0.02 100 0.6928 -0.0005 -0.0012 0.5665 0.0007 0.0070 -0.0053 0.0040 -261.3662 -271.7138 -2.1756 -2.2858
0.6908 0.05 200 0.6916 -0.0048 -0.0080 0.6020 0.0032 0.0263 -0.0174 0.0141 -262.0459 -272.1466 -2.1761 -2.2858
0.6872 0.07 300 0.6890 -0.0166 -0.0256 0.6070 0.0090 0.0722 -0.0467 0.0382 -263.7999 -273.3193 -2.1667 -2.2764
0.6838 0.1 400 0.6844 -0.0379 -0.0580 0.6020 0.0201 0.1566 -0.0970 0.0816 -267.0469 -275.4539 -2.1606 -2.2694
0.6798 0.12 500 0.6789 -0.0627 -0.0970 0.6175 0.0342 0.2449 -0.1414 0.1248 -270.9397 -277.9361 -2.1579 -2.2653
0.685 0.14 600 0.6730 -0.1084 -0.1602 0.6210 0.0518 0.3502 -0.1978 0.1782 -277.2612 -282.5006 -2.1431 -2.2503
0.659 0.17 700 0.6673 -0.1744 -0.2467 0.6230 0.0722 0.4794 -0.2753 0.2460 -285.9120 -289.1078 -2.1416 -2.2474
0.6583 0.19 800 0.6609 -0.2333 -0.3274 0.6190 0.0941 0.5863 -0.3297 0.2990 -293.9805 -294.9917 -2.1391 -2.2443
0.6461 0.22 900 0.6549 -0.2805 -0.3960 0.6260 0.1155 0.6830 -0.3766 0.3465 -300.8446 -299.7142 -2.1259 -2.2312
0.6294 0.24 1000 0.6490 -0.3292 -0.4685 0.6320 0.1393 0.7941 -0.4289 0.4003 -308.0930 -304.5846 -2.1230 -2.2277
0.6195 0.26 1100 0.6439 -0.4060 -0.5737 0.6300 0.1676 0.9328 -0.4941 0.4674 -318.6076 -312.2675 -2.1223 -2.2261
0.5908 0.29 1200 0.6383 -0.4424 -0.6330 0.6365 0.1906 1.0210 -0.5224 0.5060 -324.5404 -315.8986 -2.1168 -2.2197
0.6 0.31 1300 0.6348 -0.6416 -0.8848 0.6410 0.2432 1.3117 -0.6516 0.6444 -349.7195 -335.8222 -2.0950 -2.1960
0.6562 0.34 1400 0.6244 -0.5677 -0.8319 0.6445 0.2643 1.3351 -0.6114 0.6396 -344.4360 -328.4299 -2.0991 -2.2013
0.6223 0.36 1500 0.6182 -0.6900 -1.0011 0.6550 0.3111 1.5452 -0.6698 0.7297 -361.3520 -340.6633 -2.0745 -2.1758
0.5927 0.38 1600 0.6113 -0.7472 -1.1111 0.6620 0.3639 1.7606 -0.7176 0.8176 -372.3524 -346.3848 -2.0586 -2.1605
0.5646 0.41 1700 0.6104 -0.9327 -1.3738 0.6680 0.4412 2.1427 -0.8518 0.9847 -398.6274 -364.9301 -2.0408 -2.1421
0.5765 0.43 1800 0.5997 -0.7005 -1.1096 0.6710 0.4091 1.9021 -0.7012 0.8613 -372.1998 -341.7125 -2.0577 -2.1618
0.6009 0.45 1900 0.5961 -0.7594 -1.2075 0.6705 0.4481 2.0697 -0.7257 0.9258 -381.9936 -347.6074 -2.0498 -2.1542
0.6246 0.48 2000 0.5927 -0.7136 -1.1673 0.6855 0.4536 2.0828 -0.6960 0.9188 -377.9705 -343.0267 -2.0560 -2.1615
0.5758 0.5 2100 0.5903 -0.7092 -1.1658 0.6865 0.4566 2.0892 -0.6923 0.9199 -377.8206 -342.5804 -2.0496 -2.1544
0.5821 0.53 2200 0.5914 -0.8181 -1.3146 0.6770 0.4965 2.2775 -0.7770 1.0093 -392.7020 -353.4724 -2.0336 -2.1376
0.5703 0.55 2300 0.5908 -0.8387 -1.3505 0.6865 0.5118 2.3511 -0.7905 1.0381 -396.2970 -355.5333 -2.0337 -2.1371
0.5852 0.57 2400 0.5861 -0.7365 -1.2366 0.6840 0.5001 2.2801 -0.7262 0.9931 -384.9014 -345.3100 -2.0389 -2.1439
0.5554 0.6 2500 0.5851 -0.8047 -1.3484 0.6870 0.5438 2.4885 -0.7806 1.0799 -396.0864 -352.1300 -2.0291 -2.1347
0.5772 0.62 2600 0.5848 -0.8383 -1.4070 0.6850 0.5687 2.6189 -0.8100 1.1309 -401.9416 -355.4966 -2.0291 -2.1349
0.5886 0.65 2700 0.5817 -0.7441 -1.2782 0.6905 0.5341 2.4311 -0.7283 1.0439 -389.0583 -346.0684 -2.0422 -2.1471
0.6359 0.67 2800 0.5827 -0.8038 -1.3683 0.6900 0.5646 2.5875 -0.7823 1.1115 -398.0761 -352.0410 -2.0338 -2.1393
0.5778 0.69 2900 0.5806 -0.7960 -1.3631 0.6940 0.5671 2.5883 -0.7722 1.1082 -397.5573 -351.2641 -2.0323 -2.1377
0.5122 0.72 3000 0.5802 -0.8309 -1.4241 0.6910 0.5932 2.7199 -0.8027 1.1606 -403.6570 -354.7578 -2.0306 -2.1371
0.5337 0.74 3100 0.5811 -0.8622 -1.4716 0.6895 0.6093 2.7945 -0.8310 1.1946 -408.4017 -357.8872 -2.0254 -2.1313
0.5356 0.77 3200 0.5798 -0.8202 -1.4105 0.6910 0.5903 2.6922 -0.7970 1.1512 -402.2947 -353.6847 -2.0290 -2.1344
0.5481 0.79 3300 0.5790 -0.7947 -1.3784 0.6945 0.5838 2.6645 -0.7790 1.1355 -399.0837 -351.1285 -2.0247 -2.1306
0.5216 0.81 3400 0.5802 -0.8580 -1.4671 0.6870 0.6091 2.7829 -0.8238 1.1893 -407.9558 -357.4660 -2.0275 -2.1326
0.5254 0.84 3500 0.5791 -0.8231 -1.4168 0.6905 0.5937 2.7039 -0.7954 1.1546 -402.9213 -353.9702 -2.0331 -2.1377
0.5838 0.86 3600 0.5784 -0.8062 -1.3965 0.6920 0.5903 2.6855 -0.7833 1.1446 -400.8925 -352.2793 -2.0273 -2.1328
0.5567 0.89 3700 0.5784 -0.8036 -1.3925 0.6920 0.5888 2.6776 -0.7817 1.1419 -400.4903 -352.0265 -2.0305 -2.1357
0.5429 0.91 3800 0.5784 -0.8144 -1.4091 0.6935 0.5947 2.7068 -0.7898 1.1538 -402.1539 -353.1044 -2.0261 -2.1316
0.5582 0.93 3900 0.5784 -0.8158 -1.4117 0.6895 0.5959 2.7125 -0.7914 1.1563 -402.4116 -353.2459 -2.0268 -2.1323
0.5487 0.96 4000 0.5783 -0.8121 -1.4060 0.6915 0.5939 2.7037 -0.7879 1.1523 -401.8473 -352.8768 -2.0239 -2.1296
0.5322 0.98 4100 0.5784 -0.8141 -1.4090 0.6920 0.5949 2.7090 -0.7893 1.1546 -402.1446 -353.0736 -2.0279 -2.1333
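
For readers unfamiliar with the column names: the reward columns follow trl's standard DPO logging, where the implicit reward of a response is beta times the difference between the policy and reference log-probabilities, the margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs with a positive margin; Logps/* and Logits/* are the mean policy log-probabilities and logits over the respective responses. The sketch below illustrates these definitions; the Rewards/margins Max/Min/Std columns appear to be extra batch statistics specific to this training setup, so their exact definitions, like the beta value, are assumptions.

```python
# Sketch of how the reward columns in the table above are derived from
# per-example log-probabilities (standard trl-style DPO logging). The
# margins max/min/std columns and the beta value are assumptions.
import torch

def dpo_reward_stats(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.01):
    # Implicit DPO reward: beta * (policy log-prob - reference log-prob)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/accuracies": (margins > 0).float().mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/margins_max": margins.max().item(),   # assumed definition
        "rewards/margins_min": margins.min().item(),   # assumed definition
        "rewards/margins_std": margins.std().item(),   # assumed definition
    }

# Toy usage with random log-probabilities for a batch of 8 preference pairs:
g = torch.Generator().manual_seed(0)
logps = [torch.randn(8, generator=g) - 300 for _ in range(4)]
print(dpo_reward_stats(*logps))
```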

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2