---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: openbmb/Eurus-7b-sft
datasets:
- generation/UF
- generation/UFfull
model-index:
- name: eurus-dpo-qlora-uf-ours-uffull-5e-7
  results: []
---
# eurus-dpo-qlora-uf-ours-uffull-5e-7
This model is a fine-tuned version of [openbmb/Eurus-7b-sft](https://huggingface.co/openbmb/Eurus-7b-sft) on the generation/UF and generation/UFfull datasets. It achieves the following results on the evaluation set:
- Loss: 0.5784
- Rewards/chosen: -0.8137
- Rewards/rejected: -1.4080
- Rewards/accuracies: 0.6890
- Rewards/margins: 0.5943
- Rewards/margins Max: 2.7059
- Rewards/margins Min: -0.7893
- Rewards/margins Std: 1.1535
- Logps/rejected: -402.0422
- Logps/chosen: -353.0313
- Logits/rejected: -2.0261
- Logits/chosen: -2.1316
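
For context, the reward columns above are the implicit DPO rewards that trl's `DPOTrainer` logs. With policy \\(\pi_\theta\\), frozen reference \\(\pi_{\text{ref}}\\), and temperature \\(\beta\\) (the \\(\beta\\) used for this run is not recorded in this card), DPO defines

$$
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\bigl(r(x, y_w) - r(x, y_l)\bigr)\right]
$$

Rewards/chosen and Rewards/rejected are \\(r(x, y_w)\\) and \\(r(x, y_l)\\) averaged over the evaluation set, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs whose margin is positive.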
## Model description
More information needed
## Intended uses & limitations
More information needed
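
Pending fuller documentation, here is a minimal, hedged loading sketch. Only the base model id comes from this card; the adapter repo id below is a placeholder, since the card does not say where the adapter is hosted.

```python
# Minimal inference sketch. The adapter repo id below is a placeholder
# (this card does not state where the adapter is hosted); substitute the
# actual Hub path or a local checkpoint directory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "openbmb/Eurus-7b-sft"                              # from this card
adapter_id = "your-org/eurus-dpo-qlora-uf-ours-uffull-5e-7"   # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO LoRA adapter

prompt = "Explain direct preference optimization in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```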
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
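
The hyperparameters above can be plugged into trl's `DPOTrainer`. What follows is a hedged reconstruction, not the actual training script: the DPO beta, the LoRA rank/alpha/target modules, the mixed-precision setting, and the dataset loading are not recorded in this card and appear below as labeled placeholders.

```python
# Hedged reconstruction of the training setup (trl 0.7/0.8-era API, to match
# the framework versions listed below). Values marked "placeholder" are
# guesses, not taken from this card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base_id = "openbmb/Eurus-7b-sft"
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(   # 4-bit QLoRA quantization
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder: generation/UF and generation/UFfull are not public Hub ids we
# can resolve; substitute pairwise preference data with "prompt", "chosen",
# and "rejected" columns.
dataset = load_dataset("your-org/your-preference-data")

peft_config = LoraConfig(   # rank/alpha/target modules are placeholders
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Mirrors the hyperparameter list above; with 2 GPUs, 4 per device x 2
# accumulation steps gives the total train batch size of 16. Adam betas and
# epsilon are the TrainingArguments defaults.
training_args = TrainingArguments(
    output_dir="eurus-dpo-qlora-uf-ours-uffull-5e-7",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,   # placeholder: precision is not recorded in this card
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a PEFT adapter, the frozen base model serves
                      # as the DPO reference policy
    args=training_args,
    beta=0.1,         # placeholder: beta is not recorded in this card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```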
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6928 | 0.02 | 100 | 0.6928 | -0.0005 | -0.0012 | 0.5665 | 0.0007 | 0.0070 | -0.0053 | 0.0040 | -261.3662 | -271.7138 | -2.1756 | -2.2858 |
| 0.6908 | 0.05 | 200 | 0.6916 | -0.0048 | -0.0080 | 0.6020 | 0.0032 | 0.0263 | -0.0174 | 0.0141 | -262.0459 | -272.1466 | -2.1761 | -2.2858 |
| 0.6872 | 0.07 | 300 | 0.6890 | -0.0166 | -0.0256 | 0.6070 | 0.0090 | 0.0722 | -0.0467 | 0.0382 | -263.7999 | -273.3193 | -2.1667 | -2.2764 |
| 0.6838 | 0.1 | 400 | 0.6844 | -0.0379 | -0.0580 | 0.6020 | 0.0201 | 0.1566 | -0.0970 | 0.0816 | -267.0469 | -275.4539 | -2.1606 | -2.2694 |
| 0.6798 | 0.12 | 500 | 0.6789 | -0.0627 | -0.0970 | 0.6175 | 0.0342 | 0.2449 | -0.1414 | 0.1248 | -270.9397 | -277.9361 | -2.1579 | -2.2653 |
| 0.685 | 0.14 | 600 | 0.6730 | -0.1084 | -0.1602 | 0.6210 | 0.0518 | 0.3502 | -0.1978 | 0.1782 | -277.2612 | -282.5006 | -2.1431 | -2.2503 |
| 0.659 | 0.17 | 700 | 0.6673 | -0.1744 | -0.2467 | 0.6230 | 0.0722 | 0.4794 | -0.2753 | 0.2460 | -285.9120 | -289.1078 | -2.1416 | -2.2474 |
| 0.6583 | 0.19 | 800 | 0.6609 | -0.2333 | -0.3274 | 0.6190 | 0.0941 | 0.5863 | -0.3297 | 0.2990 | -293.9805 | -294.9917 | -2.1391 | -2.2443 |
| 0.6461 | 0.22 | 900 | 0.6549 | -0.2805 | -0.3960 | 0.6260 | 0.1155 | 0.6830 | -0.3766 | 0.3465 | -300.8446 | -299.7142 | -2.1259 | -2.2312 |
| 0.6294 | 0.24 | 1000 | 0.6490 | -0.3292 | -0.4685 | 0.6320 | 0.1393 | 0.7941 | -0.4289 | 0.4003 | -308.0930 | -304.5846 | -2.1230 | -2.2277 |
| 0.6195 | 0.26 | 1100 | 0.6439 | -0.4060 | -0.5737 | 0.6300 | 0.1676 | 0.9328 | -0.4941 | 0.4674 | -318.6076 | -312.2675 | -2.1223 | -2.2261 |
| 0.5908 | 0.29 | 1200 | 0.6383 | -0.4424 | -0.6330 | 0.6365 | 0.1906 | 1.0210 | -0.5224 | 0.5060 | -324.5404 | -315.8986 | -2.1168 | -2.2197 |
| 0.6 | 0.31 | 1300 | 0.6348 | -0.6416 | -0.8848 | 0.6410 | 0.2432 | 1.3117 | -0.6516 | 0.6444 | -349.7195 | -335.8222 | -2.0950 | -2.1960 |
| 0.6562 | 0.34 | 1400 | 0.6244 | -0.5677 | -0.8319 | 0.6445 | 0.2643 | 1.3351 | -0.6114 | 0.6396 | -344.4360 | -328.4299 | -2.0991 | -2.2013 |
| 0.6223 | 0.36 | 1500 | 0.6182 | -0.6900 | -1.0011 | 0.6550 | 0.3111 | 1.5452 | -0.6698 | 0.7297 | -361.3520 | -340.6633 | -2.0745 | -2.1758 |
| 0.5927 | 0.38 | 1600 | 0.6113 | -0.7472 | -1.1111 | 0.6620 | 0.3639 | 1.7606 | -0.7176 | 0.8176 | -372.3524 | -346.3848 | -2.0586 | -2.1605 |
| 0.5646 | 0.41 | 1700 | 0.6104 | -0.9327 | -1.3738 | 0.6680 | 0.4412 | 2.1427 | -0.8518 | 0.9847 | -398.6274 | -364.9301 | -2.0408 | -2.1421 |
| 0.5765 | 0.43 | 1800 | 0.5997 | -0.7005 | -1.1096 | 0.6710 | 0.4091 | 1.9021 | -0.7012 | 0.8613 | -372.1998 | -341.7125 | -2.0577 | -2.1618 |
| 0.6009 | 0.45 | 1900 | 0.5961 | -0.7594 | -1.2075 | 0.6705 | 0.4481 | 2.0697 | -0.7257 | 0.9258 | -381.9936 | -347.6074 | -2.0498 | -2.1542 |
| 0.6246 | 0.48 | 2000 | 0.5927 | -0.7136 | -1.1673 | 0.6855 | 0.4536 | 2.0828 | -0.6960 | 0.9188 | -377.9705 | -343.0267 | -2.0560 | -2.1615 |
| 0.5758 | 0.5 | 2100 | 0.5903 | -0.7092 | -1.1658 | 0.6865 | 0.4566 | 2.0892 | -0.6923 | 0.9199 | -377.8206 | -342.5804 | -2.0496 | -2.1544 |
| 0.5821 | 0.53 | 2200 | 0.5914 | -0.8181 | -1.3146 | 0.6770 | 0.4965 | 2.2775 | -0.7770 | 1.0093 | -392.7020 | -353.4724 | -2.0336 | -2.1376 |
| 0.5703 | 0.55 | 2300 | 0.5908 | -0.8387 | -1.3505 | 0.6865 | 0.5118 | 2.3511 | -0.7905 | 1.0381 | -396.2970 | -355.5333 | -2.0337 | -2.1371 |
| 0.5852 | 0.57 | 2400 | 0.5861 | -0.7365 | -1.2366 | 0.6840 | 0.5001 | 2.2801 | -0.7262 | 0.9931 | -384.9014 | -345.3100 | -2.0389 | -2.1439 |
| 0.5554 | 0.6 | 2500 | 0.5851 | -0.8047 | -1.3484 | 0.6870 | 0.5438 | 2.4885 | -0.7806 | 1.0799 | -396.0864 | -352.1300 | -2.0291 | -2.1347 |
| 0.5772 | 0.62 | 2600 | 0.5848 | -0.8383 | -1.4070 | 0.6850 | 0.5687 | 2.6189 | -0.8100 | 1.1309 | -401.9416 | -355.4966 | -2.0291 | -2.1349 |
| 0.5886 | 0.65 | 2700 | 0.5817 | -0.7441 | -1.2782 | 0.6905 | 0.5341 | 2.4311 | -0.7283 | 1.0439 | -389.0583 | -346.0684 | -2.0422 | -2.1471 |
| 0.6359 | 0.67 | 2800 | 0.5827 | -0.8038 | -1.3683 | 0.6900 | 0.5646 | 2.5875 | -0.7823 | 1.1115 | -398.0761 | -352.0410 | -2.0338 | -2.1393 |
| 0.5778 | 0.69 | 2900 | 0.5806 | -0.7960 | -1.3631 | 0.6940 | 0.5671 | 2.5883 | -0.7722 | 1.1082 | -397.5573 | -351.2641 | -2.0323 | -2.1377 |
| 0.5122 | 0.72 | 3000 | 0.5802 | -0.8309 | -1.4241 | 0.6910 | 0.5932 | 2.7199 | -0.8027 | 1.1606 | -403.6570 | -354.7578 | -2.0306 | -2.1371 |
| 0.5337 | 0.74 | 3100 | 0.5811 | -0.8622 | -1.4716 | 0.6895 | 0.6093 | 2.7945 | -0.8310 | 1.1946 | -408.4017 | -357.8872 | -2.0254 | -2.1313 |
| 0.5356 | 0.77 | 3200 | 0.5798 | -0.8202 | -1.4105 | 0.6910 | 0.5903 | 2.6922 | -0.7970 | 1.1512 | -402.2947 | -353.6847 | -2.0290 | -2.1344 |
| 0.5481 | 0.79 | 3300 | 0.5790 | -0.7947 | -1.3784 | 0.6945 | 0.5838 | 2.6645 | -0.7790 | 1.1355 | -399.0837 | -351.1285 | -2.0247 | -2.1306 |
| 0.5216 | 0.81 | 3400 | 0.5802 | -0.8580 | -1.4671 | 0.6870 | 0.6091 | 2.7829 | -0.8238 | 1.1893 | -407.9558 | -357.4660 | -2.0275 | -2.1326 |
| 0.5254 | 0.84 | 3500 | 0.5791 | -0.8231 | -1.4168 | 0.6905 | 0.5937 | 2.7039 | -0.7954 | 1.1546 | -402.9213 | -353.9702 | -2.0331 | -2.1377 |
| 0.5838 | 0.86 | 3600 | 0.5784 | -0.8062 | -1.3965 | 0.6920 | 0.5903 | 2.6855 | -0.7833 | 1.1446 | -400.8925 | -352.2793 | -2.0273 | -2.1328 |
| 0.5567 | 0.89 | 3700 | 0.5784 | -0.8036 | -1.3925 | 0.6920 | 0.5888 | 2.6776 | -0.7817 | 1.1419 | -400.4903 | -352.0265 | -2.0305 | -2.1357 |
| 0.5429 | 0.91 | 3800 | 0.5784 | -0.8144 | -1.4091 | 0.6935 | 0.5947 | 2.7068 | -0.7898 | 1.1538 | -402.1539 | -353.1044 | -2.0261 | -2.1316 |
| 0.5582 | 0.93 | 3900 | 0.5784 | -0.8158 | -1.4117 | 0.6895 | 0.5959 | 2.7125 | -0.7914 | 1.1563 | -402.4116 | -353.2459 | -2.0268 | -2.1323 |
| 0.5487 | 0.96 | 4000 | 0.5783 | -0.8121 | -1.4060 | 0.6915 | 0.5939 | 2.7037 | -0.7879 | 1.1523 | -401.8473 | -352.8768 | -2.0239 | -2.1296 |
| 0.5322 | 0.98 | 4100 | 0.5784 | -0.8141 | -1.4090 | 0.6920 | 0.5949 | 2.7090 | -0.7893 | 1.1546 | -402.1446 | -353.0736 | -2.0279 | -2.1333 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2