---
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged
model-index:
- name: WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.7-DPO
  results: []
---

# WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.7-DPO

This model is a fine-tuned version of Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.0923
- Rewards/chosen: 1.3984
- Rewards/rejected: -6.4179
- Rewards/accuracies: 0.9643
- Rewards/margins: 7.8163
- Logps/rejected: -264.5786
- Logps/chosen: -189.8816
- Logits/rejected: -1.8496
- Logits/chosen: -1.8101
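
For context, the Rewards/* metrics follow the implicit-reward convention of DPO as implemented in TRL: each completion's reward is the beta-scaled log-probability ratio between the policy and the frozen reference (SFT) model, and the margin is the gap between the chosen and rejected completions. A sketch of the standard definitions (the beta used for this run is not reported on this card):

```latex
% Implicit DPO reward of a completion y for prompt x
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% Rewards/margins is the chosen-minus-rejected gap; the DPO loss pushes it up
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})\big)
```

The evaluation numbers above are consistent with this: 1.3984 - (-6.4179) = 7.8163, which matches Rewards/margins.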

## Model description

More information needed

## Intended uses & limitations

More information needed
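
Pending a fuller description, a minimal way to try the model is to load the adapter on top of the base model with PEFT. This is a hedged sketch: the adapter repo id is assumed to match the model name above, and the prompt and generation settings are illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged"
# Assumption: the adapter is published under the same repo id as this card's name.
ADAPTER = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.7-DPO"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
# Attach the DPO-trained LoRA adapter to the frozen SFT base.
model = PeftModel.from_pretrained(base_model, ADAPTER)

inputs = tokenizer("Hello! What can you help me with?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```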

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a TRL setup follows the list):

- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 1470
- mixed_precision_training: Native AMP
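
These values translate into a TRL `DPOTrainer` run roughly as follows. A minimal sketch, assuming a TRL version contemporary with Transformers 4.38 (where `DPOTrainer` still accepts `beta` directly); the DPO `beta` and the preference dataset are not reported on this card, so those arguments are placeholders.

```python
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    output_dir="WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.7-DPO",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    learning_rate=5e-6,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    max_steps=1470,
    seed=42,
    fp16=True,  # "Native AMP" mixed precision
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the TrainingArguments defaults.
)

trainer = DPOTrainer(
    model=model,                  # the PEFT-wrapped SFT model (see the loading sketch above)
    ref_model=None,               # with a PEFT model, TRL reuses the frozen base as the reference
    args=training_args,
    beta=0.1,                     # assumption: the actual beta is not reported on this card
    train_dataset=train_dataset,  # placeholder: the card lists the dataset as unknown
    eval_dataset=eval_dataset,    # placeholder
    tokenizer=tokenizer,
)
trainer.train()
```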

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6781        | 0.12  | 30   | 0.6762          | 0.0504         | 0.0157           | 0.75               | 0.0347          | -243.1332      | -194.3750    | -1.8308         | -1.7951       |
| 0.5918        | 0.24  | 60   | 0.5998          | 0.2476         | 0.0383           | 0.7857             | 0.2093          | -243.0578      | -193.7174    | -1.8333         | -1.7975       |
| 0.4932        | 0.37  | 90   | 0.5072          | 0.5622         | 0.0680           | 0.8214             | 0.4942          | -242.9590      | -192.6691    | -1.8364         | -1.8004       |
| 0.4391        | 0.49  | 120  | 0.4336          | 0.9734         | 0.1121           | 0.7857             | 0.8613          | -242.8120      | -191.2982    | -1.8413         | -1.8051       |
| 0.3208        | 0.61  | 150  | 0.3933          | 1.3961         | 0.0824           | 0.7857             | 1.3137          | -242.9110      | -189.8893    | -1.8492         | -1.8130       |
| 0.3215        | 0.73  | 180  | 0.3756          | 1.8483         | 0.0151           | 0.7857             | 1.8332          | -243.1354      | -188.3820    | -1.8562         | -1.8194       |
| 0.0817        | 0.86  | 210  | 0.3835          | 2.3139         | -0.1849          | 0.7857             | 2.4989          | -243.8021      | -186.8299    | -1.8641         | -1.8266       |
| 0.137         | 0.98  | 240  | 0.4132          | 2.5979         | -0.5021          | 0.75               | 3.1001          | -244.8594      | -185.8831    | -1.8722         | -1.8343       |
| 0.0997        | 1.1   | 270  | 0.4657          | 2.7384         | -1.0053          | 0.75               | 3.7438          | -246.5367      | -185.4148    | -1.8816         | -1.8430       |
| 0.0432        | 1.22  | 300  | 0.5011          | 2.7041         | -1.4771          | 0.75               | 4.1812          | -248.1093      | -185.5293    | -1.8884         | -1.8495       |
| 0.1819        | 1.35  | 330  | 0.4785          | 2.7004         | -1.8249          | 0.75               | 4.5253          | -249.2688      | -185.5418    | -1.8878         | -1.8487       |
| 0.0169        | 1.47  | 360  | 0.4872          | 2.6643         | -2.1577          | 0.75               | 4.8220          | -250.3781      | -185.6619    | -1.8907         | -1.8510       |
| 0.235         | 1.59  | 390  | 0.4886          | 2.6565         | -2.3834          | 0.75               | 5.0399          | -251.1302      | -185.6880    | -1.8930         | -1.8532       |
| 0.7551        | 1.71  | 420  | 0.4380          | 2.7229         | -2.3468          | 0.75               | 5.0697          | -251.0082      | -185.4665    | -1.8921         | -1.8527       |
| 0.134         | 1.84  | 450  | 0.4383          | 2.6666         | -2.5566          | 0.75               | 5.2232          | -251.7077      | -185.6543    | -1.8925         | -1.8531       |
| 0.0662        | 1.96  | 480  | 0.4448          | 2.5586         | -2.9192          | 0.75               | 5.4778          | -252.9164      | -186.0143    | -1.8964         | -1.8569       |
| 0.1093        | 2.08  | 510  | 0.4262          | 2.5211         | -3.0726          | 0.75               | 5.5937          | -253.4277      | -186.1394    | -1.8955         | -1.8561       |
| 0.1557        | 2.2   | 540  | 0.4264          | 2.3694         | -3.4198          | 0.75               | 5.7892          | -254.5848      | -186.6449    | -1.8965         | -1.8566       |
| 0.0962        | 2.33  | 570  | 0.4182          | 2.2640         | -3.7076          | 0.75               | 5.9716          | -255.5444      | -186.9964    | -1.8978         | -1.8582       |
| 0.0437        | 2.45  | 600  | 0.3824          | 2.2618         | -3.7757          | 0.75               | 6.0375          | -255.7713      | -187.0037    | -1.8933         | -1.8534       |
| 0.0278        | 2.57  | 630  | 0.3571          | 2.3503         | -3.7557          | 0.8571             | 6.1060          | -255.7046      | -186.7086    | -1.8932         | -1.8536       |
| 0.2399        | 2.69  | 660  | 0.3313          | 2.3025         | -3.9256          | 0.8571             | 6.2281          | -256.2710      | -186.8678    | -1.8909         | -1.8512       |
| 0.039         | 2.82  | 690  | 0.3131          | 2.2138         | -4.1650          | 0.8929             | 6.3789          | -257.0691      | -187.1635    | -1.8906         | -1.8510       |
| 0.3389        | 2.94  | 720  | 0.2763          | 2.2605         | -4.2160          | 0.8929             | 6.4765          | -257.2390      | -187.0079    | -1.8873         | -1.8480       |
| 0.0154        | 3.06  | 750  | 0.2704          | 2.2526         | -4.3017          | 0.8929             | 6.5544          | -257.5247      | -187.0342    | -1.8862         | -1.8470       |
| 0.021         | 3.18  | 780  | 0.2422          | 2.2548         | -4.3438          | 0.8929             | 6.5986          | -257.6650      | -187.0270    | -1.8838         | -1.8448       |
| 0.0614        | 3.31  | 810  | 0.2144          | 2.2331         | -4.4495          | 0.8929             | 6.6826          | -258.0172      | -187.0992    | -1.8805         | -1.8417       |
| 0.0529        | 3.43  | 840  | 0.2121          | 2.1562         | -4.6740          | 0.8929             | 6.8302          | -258.7657      | -187.3555    | -1.8809         | -1.8423       |
| 0.001         | 3.55  | 870  | 0.2092          | 2.1034         | -4.8454          | 0.8929             | 6.9487          | -259.3368      | -187.5317    | -1.8799         | -1.8410       |
| 0.0284        | 3.67  | 900  | 0.2006          | 1.9814         | -5.1388          | 0.8929             | 7.1202          | -260.3150      | -187.9384    | -1.8760         | -1.8366       |
| 0.0744        | 3.8   | 930  | 0.1813          | 1.9437         | -5.2351          | 0.8929             | 7.1788          | -260.6358      | -188.0639    | -1.8733         | -1.8339       |
| 0.091         | 3.92  | 960  | 0.1722          | 1.8333         | -5.4335          | 0.8929             | 7.2668          | -261.2973      | -188.4319    | -1.8707         | -1.8313       |
| 0.3504        | 4.04  | 990  | 0.1487          | 1.8678         | -5.3589          | 0.9286             | 7.2268          | -261.0488      | -188.3168    | -1.8672         | -1.8279       |
| 0.0071        | 4.16  | 1020 | 0.1403          | 1.7989         | -5.5185          | 0.9286             | 7.3173          | -261.5805      | -188.5468    | -1.8637         | -1.8243       |
| 0.0131        | 4.29  | 1050 | 0.1312          | 1.8050         | -5.5495          | 0.9286             | 7.3545          | -261.6841      | -188.5262    | -1.8616         | -1.8222       |
| 0.0868        | 4.41  | 1080 | 0.1210          | 1.7626         | -5.6284          | 0.9286             | 7.3911          | -261.9471      | -188.6675    | -1.8587         | -1.8195       |
| 0.0041        | 4.53  | 1110 | 0.1206          | 1.6865         | -5.7780          | 0.9286             | 7.4645          | -262.4456      | -188.9213    | -1.8566         | -1.8173       |
| 0.0107        | 4.65  | 1140 | 0.1178          | 1.6370         | -5.8895          | 0.9643             | 7.5266          | -262.8174      | -189.0862    | -1.8563         | -1.8171       |
| 0.0084        | 4.78  | 1170 | 0.1123          | 1.6107         | -5.9365          | 0.9643             | 7.5471          | -262.9738      | -189.1741    | -1.8552         | -1.8159       |
| 0.0049        | 4.9   | 1200 | 0.1083          | 1.5710         | -6.0495          | 0.9643             | 7.6206          | -263.3507      | -189.3061    | -1.8545         | -1.8151       |
| 0.0746        | 5.02  | 1230 | 0.1034          | 1.5328         | -6.1286          | 0.9643             | 7.6614          | -263.6144      | -189.4336    | -1.8535         | -1.8140       |
| 0.0091        | 5.14  | 1260 | 0.1031          | 1.4764         | -6.2562          | 0.9643             | 7.7327          | -264.0397      | -189.6215    | -1.8531         | -1.8136       |
| 0.0526        | 5.27  | 1290 | 0.0997          | 1.4526         | -6.3037          | 0.9643             | 7.7564          | -264.1981      | -189.7009    | -1.8528         | -1.8133       |
| 0.0316        | 5.39  | 1320 | 0.0965          | 1.4471         | -6.3114          | 0.9643             | 7.7585          | -264.2236      | -189.7192    | -1.8517         | -1.8124       |
| 0.0249        | 5.51  | 1350 | 0.0950          | 1.4370         | -6.3384          | 0.9643             | 7.7755          | -264.3138      | -189.7529    | -1.8509         | -1.8115       |
| 0.2078        | 5.63  | 1380 | 0.0937          | 1.4141         | -6.3790          | 0.9643             | 7.7931          | -264.4489      | -189.8293    | -1.8504         | -1.8111       |
| 0.013         | 5.76  | 1410 | 0.0926          | 1.4237         | -6.3666          | 0.9643             | 7.7902          | -264.4076      | -189.7974    | -1.8498         | -1.8103       |
| 0.0194        | 5.88  | 1440 | 0.0923          | 1.3984         | -6.4179          | 0.9643             | 7.8163          | -264.5786      | -189.8816    | -1.8496         | -1.8101       |
| 0.0111        | 6.0   | 1470 | 0.0919          | 1.3959         | -6.4219          | 0.9643             | 7.8179          | -264.5919      | -189.8898    | -1.8495         | -1.8100       |

### Framework versions

- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.2
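
To reproduce this environment, the versions above translate directly into pins (TRL is used for the DPO training but its version is not listed on this card, so it is left unpinned here):

```bash
pip install peft==0.10.0 transformers==4.38.2 datasets==2.18.0 tokenizers==0.15.2 trl
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118
```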