---
license: apache-2.0
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
  - name: zephyr-7b-dpo-qlora
    results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.5058
- Rewards/chosen: -2.0144
- Rewards/rejected: -3.0238
- Rewards/accuracies: 0.7350
- Rewards/margins: 1.0093
- Logps/rejected: -550.9584
- Logps/chosen: -469.9345
- Logits/rejected: 1.9679
- Logits/chosen: 1.2121
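For readers unfamiliar with these columns: in DPO the "rewards" are implied rewards, i.e. β-scaled log-probability ratios between the policy and the frozen reference model, and the accuracy is the fraction of pairs where the chosen margin is positive. A minimal sketch of how the per-example metrics relate (β is not stated in this card; 0.1 below is purely illustrative, as are the log-probability numbers):

```python
import math

def dpo_stats(beta, logp_chosen, logp_chosen_ref, logp_rejected, logp_rejected_ref):
    """Per-example DPO implied rewards, margin, and loss.

    The Rewards/* columns above are beta-scaled log-prob ratios between
    the policy and the reference model summed over the completion tokens.
    """
    reward_chosen = beta * (logp_chosen - logp_chosen_ref)
    reward_rejected = beta * (logp_rejected - logp_rejected_ref)
    margin = reward_chosen - reward_rejected
    # DPO loss = -log sigmoid(margin), written in a numerically stable form;
    # Rewards/accuracies counts examples where margin > 0.
    loss = math.log1p(math.exp(-margin))
    return reward_chosen, reward_rejected, margin, loss

# Illustrative numbers only: a chosen completion whose log-prob fell less
# than the rejected one's relative to the reference model.
rc, rr, margin, loss = dpo_stats(0.1, -460.0, -450.0, -550.0, -520.0)
```

Note how both implied rewards can be negative (as in the final results above) while the margin, and hence the accuracy, still improves.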

## Model description

More information needed

## Intended uses & limitations

More information needed
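Since this repository holds a PEFT (QLoRA) adapter rather than full model weights, one plausible way to use it is to load the 4-bit base model and attach the adapter. A hedged sketch, assuming the library versions listed under Framework versions; the adapter repo id is inferred from this card's title and should be verified before use:

```python
def load_zephyr_dpo_qlora(adapter_id="dball/zephyr-7b-dpo-qlora"):
    """Load the base model in 4-bit and attach this DPO-trained adapter.

    Sketch only: the default adapter_id is an assumption, and 4-bit
    loading mirrors the QLoRA training setup rather than a requirement.
    """
    # Imports are local so the sketch can be defined without the
    # libraries installed; it needs transformers>=4.36 and peft>=0.7.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    base_id = "mistralai/Mistral-7B-v0.1"
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(
        base_id, quantization_config=bnb, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter_id)
    return model, tokenizer
```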

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
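The effective batch size follows directly from these values; a quick arithmetic check, assuming a single training process (as the reported total_train_batch_size of 8 implies) and taking the ~7600 optimizer steps visible in the results table to estimate the warmup length:

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_processes = 1  # assumption, implied by total_train_batch_size above

total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_processes
)
assert total_train_batch_size == 8  # matches the reported value

# With warmup_ratio 0.1 and a cosine schedule, the LR ramps up to 5e-06
# over the first 10% of optimizer steps, then decays; the results table
# ends near step 7600, giving roughly this many warmup steps.
warmup_steps = int(0.1 * 7600)  # ~760, an estimate
```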

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6934 | 0.01 | 100 | 0.6931 | 0.0002 | 0.0001 | 0.5105 | 0.0001 | -248.5731 | -268.4692 | -2.4383 | -2.5261 |
| 0.6924 | 0.03 | 200 | 0.6926 | 0.0014 | 0.0003 | 0.5605 | 0.0011 | -248.5511 | -268.3451 | -2.4368 | -2.5247 |
| 0.691 | 0.04 | 300 | 0.6907 | 0.0091 | 0.0041 | 0.6440 | 0.0050 | -248.1753 | -267.5839 | -2.4378 | -2.5253 |
| 0.6876 | 0.05 | 400 | 0.6845 | 0.0405 | 0.0227 | 0.6580 | 0.0178 | -246.3089 | -264.4353 | -2.4351 | -2.5230 |
| 0.6799 | 0.07 | 500 | 0.6707 | 0.0354 | -0.0135 | 0.6815 | 0.0489 | -249.9276 | -264.9495 | -2.3755 | -2.4660 |
| 0.6577 | 0.08 | 600 | 0.6462 | -0.1230 | -0.2378 | 0.6750 | 0.1148 | -272.3604 | -280.7885 | -2.2541 | -2.3601 |
| 0.6365 | 0.09 | 700 | 0.6345 | -0.0856 | -0.2362 | 0.6860 | 0.1507 | -272.2037 | -277.0453 | -2.2013 | -2.3136 |
| 0.6519 | 0.1 | 800 | 0.6240 | -0.4943 | -0.7231 | 0.6630 | 0.2287 | -320.8872 | -317.9223 | -2.0482 | -2.1835 |
| 0.6547 | 0.12 | 900 | 0.6203 | -0.5733 | -0.8287 | 0.6695 | 0.2555 | -331.4542 | -325.8177 | -2.0783 | -2.2184 |
| 0.5841 | 0.13 | 1000 | 0.6071 | -0.5361 | -0.8600 | 0.6820 | 0.3239 | -334.5816 | -322.0998 | -2.0689 | -2.2086 |
| 0.5877 | 0.14 | 1100 | 0.5947 | -1.1495 | -1.6229 | 0.6855 | 0.4734 | -410.8678 | -383.4380 | -1.1053 | -1.3836 |
| 0.5552 | 0.16 | 1200 | 0.5909 | -1.4256 | -1.8934 | 0.6880 | 0.4678 | -437.9200 | -411.0459 | -0.3614 | -0.7372 |
| 0.5492 | 0.17 | 1300 | 0.5791 | -1.4614 | -1.9771 | 0.6935 | 0.5157 | -446.2910 | -414.6323 | -0.1933 | -0.5949 |
| 0.5789 | 0.18 | 1400 | 0.5771 | -0.8799 | -1.3633 | 0.7035 | 0.4834 | -384.9109 | -356.4832 | -0.1908 | -0.5846 |
| 0.5456 | 0.2 | 1500 | 0.5646 | -1.1845 | -1.7913 | 0.7035 | 0.6068 | -427.7158 | -386.9436 | 0.3098 | -0.1574 |
| 0.4722 | 0.21 | 1600 | 0.5598 | -1.3242 | -1.9424 | 0.7075 | 0.6181 | -442.8174 | -400.9113 | 0.5395 | 0.0346 |
| 0.5072 | 0.22 | 1700 | 0.5574 | -1.5040 | -2.1667 | 0.7060 | 0.6628 | -465.2537 | -418.8860 | 1.0411 | 0.4657 |
| 0.5284 | 0.24 | 1800 | 0.5534 | -1.5486 | -2.2055 | 0.7070 | 0.6568 | -469.1293 | -423.3542 | 1.2404 | 0.6528 |
| 0.5623 | 0.25 | 1900 | 0.5625 | -1.7106 | -2.4247 | 0.7055 | 0.7141 | -491.0526 | -439.5539 | 0.7808 | 0.3058 |
| 0.6092 | 0.26 | 2000 | 0.5501 | -1.0158 | -1.6513 | 0.7085 | 0.6354 | -413.7089 | -370.0728 | 0.5199 | 0.0079 |
| 0.5726 | 0.27 | 2100 | 0.5433 | -1.4697 | -2.1580 | 0.7150 | 0.6884 | -464.3842 | -415.4569 | 0.9981 | 0.4405 |
| 0.5323 | 0.29 | 2200 | 0.5483 | -1.3173 | -2.0886 | 0.7150 | 0.7713 | -457.4451 | -400.2244 | 1.3533 | 0.7445 |
| 0.5148 | 0.3 | 2300 | 0.5387 | -1.3194 | -2.0188 | 0.7275 | 0.6994 | -450.4646 | -400.4308 | 1.1454 | 0.5107 |
| 0.4112 | 0.31 | 2400 | 0.5401 | -1.6201 | -2.4219 | 0.7200 | 0.8018 | -490.7723 | -430.5040 | 1.2866 | 0.6648 |
| 0.5246 | 0.33 | 2500 | 0.5413 | -2.1278 | -2.8964 | 0.7220 | 0.7686 | -538.2222 | -481.2729 | 1.7388 | 1.0914 |
| 0.5657 | 0.34 | 2600 | 0.5373 | -1.6863 | -2.4642 | 0.7200 | 0.7779 | -495.0003 | -437.1172 | 1.6571 | 0.9886 |
| 0.5216 | 0.35 | 2700 | 0.5357 | -1.9895 | -2.7395 | 0.7260 | 0.7500 | -522.5278 | -467.4365 | 1.7936 | 1.1290 |
| 0.5865 | 0.37 | 2800 | 0.5351 | -2.1007 | -2.8103 | 0.7260 | 0.7096 | -529.6149 | -478.5605 | 1.7565 | 1.1019 |
| 0.5252 | 0.38 | 2900 | 0.5376 | -1.5816 | -2.4416 | 0.7205 | 0.8600 | -492.7397 | -426.6496 | 1.5686 | 0.9108 |
| 0.5381 | 0.39 | 3000 | 0.5306 | -1.5416 | -2.3719 | 0.7230 | 0.8303 | -485.7741 | -422.6485 | 1.7206 | 1.0233 |
| 0.4587 | 0.41 | 3100 | 0.5222 | -1.4511 | -2.1850 | 0.7260 | 0.7339 | -467.0778 | -413.6005 | 1.8445 | 1.1221 |
| 0.5173 | 0.42 | 3200 | 0.5277 | -1.3551 | -2.1383 | 0.7260 | 0.7832 | -462.4095 | -403.9989 | 1.6186 | 0.8981 |
| 0.5851 | 0.43 | 3300 | 0.5181 | -1.6864 | -2.5011 | 0.7325 | 0.8148 | -498.6931 | -437.1258 | 2.0344 | 1.2860 |
| 0.5811 | 0.44 | 3400 | 0.5166 | -1.6007 | -2.4386 | 0.7335 | 0.8379 | -492.4408 | -428.5590 | 1.7238 | 1.0162 |
| 0.4892 | 0.46 | 3500 | 0.5257 | -1.4712 | -2.3237 | 0.7280 | 0.8525 | -480.9519 | -415.6104 | 2.0709 | 1.3014 |
| 0.5438 | 0.47 | 3600 | 0.5252 | -1.5967 | -2.4449 | 0.7275 | 0.8482 | -493.0664 | -428.1592 | 2.2020 | 1.4150 |
| 0.5677 | 0.48 | 3700 | 0.5152 | -1.9726 | -2.8128 | 0.7275 | 0.8402 | -529.8630 | -465.7504 | 2.4678 | 1.6843 |
| 0.5471 | 0.5 | 3800 | 0.5240 | -2.0731 | -3.0300 | 0.7255 | 0.9569 | -551.5833 | -475.7978 | 2.2022 | 1.4352 |
| 0.5193 | 0.51 | 3900 | 0.5185 | -2.1713 | -3.1118 | 0.7340 | 0.9405 | -559.7596 | -485.6194 | 2.1469 | 1.3990 |
| 0.5764 | 0.52 | 4000 | 0.5177 | -2.0057 | -2.9735 | 0.7310 | 0.9678 | -545.9298 | -469.0576 | 1.8653 | 1.1192 |
| 0.504 | 0.54 | 4100 | 0.5180 | -1.8237 | -2.7453 | 0.7270 | 0.9217 | -523.1135 | -450.8565 | 1.7948 | 1.0344 |
| 0.4846 | 0.55 | 4200 | 0.5168 | -2.1214 | -3.0448 | 0.7260 | 0.9234 | -553.0635 | -480.6317 | 2.1064 | 1.3329 |
| 0.426 | 0.56 | 4300 | 0.5096 | -2.0142 | -2.9490 | 0.7325 | 0.9349 | -543.4855 | -469.9074 | 2.0377 | 1.2900 |
| 0.5289 | 0.58 | 4400 | 0.5143 | -1.9624 | -2.9368 | 0.7260 | 0.9744 | -542.2659 | -464.7332 | 1.7669 | 1.0286 |
| 0.4542 | 0.59 | 4500 | 0.5102 | -1.9643 | -2.9280 | 0.7335 | 0.9637 | -541.3861 | -464.9223 | 1.8775 | 1.1395 |
| 0.4839 | 0.6 | 4600 | 0.5094 | -2.0037 | -2.9783 | 0.7305 | 0.9747 | -546.4150 | -468.8564 | 1.8858 | 1.1472 |
| 0.5562 | 0.62 | 4700 | 0.5076 | -2.0260 | -2.9819 | 0.7340 | 0.9559 | -546.7677 | -471.0873 | 1.9384 | 1.1999 |
| 0.4964 | 0.63 | 4800 | 0.5078 | -2.1724 | -3.1285 | 0.7335 | 0.9561 | -561.4290 | -485.7305 | 2.1538 | 1.3968 |
| 0.4879 | 0.64 | 4900 | 0.5125 | -2.2107 | -3.2298 | 0.7310 | 1.0191 | -571.5599 | -489.5623 | 2.1324 | 1.3802 |
| 0.4916 | 0.65 | 5000 | 0.5087 | -2.0966 | -3.1006 | 0.7300 | 1.0041 | -558.6430 | -478.1451 | 2.1161 | 1.3780 |
| 0.5806 | 0.67 | 5100 | 0.5089 | -2.2279 | -3.2378 | 0.7305 | 1.0099 | -572.3604 | -491.2838 | 2.0897 | 1.3595 |
| 0.5027 | 0.68 | 5200 | 0.5038 | -1.8962 | -2.8326 | 0.7375 | 0.9364 | -531.8434 | -458.1095 | 1.8014 | 1.0714 |
| 0.4554 | 0.69 | 5300 | 0.5052 | -1.9550 | -2.9208 | 0.7330 | 0.9658 | -540.6600 | -463.9870 | 1.8905 | 1.1555 |
| 0.4521 | 0.71 | 5400 | 0.5039 | -1.9912 | -2.9472 | 0.7370 | 0.9559 | -543.2982 | -467.6124 | 1.8437 | 1.1076 |
| 0.5869 | 0.72 | 5500 | 0.5054 | -2.1704 | -3.1637 | 0.7360 | 0.9933 | -564.9521 | -485.5281 | 1.8865 | 1.1574 |
| 0.5924 | 0.73 | 5600 | 0.5064 | -1.8180 | -2.7843 | 0.7320 | 0.9663 | -527.0139 | -450.2935 | 1.5325 | 0.8215 |
| 0.4275 | 0.75 | 5700 | 0.5055 | -2.0070 | -3.0130 | 0.7340 | 1.0060 | -549.8819 | -469.1932 | 1.7229 | 0.9960 |
| 0.4746 | 0.76 | 5800 | 0.5072 | -2.2069 | -3.2470 | 0.7300 | 1.0401 | -573.2806 | -489.1825 | 1.8507 | 1.1168 |
| 0.5033 | 0.77 | 5900 | 0.5061 | -1.8962 | -2.8744 | 0.7275 | 0.9782 | -536.0162 | -458.1062 | 1.7071 | 0.9675 |
| 0.4517 | 0.79 | 6000 | 0.5105 | -1.7324 | -2.6813 | 0.7265 | 0.9489 | -516.7132 | -441.7279 | 1.5613 | 0.8156 |
| 0.5071 | 0.8 | 6100 | 0.5116 | -1.8634 | -2.8617 | 0.7275 | 0.9983 | -534.7506 | -454.8272 | 1.6895 | 0.9370 |
| 0.6455 | 0.81 | 6200 | 0.5110 | -1.8796 | -2.8743 | 0.7250 | 0.9947 | -536.0126 | -456.4508 | 1.7120 | 0.9542 |
| 0.4796 | 0.82 | 6300 | 0.5112 | -1.9250 | -2.9447 | 0.7260 | 1.0197 | -543.0519 | -460.9879 | 1.7784 | 1.0203 |
| 0.5568 | 0.84 | 6400 | 0.5086 | -1.9539 | -2.9695 | 0.7275 | 1.0156 | -545.5328 | -463.8810 | 1.8764 | 1.1152 |
| 0.4335 | 0.85 | 6500 | 0.5067 | -2.0048 | -3.0192 | 0.7295 | 1.0144 | -550.4982 | -468.9681 | 1.9425 | 1.1822 |
| 0.5263 | 0.86 | 6600 | 0.5066 | -1.9682 | -2.9769 | 0.7310 | 1.0087 | -546.2759 | -465.3099 | 1.9390 | 1.1806 |
| 0.5263 | 0.88 | 6700 | 0.5066 | -1.9719 | -2.9803 | 0.7320 | 1.0084 | -546.6119 | -465.6784 | 1.9366 | 1.1794 |
| 0.4939 | 0.89 | 6800 | 0.5063 | -2.0205 | -3.0328 | 0.7325 | 1.0123 | -551.8629 | -470.5374 | 1.9795 | 1.2238 |
| 0.5763 | 0.9 | 6900 | 0.5060 | -2.0098 | -3.0191 | 0.7330 | 1.0092 | -550.4863 | -469.4713 | 1.9579 | 1.2027 |
| 0.5062 | 0.92 | 7000 | 0.5059 | -2.0030 | -3.0107 | 0.7320 | 1.0077 | -549.6514 | -468.7946 | 1.9574 | 1.2018 |
| 0.4432 | 0.93 | 7100 | 0.5059 | -2.0132 | -3.0218 | 0.7330 | 1.0085 | -550.7594 | -469.8141 | 1.9675 | 1.2115 |
| 0.5294 | 0.94 | 7200 | 0.5059 | -2.0141 | -3.0230 | 0.7315 | 1.0089 | -550.8820 | -469.9014 | 1.9679 | 1.2123 |
| 0.4488 | 0.96 | 7300 | 0.5058 | -2.0144 | -3.0239 | 0.7320 | 1.0095 | -550.9682 | -469.9289 | 1.9688 | 1.2130 |
| 0.4747 | 0.97 | 7400 | 0.5057 | -2.0142 | -3.0234 | 0.7325 | 1.0092 | -550.9178 | -469.9052 | 1.9679 | 1.2122 |
| 0.4494 | 0.98 | 7500 | 0.5058 | -2.0144 | -3.0238 | 0.7350 | 1.0093 | -550.9584 | -469.9345 | 1.9679 | 1.2121 |
| 0.5319 | 0.99 | 7600 | 0.5058 | -2.0144 | -3.0238 | 0.7350 | 1.0093 | -550.9584 | -469.9345 | 1.9679 | 1.2121 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.0