---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---
# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5058
- Rewards/chosen: -2.0144
- Rewards/rejected: -3.0238
- Rewards/accuracies: 0.7350
- Rewards/margins: 1.0093
- Logps/rejected: -550.9584
- Logps/chosen: -469.9345
- Logits/rejected: 1.9679
- Logits/chosen: 1.2121
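The reward and margin metrics above follow from the DPO objective: each implicit reward is the β-scaled log-probability ratio between the policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. A minimal per-example sketch (the helper name and β value are illustrative, not from the training code; note that at zero margin the loss is ln 2 ≈ 0.6931, which matches the first checkpoint in the table below):

```python
import math

def dpo_eval_stats(beta, policy_logp_chosen, policy_logp_rejected,
                   ref_logp_chosen, ref_logp_rejected):
    """Per-example DPO metrics in the spirit of those logged above
    (hypothetical helper; real training averages over a batch)."""
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference.
    rewards_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    rewards_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = rewards_chosen - rewards_rejected
    # DPO loss: -log sigmoid(margin) == log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    accuracy = float(margin > 0)  # did the policy prefer the chosen answer?
    return {"loss": loss,
            "rewards/chosen": rewards_chosen,
            "rewards/rejected": rewards_rejected,
            "rewards/margins": margin,
            "rewards/accuracies": accuracy}
```

For instance, Rewards/margins is always Rewards/chosen minus Rewards/rejected, which holds in the final eval row: -2.0144 - (-3.0238) ≈ 1.0093.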
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
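The `cosine` scheduler with `warmup_ratio: 0.1` means the learning rate ramps linearly from 0 to 5e-06 over the first 10% of steps, then decays along a half-cosine to 0. A minimal sketch of that shape (the function name and the 7600-step horizon are illustrative; the actual schedule comes from the `transformers` scheduler implementation):

```python
import math

def lr_at(step, total_steps, base_lr=5e-6, warmup_ratio=0.1):
    # Linear warmup over the first warmup_ratio of training...
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    # ...then half-cosine decay from base_lr down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With ~7600 optimizer steps in one epoch, the peak rate of 5e-06 is reached around step 760 and decays toward 0 by the end of training.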
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
0.6934 | 0.01 | 100 | 0.6931 | 0.0002 | 0.0001 | 0.5105 | 0.0001 | -248.5731 | -268.4692 | -2.4383 | -2.5261 |
0.6924 | 0.03 | 200 | 0.6926 | 0.0014 | 0.0003 | 0.5605 | 0.0011 | -248.5511 | -268.3451 | -2.4368 | -2.5247 |
0.691 | 0.04 | 300 | 0.6907 | 0.0091 | 0.0041 | 0.6440 | 0.0050 | -248.1753 | -267.5839 | -2.4378 | -2.5253 |
0.6876 | 0.05 | 400 | 0.6845 | 0.0405 | 0.0227 | 0.6580 | 0.0178 | -246.3089 | -264.4353 | -2.4351 | -2.5230 |
0.6799 | 0.07 | 500 | 0.6707 | 0.0354 | -0.0135 | 0.6815 | 0.0489 | -249.9276 | -264.9495 | -2.3755 | -2.4660 |
0.6577 | 0.08 | 600 | 0.6462 | -0.1230 | -0.2378 | 0.6750 | 0.1148 | -272.3604 | -280.7885 | -2.2541 | -2.3601 |
0.6365 | 0.09 | 700 | 0.6345 | -0.0856 | -0.2362 | 0.6860 | 0.1507 | -272.2037 | -277.0453 | -2.2013 | -2.3136 |
0.6519 | 0.1 | 800 | 0.6240 | -0.4943 | -0.7231 | 0.6630 | 0.2287 | -320.8872 | -317.9223 | -2.0482 | -2.1835 |
0.6547 | 0.12 | 900 | 0.6203 | -0.5733 | -0.8287 | 0.6695 | 0.2555 | -331.4542 | -325.8177 | -2.0783 | -2.2184 |
0.5841 | 0.13 | 1000 | 0.6071 | -0.5361 | -0.8600 | 0.6820 | 0.3239 | -334.5816 | -322.0998 | -2.0689 | -2.2086 |
0.5877 | 0.14 | 1100 | 0.5947 | -1.1495 | -1.6229 | 0.6855 | 0.4734 | -410.8678 | -383.4380 | -1.1053 | -1.3836 |
0.5552 | 0.16 | 1200 | 0.5909 | -1.4256 | -1.8934 | 0.6880 | 0.4678 | -437.9200 | -411.0459 | -0.3614 | -0.7372 |
0.5492 | 0.17 | 1300 | 0.5791 | -1.4614 | -1.9771 | 0.6935 | 0.5157 | -446.2910 | -414.6323 | -0.1933 | -0.5949 |
0.5789 | 0.18 | 1400 | 0.5771 | -0.8799 | -1.3633 | 0.7035 | 0.4834 | -384.9109 | -356.4832 | -0.1908 | -0.5846 |
0.5456 | 0.2 | 1500 | 0.5646 | -1.1845 | -1.7913 | 0.7035 | 0.6068 | -427.7158 | -386.9436 | 0.3098 | -0.1574 |
0.4722 | 0.21 | 1600 | 0.5598 | -1.3242 | -1.9424 | 0.7075 | 0.6181 | -442.8174 | -400.9113 | 0.5395 | 0.0346 |
0.5072 | 0.22 | 1700 | 0.5574 | -1.5040 | -2.1667 | 0.7060 | 0.6628 | -465.2537 | -418.8860 | 1.0411 | 0.4657 |
0.5284 | 0.24 | 1800 | 0.5534 | -1.5486 | -2.2055 | 0.7070 | 0.6568 | -469.1293 | -423.3542 | 1.2404 | 0.6528 |
0.5623 | 0.25 | 1900 | 0.5625 | -1.7106 | -2.4247 | 0.7055 | 0.7141 | -491.0526 | -439.5539 | 0.7808 | 0.3058 |
0.6092 | 0.26 | 2000 | 0.5501 | -1.0158 | -1.6513 | 0.7085 | 0.6354 | -413.7089 | -370.0728 | 0.5199 | 0.0079 |
0.5726 | 0.27 | 2100 | 0.5433 | -1.4697 | -2.1580 | 0.7150 | 0.6884 | -464.3842 | -415.4569 | 0.9981 | 0.4405 |
0.5323 | 0.29 | 2200 | 0.5483 | -1.3173 | -2.0886 | 0.7150 | 0.7713 | -457.4451 | -400.2244 | 1.3533 | 0.7445 |
0.5148 | 0.3 | 2300 | 0.5387 | -1.3194 | -2.0188 | 0.7275 | 0.6994 | -450.4646 | -400.4308 | 1.1454 | 0.5107 |
0.4112 | 0.31 | 2400 | 0.5401 | -1.6201 | -2.4219 | 0.7200 | 0.8018 | -490.7723 | -430.5040 | 1.2866 | 0.6648 |
0.5246 | 0.33 | 2500 | 0.5413 | -2.1278 | -2.8964 | 0.7220 | 0.7686 | -538.2222 | -481.2729 | 1.7388 | 1.0914 |
0.5657 | 0.34 | 2600 | 0.5373 | -1.6863 | -2.4642 | 0.7200 | 0.7779 | -495.0003 | -437.1172 | 1.6571 | 0.9886 |
0.5216 | 0.35 | 2700 | 0.5357 | -1.9895 | -2.7395 | 0.7260 | 0.7500 | -522.5278 | -467.4365 | 1.7936 | 1.1290 |
0.5865 | 0.37 | 2800 | 0.5351 | -2.1007 | -2.8103 | 0.7260 | 0.7096 | -529.6149 | -478.5605 | 1.7565 | 1.1019 |
0.5252 | 0.38 | 2900 | 0.5376 | -1.5816 | -2.4416 | 0.7205 | 0.8600 | -492.7397 | -426.6496 | 1.5686 | 0.9108 |
0.5381 | 0.39 | 3000 | 0.5306 | -1.5416 | -2.3719 | 0.7230 | 0.8303 | -485.7741 | -422.6485 | 1.7206 | 1.0233 |
0.4587 | 0.41 | 3100 | 0.5222 | -1.4511 | -2.1850 | 0.7260 | 0.7339 | -467.0778 | -413.6005 | 1.8445 | 1.1221 |
0.5173 | 0.42 | 3200 | 0.5277 | -1.3551 | -2.1383 | 0.7260 | 0.7832 | -462.4095 | -403.9989 | 1.6186 | 0.8981 |
0.5851 | 0.43 | 3300 | 0.5181 | -1.6864 | -2.5011 | 0.7325 | 0.8148 | -498.6931 | -437.1258 | 2.0344 | 1.2860 |
0.5811 | 0.44 | 3400 | 0.5166 | -1.6007 | -2.4386 | 0.7335 | 0.8379 | -492.4408 | -428.5590 | 1.7238 | 1.0162 |
0.4892 | 0.46 | 3500 | 0.5257 | -1.4712 | -2.3237 | 0.7280 | 0.8525 | -480.9519 | -415.6104 | 2.0709 | 1.3014 |
0.5438 | 0.47 | 3600 | 0.5252 | -1.5967 | -2.4449 | 0.7275 | 0.8482 | -493.0664 | -428.1592 | 2.2020 | 1.4150 |
0.5677 | 0.48 | 3700 | 0.5152 | -1.9726 | -2.8128 | 0.7275 | 0.8402 | -529.8630 | -465.7504 | 2.4678 | 1.6843 |
0.5471 | 0.5 | 3800 | 0.5240 | -2.0731 | -3.0300 | 0.7255 | 0.9569 | -551.5833 | -475.7978 | 2.2022 | 1.4352 |
0.5193 | 0.51 | 3900 | 0.5185 | -2.1713 | -3.1118 | 0.7340 | 0.9405 | -559.7596 | -485.6194 | 2.1469 | 1.3990 |
0.5764 | 0.52 | 4000 | 0.5177 | -2.0057 | -2.9735 | 0.7310 | 0.9678 | -545.9298 | -469.0576 | 1.8653 | 1.1192 |
0.504 | 0.54 | 4100 | 0.5180 | -1.8237 | -2.7453 | 0.7270 | 0.9217 | -523.1135 | -450.8565 | 1.7948 | 1.0344 |
0.4846 | 0.55 | 4200 | 0.5168 | -2.1214 | -3.0448 | 0.7260 | 0.9234 | -553.0635 | -480.6317 | 2.1064 | 1.3329 |
0.426 | 0.56 | 4300 | 0.5096 | -2.0142 | -2.9490 | 0.7325 | 0.9349 | -543.4855 | -469.9074 | 2.0377 | 1.2900 |
0.5289 | 0.58 | 4400 | 0.5143 | -1.9624 | -2.9368 | 0.7260 | 0.9744 | -542.2659 | -464.7332 | 1.7669 | 1.0286 |
0.4542 | 0.59 | 4500 | 0.5102 | -1.9643 | -2.9280 | 0.7335 | 0.9637 | -541.3861 | -464.9223 | 1.8775 | 1.1395 |
0.4839 | 0.6 | 4600 | 0.5094 | -2.0037 | -2.9783 | 0.7305 | 0.9747 | -546.4150 | -468.8564 | 1.8858 | 1.1472 |
0.5562 | 0.62 | 4700 | 0.5076 | -2.0260 | -2.9819 | 0.7340 | 0.9559 | -546.7677 | -471.0873 | 1.9384 | 1.1999 |
0.4964 | 0.63 | 4800 | 0.5078 | -2.1724 | -3.1285 | 0.7335 | 0.9561 | -561.4290 | -485.7305 | 2.1538 | 1.3968 |
0.4879 | 0.64 | 4900 | 0.5125 | -2.2107 | -3.2298 | 0.7310 | 1.0191 | -571.5599 | -489.5623 | 2.1324 | 1.3802 |
0.4916 | 0.65 | 5000 | 0.5087 | -2.0966 | -3.1006 | 0.7300 | 1.0041 | -558.6430 | -478.1451 | 2.1161 | 1.3780 |
0.5806 | 0.67 | 5100 | 0.5089 | -2.2279 | -3.2378 | 0.7305 | 1.0099 | -572.3604 | -491.2838 | 2.0897 | 1.3595 |
0.5027 | 0.68 | 5200 | 0.5038 | -1.8962 | -2.8326 | 0.7375 | 0.9364 | -531.8434 | -458.1095 | 1.8014 | 1.0714 |
0.4554 | 0.69 | 5300 | 0.5052 | -1.9550 | -2.9208 | 0.7330 | 0.9658 | -540.6600 | -463.9870 | 1.8905 | 1.1555 |
0.4521 | 0.71 | 5400 | 0.5039 | -1.9912 | -2.9472 | 0.7370 | 0.9559 | -543.2982 | -467.6124 | 1.8437 | 1.1076 |
0.5869 | 0.72 | 5500 | 0.5054 | -2.1704 | -3.1637 | 0.7360 | 0.9933 | -564.9521 | -485.5281 | 1.8865 | 1.1574 |
0.5924 | 0.73 | 5600 | 0.5064 | -1.8180 | -2.7843 | 0.7320 | 0.9663 | -527.0139 | -450.2935 | 1.5325 | 0.8215 |
0.4275 | 0.75 | 5700 | 0.5055 | -2.0070 | -3.0130 | 0.7340 | 1.0060 | -549.8819 | -469.1932 | 1.7229 | 0.9960 |
0.4746 | 0.76 | 5800 | 0.5072 | -2.2069 | -3.2470 | 0.7300 | 1.0401 | -573.2806 | -489.1825 | 1.8507 | 1.1168 |
0.5033 | 0.77 | 5900 | 0.5061 | -1.8962 | -2.8744 | 0.7275 | 0.9782 | -536.0162 | -458.1062 | 1.7071 | 0.9675 |
0.4517 | 0.79 | 6000 | 0.5105 | -1.7324 | -2.6813 | 0.7265 | 0.9489 | -516.7132 | -441.7279 | 1.5613 | 0.8156 |
0.5071 | 0.8 | 6100 | 0.5116 | -1.8634 | -2.8617 | 0.7275 | 0.9983 | -534.7506 | -454.8272 | 1.6895 | 0.9370 |
0.6455 | 0.81 | 6200 | 0.5110 | -1.8796 | -2.8743 | 0.7250 | 0.9947 | -536.0126 | -456.4508 | 1.7120 | 0.9542 |
0.4796 | 0.82 | 6300 | 0.5112 | -1.9250 | -2.9447 | 0.7260 | 1.0197 | -543.0519 | -460.9879 | 1.7784 | 1.0203 |
0.5568 | 0.84 | 6400 | 0.5086 | -1.9539 | -2.9695 | 0.7275 | 1.0156 | -545.5328 | -463.8810 | 1.8764 | 1.1152 |
0.4335 | 0.85 | 6500 | 0.5067 | -2.0048 | -3.0192 | 0.7295 | 1.0144 | -550.4982 | -468.9681 | 1.9425 | 1.1822 |
0.5263 | 0.86 | 6600 | 0.5066 | -1.9682 | -2.9769 | 0.7310 | 1.0087 | -546.2759 | -465.3099 | 1.9390 | 1.1806 |
0.5263 | 0.88 | 6700 | 0.5066 | -1.9719 | -2.9803 | 0.7320 | 1.0084 | -546.6119 | -465.6784 | 1.9366 | 1.1794 |
0.4939 | 0.89 | 6800 | 0.5063 | -2.0205 | -3.0328 | 0.7325 | 1.0123 | -551.8629 | -470.5374 | 1.9795 | 1.2238 |
0.5763 | 0.9 | 6900 | 0.5060 | -2.0098 | -3.0191 | 0.7330 | 1.0092 | -550.4863 | -469.4713 | 1.9579 | 1.2027 |
0.5062 | 0.92 | 7000 | 0.5059 | -2.0030 | -3.0107 | 0.7320 | 1.0077 | -549.6514 | -468.7946 | 1.9574 | 1.2018 |
0.4432 | 0.93 | 7100 | 0.5059 | -2.0132 | -3.0218 | 0.7330 | 1.0085 | -550.7594 | -469.8141 | 1.9675 | 1.2115 |
0.5294 | 0.94 | 7200 | 0.5059 | -2.0141 | -3.0230 | 0.7315 | 1.0089 | -550.8820 | -469.9014 | 1.9679 | 1.2123 |
0.4488 | 0.96 | 7300 | 0.5058 | -2.0144 | -3.0239 | 0.7320 | 1.0095 | -550.9682 | -469.9289 | 1.9688 | 1.2130 |
0.4747 | 0.97 | 7400 | 0.5057 | -2.0142 | -3.0234 | 0.7325 | 1.0092 | -550.9178 | -469.9052 | 1.9679 | 1.2122 |
0.4494 | 0.98 | 7500 | 0.5058 | -2.0144 | -3.0238 | 0.7350 | 1.0093 | -550.9584 | -469.9345 | 1.9679 | 1.2121 |
0.5319 | 0.99 | 7600 | 0.5058 | -2.0144 | -3.0238 | 0.7350 | 1.0093 | -550.9584 | -469.9345 | 1.9679 | 1.2121 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.0
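To reproduce this environment, the pinned versions above can be installed directly (package names assume standard PyPI distributions; `Pytorch` is published as `torch`):

```shell
pip install "torch==2.1.2" "transformers==4.36.2" "peft==0.7.1" \
    "datasets==2.14.6" "tokenizers==0.15.0"
```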