# zephyr-7b-dpo-qlora
This model is a fine-tuned version of mistralai/Mistral-7B-v0.1; the training dataset is not specified in this card. It achieves the following results on the evaluation set (see the note on the reward metrics after the list):
- Loss: 0.4873
- Rewards/chosen: -2.9667
- Rewards/rejected: -4.1000
- Rewards/accuracies: 0.7445
- Rewards/margins: 1.1333
- Logps/rejected: -654.6072
- Logps/chosen: -561.3217
- Logits/rejected: -0.9450
- Logits/chosen: -1.0724
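For context, these reward metrics follow the usual DPO convention (assuming they were logged by TRL's DPOTrainer, which the card does not state explicitly): the implicit reward of a completion $y$ for a prompt $x$ is

$$\hat{r}_\theta(x, y) = \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},$$

averaged separately over the chosen and rejected completions. Rewards/margins is the mean difference between the two (here $1.1333 = -2.9667 - (-4.1000)$), and Rewards/accuracies is the fraction of pairs whose chosen completion receives the higher reward.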
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
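For reference, the sketch below shows how these hyperparameters could be wired into a QLoRA DPO run with PEFT and TRL's DPOTrainer. The card does not name the preference dataset, the DPO beta, the precision, or the LoRA configuration, so those parts are placeholders; only the values listed above are taken from the card, and this is not the author's actual training script.

```python
# Hedged sketch: how the listed hyperparameters could be wired into a QLoRA DPO run with TRL.
# The DPO beta, LoRA settings, and training data are NOT stated in the card; everything
# marked "placeholder" below is illustrative only, not the author's actual configuration.
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# QLoRA: frozen base model loaded in 4-bit (quantization settings assumed: typical NF4 defaults).
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

# Placeholder LoRA config -- rank/alpha/target modules are not reported in the card.
peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters from the card: 4 per-device batch x 4 accumulation steps = 16 effective batch.
# The listed optimizer (Adam, betas=(0.9, 0.999), eps=1e-8) is the TrainingArguments default.
args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    logging_steps=100,
    evaluation_strategy="steps",
    eval_steps=100,
)

# Placeholder preference data: DPOTrainer expects "prompt", "chosen", "rejected" string columns.
toy_prefs = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["France has no capital city."],
})

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a PEFT adapter, the adapter-disabled base model acts as the reference
    beta=0.1,         # placeholder: the DPO beta is not reported in the card
    args=args,
    train_dataset=toy_prefs,
    eval_dataset=toy_prefs,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```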
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6819 | 0.03 | 100 | 0.6822 | 0.0500 | 0.0271 | 0.6545 | 0.0230 | -241.9029 | -259.6472 | -1.9565 | -2.0959 |
0.6548 | 0.05 | 200 | 0.6500 | -0.1489 | -0.2515 | 0.6780 | 0.1027 | -269.7628 | -279.5373 | -1.9329 | -2.0695 |
0.6084 | 0.08 | 300 | 0.6213 | -0.2956 | -0.4998 | 0.6810 | 0.2042 | -294.5921 | -294.2169 | -1.8771 | -2.0114 |
0.6237 | 0.1 | 400 | 0.6039 | -0.4538 | -0.7401 | 0.6935 | 0.2863 | -318.6170 | -310.0349 | -1.8367 | -1.9656 |
0.5534 | 0.13 | 500 | 0.5692 | -0.9154 | -1.3927 | 0.7050 | 0.4773 | -383.8828 | -356.1946 | -1.5403 | -1.6712 |
0.5613 | 0.16 | 600 | 0.5659 | -0.8123 | -1.3218 | 0.7025 | 0.5095 | -376.7896 | -345.8830 | -1.3701 | -1.5049 |
0.5139 | 0.18 | 700 | 0.5572 | -2.6368 | -3.4670 | 0.7145 | 0.8302 | -591.3087 | -528.3278 | -0.8924 | -1.0174 |
0.5184 | 0.21 | 800 | 0.5374 | -1.4908 | -2.1870 | 0.7160 | 0.6962 | -463.3091 | -413.7339 | -1.1141 | -1.2460 |
0.5211 | 0.24 | 900 | 0.5332 | -2.5430 | -3.3947 | 0.7180 | 0.8518 | -584.0806 | -518.9495 | -0.8116 | -0.9341 |
0.5553 | 0.26 | 1000 | 0.5178 | -2.1745 | -3.0424 | 0.7315 | 0.8679 | -548.8491 | -482.0993 | -0.8557 | -0.9813 |
0.5994 | 0.29 | 1100 | 0.5207 | -2.5002 | -3.3276 | 0.7300 | 0.8275 | -577.3698 | -514.6677 | -0.7615 | -0.8896 |
0.5976 | 0.31 | 1200 | 0.5098 | -2.1833 | -2.9905 | 0.7365 | 0.8072 | -543.6604 | -482.9834 | -0.8350 | -0.9596 |
0.5237 | 0.34 | 1300 | 0.5166 | -3.0973 | -4.1628 | 0.7350 | 1.0654 | -660.8850 | -574.3862 | -0.7072 | -0.8259 |
0.516 | 0.37 | 1400 | 0.5108 | -2.1009 | -3.0663 | 0.7350 | 0.9654 | -551.2367 | -474.7425 | -0.7865 | -0.9128 |
0.4593 | 0.39 | 1500 | 0.5174 | -2.3167 | -3.4254 | 0.7305 | 1.1088 | -587.1506 | -496.3185 | -0.8903 | -1.0211 |
0.5545 | 0.42 | 1600 | 0.5032 | -2.9938 | -4.0820 | 0.7370 | 1.0882 | -652.8123 | -564.0355 | -0.8801 | -1.0082 |
0.5425 | 0.44 | 1700 | 0.4996 | -3.3496 | -4.4061 | 0.7405 | 1.0565 | -685.2187 | -599.6096 | -0.8382 | -0.9686 |
0.4825 | 0.47 | 1800 | 0.5037 | -3.0446 | -4.1288 | 0.7380 | 1.0842 | -657.4884 | -569.1091 | -0.8738 | -1.0006 |
0.4455 | 0.5 | 1900 | 0.4962 | -3.0223 | -4.1482 | 0.7420 | 1.1259 | -659.4305 | -566.8840 | -0.8910 | -1.0214 |
0.4817 | 0.52 | 2000 | 0.4974 | -3.5987 | -4.6648 | 0.7470 | 1.0660 | -711.0853 | -624.5250 | -0.8139 | -0.9428 |
0.5079 | 0.55 | 2100 | 0.4923 | -3.1751 | -4.2293 | 0.7520 | 1.0542 | -667.5426 | -582.1657 | -0.8739 | -1.0031 |
0.477 | 0.58 | 2200 | 0.4897 | -2.6127 | -3.5713 | 0.7410 | 0.9587 | -601.7402 | -525.9182 | -0.9567 | -1.0880 |
0.4829 | 0.6 | 2300 | 0.4887 | -2.9530 | -4.0954 | 0.7485 | 1.1424 | -654.1511 | -559.9558 | -0.9032 | -1.0313 |
0.4752 | 0.63 | 2400 | 0.4909 | -3.1480 | -4.2815 | 0.7445 | 1.1335 | -672.7583 | -579.4506 | -0.8495 | -0.9765 |
0.5249 | 0.65 | 2500 | 0.4891 | -3.0936 | -4.2029 | 0.7445 | 1.1093 | -664.8962 | -574.0093 | -0.9136 | -1.0435 |
0.4596 | 0.68 | 2600 | 0.4939 | -2.9492 | -4.0985 | 0.7400 | 1.1493 | -654.4570 | -559.5698 | -0.9264 | -1.0549 |
0.5152 | 0.71 | 2700 | 0.4922 | -3.0197 | -4.1572 | 0.7440 | 1.1375 | -660.3236 | -566.6193 | -0.9249 | -1.0527 |
0.4518 | 0.73 | 2800 | 0.4908 | -3.0666 | -4.2342 | 0.7415 | 1.1676 | -668.0294 | -571.3138 | -0.9260 | -1.0535 |
0.5018 | 0.76 | 2900 | 0.4877 | -3.0977 | -4.2382 | 0.7465 | 1.1405 | -668.4285 | -574.4260 | -0.9320 | -1.0595 |
0.4592 | 0.79 | 3000 | 0.4873 | -2.9934 | -4.1134 | 0.7460 | 1.1200 | -655.9471 | -563.9877 | -0.9510 | -1.0788 |
0.4905 | 0.81 | 3100 | 0.4878 | -2.9825 | -4.1198 | 0.7430 | 1.1373 | -656.5853 | -562.9043 | -0.9465 | -1.0741 |
0.485 | 0.84 | 3200 | 0.4874 | -2.9459 | -4.0754 | 0.7455 | 1.1296 | -652.1517 | -559.2400 | -0.9531 | -1.0807 |
0.5157 | 0.86 | 3300 | 0.4874 | -2.9550 | -4.0838 | 0.7445 | 1.1289 | -652.9912 | -560.1489 | -0.9481 | -1.0755 |
0.4474 | 0.89 | 3400 | 0.4871 | -2.9699 | -4.1019 | 0.7435 | 1.1321 | -654.8017 | -561.6381 | -0.9499 | -1.0773 |
0.5379 | 0.92 | 3500 | 0.4874 | -2.9663 | -4.0989 | 0.7430 | 1.1326 | -654.5006 | -561.2808 | -0.9468 | -1.0742 |
0.464 | 0.94 | 3600 | 0.4874 | -2.9638 | -4.0967 | 0.7425 | 1.1329 | -654.2791 | -561.0286 | -0.9475 | -1.0748 |
0.4729 | 0.97 | 3700 | 0.4873 | -2.9666 | -4.0999 | 0.7445 | 1.1333 | -654.6014 | -561.3129 | -0.9495 | -1.0770 |
0.5017 | 0.99 | 3800 | 0.4873 | -2.9667 | -4.1000 | 0.7445 | 1.1333 | -654.6072 | -561.3217 | -0.9450 | -1.0724 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.2.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
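Since the repository provides a PEFT adapter on top of mistralai/Mistral-7B-v0.1 (per the PEFT version above and the model name), a minimal loading sketch might look like the following; it assumes the adapter loads via peft's AutoPeftModelForCausalLM, and the generation settings are illustrative.

```python
# Minimal, hedged inference sketch: load the adapter (and base weights) with PEFT.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_repo = "geonmin-kim/zephyr-7b-dpo-qlora"

# Tokenizer taken from the base model in case the adapter repo does not ship one.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain direct preference optimization in one paragraph."
device = next(model.parameters()).device
inputs = tokenizer(prompt, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```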