# zephyr-7b-dpo-qlora

This model is a QLoRA fine-tune of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), trained with DPO on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set (the DPO conventions behind these metrics are sketched after the list):
- Loss: 0.4877
- Rewards/chosen: -2.1504
- Rewards/rejected: -3.2930
- Rewards/accuracies: 0.7485
- Rewards/margins: 1.1426
- Logps/rejected: -593.1238
- Logps/chosen: -500.2867
- Logits/rejected: -1.4918
- Logits/chosen: -1.5786
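
For context, these metrics follow the standard DPO conventions used by TRL's `DPOTrainer`: the implicit reward of a completion is the β-scaled log-probability ratio between the policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-vs-rejected reward margin (β itself is a trainer hyperparameter this card does not record):

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma \big( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \big)
$$

`Rewards/margins` is the mean of the difference inside the sigmoid, `Rewards/accuracies` is the fraction of pairs where the chosen completion out-scores the rejected one, and the `Logps/*` values are the policy's summed token log-probabilities of each completion.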
## Model description
More information needed
## Intended uses & limitations
More information needed
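
In lieu of a full write-up, here is a minimal inference sketch. It assumes the usual PEFT adapter layout and uses a placeholder repo id (`your-namespace/zephyr-7b-dpo-qlora`), since this card does not state where the adapter is published:

```python
# Minimal inference sketch. "your-namespace/zephyr-7b-dpo-qlora" is a
# placeholder: replace it with wherever this adapter actually lives.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "your-namespace/zephyr-7b-dpo-qlora"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO-trained LoRA weights
model.eval()

prompt = "Explain the trade-offs of QLoRA fine-tuning in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that Mistral-7B-v0.1 is a base model; if training followed the usual Zephyr recipe, prompts formatted with its chat template will likely behave best.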
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
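
The card does not ship the training script, so the following is only a minimal sketch of how these hyperparameters map onto a TRL `DPOTrainer` run. The 4-bit quantization settings, LoRA config, and `beta` are assumptions (typical of the Zephyr QLoRA recipe), not values recorded above:

```python
# Reproduction sketch under stated assumptions: TRL ~0.8-era DPOTrainer API,
# NF4 4-bit loading for QLoRA, and a typical LoRA config + beta=0.1 (the card
# records none of these). The ultrafeedback_binarized pairs also need the chat
# template applied to "chosen"/"rejected" before training; omitted here.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(  # 4-bit base weights: the "Q" in QLoRA
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 4 x 4 = effective batch of 16 per optimizer step
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,
)

peft_config = LoraConfig(  # assumed adapter shape; not recorded in the card
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, TRL reuses the frozen base model as the reference
    args=args,
    beta=0.1,        # assumption: the common Zephyr-recipe value
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```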
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6887 | 0.0262 | 100 | 0.6878 | 0.0320 | 0.0209 | 0.6165 | 0.0111 | -261.7386 | -282.0497 | -2.2161 | -2.2860 |
0.6673 | 0.0523 | 200 | 0.6705 | 0.0307 | -0.0223 | 0.6210 | 0.0530 | -266.0498 | -282.1794 | -2.2309 | -2.2977 |
0.622 | 0.0785 | 300 | 0.6308 | -0.4454 | -0.6416 | 0.6530 | 0.1962 | -327.9841 | -329.7844 | -2.2079 | -2.2648 |
0.6231 | 0.1047 | 400 | 0.6110 | -1.1130 | -1.4436 | 0.6600 | 0.3306 | -408.1872 | -396.5498 | -2.1373 | -2.1896 |
0.5801 | 0.1309 | 500 | 0.5821 | -1.0977 | -1.5765 | 0.6770 | 0.4787 | -421.4711 | -395.0235 | -1.9857 | -2.0520 |
0.5774 | 0.1570 | 600 | 0.5737 | -0.8337 | -1.3533 | 0.6960 | 0.5197 | -399.1586 | -368.6156 | -2.0070 | -2.0804 |
0.5622 | 0.1832 | 700 | 0.5650 | -1.6075 | -2.3332 | 0.7010 | 0.7257 | -497.1474 | -445.9985 | -1.7875 | -1.8697 |
0.519 | 0.2094 | 800 | 0.5425 | -1.1058 | -1.7696 | 0.7155 | 0.6638 | -440.7842 | -395.8254 | -1.7912 | -1.8752 |
0.4857 | 0.2355 | 900 | 0.5474 | -1.6987 | -2.4665 | 0.7225 | 0.7678 | -510.4745 | -455.1209 | -1.6205 | -1.6997 |
0.5378 | 0.2617 | 1000 | 0.5421 | -1.2297 | -2.0123 | 0.7090 | 0.7826 | -465.0541 | -408.2222 | -1.5946 | -1.6771 |
0.5569 | 0.2879 | 1100 | 0.5356 | -1.1147 | -1.7889 | 0.7175 | 0.6742 | -442.7119 | -396.7189 | -1.6536 | -1.7402 |
0.5875 | 0.3141 | 1200 | 0.5264 | -1.4433 | -2.1309 | 0.7355 | 0.6876 | -476.9160 | -429.5823 | -1.5100 | -1.6017 |
0.5681 | 0.3402 | 1300 | 0.5347 | -2.5579 | -3.4361 | 0.7165 | 0.8782 | -607.4370 | -541.0386 | -1.4877 | -1.5713 |
0.5395 | 0.3664 | 1400 | 0.5213 | -1.9355 | -2.8808 | 0.7300 | 0.9452 | -551.8996 | -478.8040 | -1.3998 | -1.4881 |
0.4408 | 0.3926 | 1500 | 0.5228 | -2.2961 | -3.4521 | 0.7355 | 1.1560 | -609.0350 | -514.8552 | -1.5441 | -1.6317 |
0.5416 | 0.4187 | 1600 | 0.5173 | -2.2653 | -3.2986 | 0.7285 | 1.0333 | -593.6861 | -511.7793 | -1.4138 | -1.5014 |
0.5261 | 0.4449 | 1700 | 0.5051 | -2.4008 | -3.4047 | 0.7385 | 1.0038 | -604.2916 | -525.3339 | -1.5638 | -1.6434 |
0.4685 | 0.4711 | 1800 | 0.5065 | -1.7470 | -2.7320 | 0.7380 | 0.9850 | -537.0220 | -459.9487 | -1.5145 | -1.6005 |
0.4293 | 0.4973 | 1900 | 0.5047 | -2.6133 | -3.7102 | 0.7390 | 1.0968 | -634.8395 | -546.5821 | -1.3755 | -1.4651 |
0.4753 | 0.5234 | 2000 | 0.5000 | -2.5931 | -3.6748 | 0.7455 | 1.0817 | -631.2996 | -544.5588 | -1.3866 | -1.4735 |
0.498 | 0.5496 | 2100 | 0.4965 | -1.8299 | -2.8777 | 0.7465 | 1.0478 | -551.5919 | -468.2369 | -1.4616 | -1.5507 |
0.506 | 0.5758 | 2200 | 0.4934 | -1.8271 | -2.7912 | 0.7455 | 0.9641 | -542.9438 | -467.9619 | -1.4831 | -1.5724 |
0.4813 | 0.6019 | 2300 | 0.4948 | -2.4682 | -3.6441 | 0.7485 | 1.1759 | -628.2384 | -532.0719 | -1.4335 | -1.5210 |
0.4851 | 0.6281 | 2400 | 0.4903 | -2.1415 | -3.2549 | 0.7450 | 1.1134 | -589.3144 | -499.4011 | -1.4529 | -1.5388 |
0.5116 | 0.6543 | 2500 | 0.4890 | -1.7892 | -2.9367 | 0.7445 | 1.1475 | -557.4963 | -464.1678 | -1.5214 | -1.6087 |
0.4451 | 0.6805 | 2600 | 0.4929 | -2.1993 | -3.4514 | 0.7505 | 1.2521 | -608.9644 | -505.1790 | -1.4632 | -1.5511 |
0.5207 | 0.7066 | 2700 | 0.4900 | -2.1993 | -3.3656 | 0.7490 | 1.1663 | -600.3847 | -505.1818 | -1.4903 | -1.5765 |
0.4458 | 0.7328 | 2800 | 0.4899 | -2.1260 | -3.2789 | 0.7475 | 1.1529 | -591.7167 | -497.8499 | -1.5008 | -1.5876 |
0.5134 | 0.7590 | 2900 | 0.4878 | -2.1729 | -3.2932 | 0.7475 | 1.1204 | -593.1492 | -502.5367 | -1.4986 | -1.5853 |
0.4722 | 0.7851 | 3000 | 0.4881 | -2.1656 | -3.2446 | 0.7505 | 1.0791 | -588.2886 | -501.8063 | -1.5024 | -1.5888 |
0.4805 | 0.8113 | 3100 | 0.4881 | -2.1831 | -3.3081 | 0.7490 | 1.1250 | -594.6381 | -503.5581 | -1.4902 | -1.5774 |
0.4891 | 0.8375 | 3200 | 0.4879 | -2.1565 | -3.2929 | 0.7490 | 1.1363 | -593.1110 | -500.9025 | -1.4972 | -1.5837 |
0.5083 | 0.8636 | 3300 | 0.4877 | -2.1423 | -3.2770 | 0.7490 | 1.1347 | -591.5213 | -499.4756 | -1.4993 | -1.5855 |
0.446 | 0.8898 | 3400 | 0.4876 | -2.1602 | -3.3022 | 0.7480 | 1.1420 | -594.0439 | -501.2723 | -1.4916 | -1.5785 |
0.5346 | 0.9160 | 3500 | 0.4877 | -2.1484 | -3.2901 | 0.7480 | 1.1418 | -592.8391 | -500.0872 | -1.4929 | -1.5797 |
0.4646 | 0.9422 | 3600 | 0.4876 | -2.1484 | -3.2908 | 0.7490 | 1.1425 | -592.9084 | -500.0869 | -1.4908 | -1.5778 |
0.4696 | 0.9683 | 3700 | 0.4876 | -2.1494 | -3.2919 | 0.7490 | 1.1426 | -593.0177 | -500.1866 | -1.4908 | -1.5778 |
0.5038 | 0.9945 | 3800 | 0.4875 | -2.1504 | -3.2931 | 0.7485 | 1.1428 | -593.1368 | -500.2856 | -1.4918 | -1.5786 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.40.0
- PyTorch 2.1.2+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1