
zephyr-7b-dpo-qlora

This model is a QLoRA (PEFT) adapter for mistralai/Mistral-7B-v0.1, fine-tuned with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4877
  • Rewards/chosen: -2.1504
  • Rewards/rejected: -3.2930
  • Rewards/accuracies: 0.7485
  • Rewards/margins: 1.1426
  • Logps/rejected: -593.1238
  • Logps/chosen: -500.2867
  • Logits/rejected: -1.4918
  • Logits/chosen: -1.5786
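
The card itself does not define these reward metrics; they follow the standard implicit-reward convention of DPO training (presumably as logged by TRL's DPOTrainer), where π_θ is the trained policy, π_ref the frozen reference model, and β the DPO temperature (β is not listed among the hyperparameters below):

```latex
% Implicit DPO reward of a completion y for a prompt x:
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

% Metrics derived over (chosen, rejected) preference pairs:
\text{rewards/margins}    = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})
\text{rewards/accuracies} = \Pr\bigl[\, r_\theta(x, y_{\mathrm{chosen}}) > r_\theta(x, y_{\mathrm{rejected}}) \,\bigr]
```

Rewards/chosen and Rewards/rejected report this implicit reward averaged over the chosen and rejected completions, while Logps/* and Logits/* report the policy's log-probabilities and average logits on those same completions.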

Model description

More information needed

Intended uses & limitations

More information needed
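
Since the repository contains a PEFT (QLoRA) adapter rather than full model weights, it has to be loaded on top of the base model. The snippet below is a minimal, non-official sketch: it assumes the adapter is published as SF-Foundation/zephyr-7b-dpo-qlora and that inference is done in 4-bit via bitsandbytes (the NF4/bfloat16 settings are illustrative and not taken from the card).

```python
# Minimal sketch (not an official example): load the QLoRA adapter on top of
# the 4-bit quantized base model. Requires bitsandbytes in addition to the
# framework versions listed at the end of this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "SF-Foundation/zephyr-7b-dpo-qlora"

# Illustrative quantization settings; the card does not specify them.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Note: the prompt/chat format used during DPO training is not documented here.
prompt = "Explain DPO in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```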

Training and evaluation data

The model was trained and evaluated on the HuggingFaceH4/ultrafeedback_binarized preference dataset noted above; no further details were provided.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
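
These map roughly onto the transformers TrainingArguments sketched below. The actual training script (likely a TRL DPOTrainer recipe) is not part of this card, so treat this as illustrative rather than authoritative; output_dir and bf16 in particular are assumptions.

```python
# Rough, non-authoritative mapping of the listed hyperparameters onto
# transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,      # effective train batch size of 16, as listed above
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                # Adam with betas=(0.9, 0.999), eps=1e-8
    seed=42,
    bf16=True,                          # assumption; precision is not stated in the card
)
```

The multi-GPU launch configuration and the LoRA/DPO-specific settings (rank, alpha, β, etc.) are not documented in this card and are therefore omitted from the sketch.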

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6887 | 0.0262 | 100 | 0.6878 | 0.0320 | 0.0209 | 0.6165 | 0.0111 | -261.7386 | -282.0497 | -2.2161 | -2.2860 |
| 0.6673 | 0.0523 | 200 | 0.6705 | 0.0307 | -0.0223 | 0.6210 | 0.0530 | -266.0498 | -282.1794 | -2.2309 | -2.2977 |
| 0.622 | 0.0785 | 300 | 0.6308 | -0.4454 | -0.6416 | 0.6530 | 0.1962 | -327.9841 | -329.7844 | -2.2079 | -2.2648 |
| 0.6231 | 0.1047 | 400 | 0.6110 | -1.1130 | -1.4436 | 0.6600 | 0.3306 | -408.1872 | -396.5498 | -2.1373 | -2.1896 |
| 0.5801 | 0.1309 | 500 | 0.5821 | -1.0977 | -1.5765 | 0.6770 | 0.4787 | -421.4711 | -395.0235 | -1.9857 | -2.0520 |
| 0.5774 | 0.1570 | 600 | 0.5737 | -0.8337 | -1.3533 | 0.6960 | 0.5197 | -399.1586 | -368.6156 | -2.0070 | -2.0804 |
| 0.5622 | 0.1832 | 700 | 0.5650 | -1.6075 | -2.3332 | 0.7010 | 0.7257 | -497.1474 | -445.9985 | -1.7875 | -1.8697 |
| 0.519 | 0.2094 | 800 | 0.5425 | -1.1058 | -1.7696 | 0.7155 | 0.6638 | -440.7842 | -395.8254 | -1.7912 | -1.8752 |
| 0.4857 | 0.2355 | 900 | 0.5474 | -1.6987 | -2.4665 | 0.7225 | 0.7678 | -510.4745 | -455.1209 | -1.6205 | -1.6997 |
| 0.5378 | 0.2617 | 1000 | 0.5421 | -1.2297 | -2.0123 | 0.7090 | 0.7826 | -465.0541 | -408.2222 | -1.5946 | -1.6771 |
| 0.5569 | 0.2879 | 1100 | 0.5356 | -1.1147 | -1.7889 | 0.7175 | 0.6742 | -442.7119 | -396.7189 | -1.6536 | -1.7402 |
| 0.5875 | 0.3141 | 1200 | 0.5264 | -1.4433 | -2.1309 | 0.7355 | 0.6876 | -476.9160 | -429.5823 | -1.5100 | -1.6017 |
| 0.5681 | 0.3402 | 1300 | 0.5347 | -2.5579 | -3.4361 | 0.7165 | 0.8782 | -607.4370 | -541.0386 | -1.4877 | -1.5713 |
| 0.5395 | 0.3664 | 1400 | 0.5213 | -1.9355 | -2.8808 | 0.7300 | 0.9452 | -551.8996 | -478.8040 | -1.3998 | -1.4881 |
| 0.4408 | 0.3926 | 1500 | 0.5228 | -2.2961 | -3.4521 | 0.7355 | 1.1560 | -609.0350 | -514.8552 | -1.5441 | -1.6317 |
| 0.5416 | 0.4187 | 1600 | 0.5173 | -2.2653 | -3.2986 | 0.7285 | 1.0333 | -593.6861 | -511.7793 | -1.4138 | -1.5014 |
| 0.5261 | 0.4449 | 1700 | 0.5051 | -2.4008 | -3.4047 | 0.7385 | 1.0038 | -604.2916 | -525.3339 | -1.5638 | -1.6434 |
| 0.4685 | 0.4711 | 1800 | 0.5065 | -1.7470 | -2.7320 | 0.7380 | 0.9850 | -537.0220 | -459.9487 | -1.5145 | -1.6005 |
| 0.4293 | 0.4973 | 1900 | 0.5047 | -2.6133 | -3.7102 | 0.7390 | 1.0968 | -634.8395 | -546.5821 | -1.3755 | -1.4651 |
| 0.4753 | 0.5234 | 2000 | 0.5000 | -2.5931 | -3.6748 | 0.7455 | 1.0817 | -631.2996 | -544.5588 | -1.3866 | -1.4735 |
| 0.498 | 0.5496 | 2100 | 0.4965 | -1.8299 | -2.8777 | 0.7465 | 1.0478 | -551.5919 | -468.2369 | -1.4616 | -1.5507 |
| 0.506 | 0.5758 | 2200 | 0.4934 | -1.8271 | -2.7912 | 0.7455 | 0.9641 | -542.9438 | -467.9619 | -1.4831 | -1.5724 |
| 0.4813 | 0.6019 | 2300 | 0.4948 | -2.4682 | -3.6441 | 0.7485 | 1.1759 | -628.2384 | -532.0719 | -1.4335 | -1.5210 |
| 0.4851 | 0.6281 | 2400 | 0.4903 | -2.1415 | -3.2549 | 0.7450 | 1.1134 | -589.3144 | -499.4011 | -1.4529 | -1.5388 |
| 0.5116 | 0.6543 | 2500 | 0.4890 | -1.7892 | -2.9367 | 0.7445 | 1.1475 | -557.4963 | -464.1678 | -1.5214 | -1.6087 |
| 0.4451 | 0.6805 | 2600 | 0.4929 | -2.1993 | -3.4514 | 0.7505 | 1.2521 | -608.9644 | -505.1790 | -1.4632 | -1.5511 |
| 0.5207 | 0.7066 | 2700 | 0.4900 | -2.1993 | -3.3656 | 0.7490 | 1.1663 | -600.3847 | -505.1818 | -1.4903 | -1.5765 |
| 0.4458 | 0.7328 | 2800 | 0.4899 | -2.1260 | -3.2789 | 0.7475 | 1.1529 | -591.7167 | -497.8499 | -1.5008 | -1.5876 |
| 0.5134 | 0.7590 | 2900 | 0.4878 | -2.1729 | -3.2932 | 0.7475 | 1.1204 | -593.1492 | -502.5367 | -1.4986 | -1.5853 |
| 0.4722 | 0.7851 | 3000 | 0.4881 | -2.1656 | -3.2446 | 0.7505 | 1.0791 | -588.2886 | -501.8063 | -1.5024 | -1.5888 |
| 0.4805 | 0.8113 | 3100 | 0.4881 | -2.1831 | -3.3081 | 0.7490 | 1.1250 | -594.6381 | -503.5581 | -1.4902 | -1.5774 |
| 0.4891 | 0.8375 | 3200 | 0.4879 | -2.1565 | -3.2929 | 0.7490 | 1.1363 | -593.1110 | -500.9025 | -1.4972 | -1.5837 |
| 0.5083 | 0.8636 | 3300 | 0.4877 | -2.1423 | -3.2770 | 0.7490 | 1.1347 | -591.5213 | -499.4756 | -1.4993 | -1.5855 |
| 0.446 | 0.8898 | 3400 | 0.4876 | -2.1602 | -3.3022 | 0.7480 | 1.1420 | -594.0439 | -501.2723 | -1.4916 | -1.5785 |
| 0.5346 | 0.9160 | 3500 | 0.4877 | -2.1484 | -3.2901 | 0.7480 | 1.1418 | -592.8391 | -500.0872 | -1.4929 | -1.5797 |
| 0.4646 | 0.9422 | 3600 | 0.4876 | -2.1484 | -3.2908 | 0.7490 | 1.1425 | -592.9084 | -500.0869 | -1.4908 | -1.5778 |
| 0.4696 | 0.9683 | 3700 | 0.4876 | -2.1494 | -3.2919 | 0.7490 | 1.1426 | -593.0177 | -500.1866 | -1.4908 | -1.5778 |
| 0.5038 | 0.9945 | 3800 | 0.4875 | -2.1504 | -3.2931 | 0.7485 | 1.1428 | -593.1368 | -500.2856 | -1.4918 | -1.5786 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.40.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1