
# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4788
  • Rewards/chosen: -2.6215
  • Rewards/rejected: -3.9187
  • Rewards/accuracies: 0.7465
  • Rewards/margins: 1.2972
  • Logps/rejected: -636.4379
  • Logps/chosen: -526.7527
  • Logits/rejected: -1.0290
  • Logits/chosen: -1.1652
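The reward metrics above are DPO's *implicit* rewards: for each completion, the β-scaled log-probability ratio between the trained policy and the frozen reference (SFT) model. As a minimal, dependency-free sketch of how these evaluation statistics relate to per-sequence log-probabilities — note that β is a DPO hyperparameter not listed on this card, so the default of 0.1 below is illustrative only:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_eval_stats(policy_chosen_logps, policy_rejected_logps,
                   ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Compute DPO's implicit rewards and sigmoid loss for a batch of
    (chosen, rejected) pairs, given summed per-sequence log-probs."""
    # Implicit reward: beta-scaled log-ratio of policy vs. reference likelihood.
    rewards_chosen = [beta * (p - r) for p, r in zip(policy_chosen_logps, ref_chosen_logps)]
    rewards_rejected = [beta * (p - r) for p, r in zip(policy_rejected_logps, ref_rejected_logps)]
    # Margin: how much more the policy prefers the chosen completion.
    margins = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]
    return {
        # Sigmoid DPO loss: -log(sigmoid(margin)), averaged over the batch.
        "loss": sum(-math.log(sigmoid(m)) for m in margins) / len(margins),
        "rewards/chosen": sum(rewards_chosen) / len(rewards_chosen),
        "rewards/rejected": sum(rewards_rejected) / len(rewards_rejected),
        "rewards/margins": sum(margins) / len(margins),
        # Fraction of pairs where the chosen reward exceeds the rejected one.
        "rewards/accuracies": sum(m > 0 for m in margins) / len(margins),
    }

# Toy pair: the policy has drifted toward the chosen completion.
stats = dpo_eval_stats([-100.0], [-120.0], [-95.0], [-105.0])
# rewards/chosen = -0.5, rewards/rejected = -1.5, margin = 1.0, accuracy = 1.0
```

An accuracy of 0.7465, as reported above, means the implicit reward of the chosen response exceeded that of the rejected response on roughly three quarters of evaluation pairs.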

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
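These hyperparameters map roughly onto trl's `DPOConfig` as sketched below. This is an assumption-laden sketch, not the recipe actually used: the DPO-specific settings (e.g. β) and the QLoRA/LoRA adapter configuration are not listed on this card and are therefore omitted, and the `output_dir` name is a placeholder.

```python
# Sketch: approximating the listed hyperparameters with trl's DPOConfig.
# beta and the LoRA/QLoRA config are not stated on the card, so they are
# left at library defaults here.
from trl import DPOConfig

training_args = DPOConfig(
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 4 x 4 per-device = effective batch of 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    output_dir="zephyr-7b-dpo-qlora",  # placeholder
)
```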

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6807 | 0.0262 | 100 | 0.6809 | 0.0514 | 0.0256 | 0.6555 | 0.0258 | -242.0131 | -259.4604 | -2.0551 | -2.1482 |
| 0.6438 | 0.0523 | 200 | 0.6356 | -0.1881 | -0.3389 | 0.6760 | 0.1508 | -278.4615 | -283.4154 | -2.0113 | -2.1000 |
| 0.6073 | 0.0785 | 300 | 0.6054 | -0.6866 | -0.9744 | 0.6815 | 0.2878 | -342.0091 | -333.2583 | -1.9949 | -2.0782 |
| 0.5956 | 0.1047 | 400 | 0.5824 | -1.4485 | -1.9599 | 0.6830 | 0.5114 | -440.5653 | -409.4522 | -1.5844 | -1.6758 |
| 0.5643 | 0.1309 | 500 | 0.5726 | -1.1458 | -1.7589 | 0.6915 | 0.6131 | -420.4636 | -379.1804 | -1.5624 | -1.6658 |
| 0.5373 | 0.1570 | 600 | 0.5631 | -1.1286 | -1.8164 | 0.7030 | 0.6878 | -426.2121 | -377.4605 | -1.6945 | -1.7955 |
| 0.5394 | 0.1832 | 700 | 0.5474 | -2.2700 | -3.0663 | 0.7040 | 0.7963 | -551.1992 | -491.6012 | -1.1628 | -1.2719 |
| 0.4983 | 0.2094 | 800 | 0.5323 | -1.5616 | -2.2966 | 0.7225 | 0.7349 | -474.2269 | -420.7654 | -1.5104 | -1.5996 |
| 0.4763 | 0.2355 | 900 | 0.5386 | -1.6130 | -2.4122 | 0.7160 | 0.7992 | -485.7890 | -425.9030 | -1.4156 | -1.4989 |
| 0.5266 | 0.2617 | 1000 | 0.5234 | -2.1788 | -3.0546 | 0.7280 | 0.8758 | -550.0311 | -482.4831 | -1.2043 | -1.3050 |
| 0.59 | 0.2879 | 1100 | 0.5278 | -1.6937 | -2.3427 | 0.7300 | 0.6490 | -478.8385 | -433.9710 | -0.9899 | -1.1100 |
| 0.5724 | 0.3141 | 1200 | 0.5071 | -1.5548 | -2.4072 | 0.7380 | 0.8523 | -485.2895 | -420.0863 | -1.1349 | -1.2473 |
| 0.5457 | 0.3402 | 1300 | 0.5013 | -1.7544 | -2.6264 | 0.7435 | 0.8721 | -507.2138 | -440.0385 | -1.2424 | -1.3403 |
| 0.5423 | 0.3664 | 1400 | 0.5132 | -1.6381 | -2.6114 | 0.7210 | 0.9733 | -505.7077 | -428.4097 | -1.5063 | -1.5869 |
| 0.4492 | 0.3926 | 1500 | 0.5122 | -1.5882 | -2.5891 | 0.7260 | 1.0010 | -503.4828 | -423.4175 | -1.4972 | -1.5950 |
| 0.5491 | 0.4187 | 1600 | 0.4956 | -1.6959 | -2.7056 | 0.7395 | 1.0098 | -515.1351 | -434.1913 | -1.1293 | -1.2525 |
| 0.5408 | 0.4449 | 1700 | 0.5111 | -3.0361 | -4.2392 | 0.7305 | 1.2030 | -668.4869 | -568.2142 | -1.0520 | -1.1774 |
| 0.4705 | 0.4711 | 1800 | 0.4949 | -2.1236 | -3.1894 | 0.7435 | 1.0658 | -563.5121 | -476.9663 | -1.3479 | -1.4508 |
| 0.4447 | 0.4973 | 1900 | 0.4984 | -2.0350 | -3.1505 | 0.7420 | 1.1155 | -559.6229 | -468.1011 | -1.1711 | -1.2951 |
| 0.4561 | 0.5234 | 2000 | 0.4929 | -1.9668 | -2.9588 | 0.7420 | 0.9919 | -540.4462 | -461.2839 | -1.3557 | -1.4696 |
| 0.5068 | 0.5496 | 2100 | 0.4969 | -3.1452 | -4.3633 | 0.7350 | 1.2180 | -680.8954 | -579.1231 | -1.1150 | -1.2426 |
| 0.4839 | 0.5758 | 2200 | 0.4927 | -2.3797 | -3.4376 | 0.7405 | 1.0579 | -588.3315 | -502.5681 | -1.2706 | -1.3886 |
| 0.4729 | 0.6019 | 2300 | 0.4924 | -2.8461 | -4.1210 | 0.7405 | 1.2749 | -656.6667 | -549.2124 | -1.0868 | -1.2145 |
| 0.4501 | 0.6281 | 2400 | 0.4900 | -2.9743 | -4.2366 | 0.7430 | 1.2623 | -668.2346 | -562.0333 | -0.9978 | -1.1257 |
| 0.4982 | 0.6543 | 2500 | 0.4872 | -2.4585 | -3.6758 | 0.7420 | 1.2173 | -612.1486 | -510.4511 | -1.0532 | -1.1862 |
| 0.4649 | 0.6805 | 2600 | 0.4881 | -2.5759 | -3.8831 | 0.7450 | 1.3072 | -632.8793 | -522.1908 | -1.0793 | -1.2115 |
| 0.556 | 0.7066 | 2700 | 0.4841 | -2.3432 | -3.5113 | 0.7460 | 1.1680 | -595.6959 | -498.9265 | -1.1004 | -1.2295 |
| 0.4617 | 0.7328 | 2800 | 0.4832 | -2.3495 | -3.6183 | 0.7460 | 1.2689 | -606.4033 | -499.5496 | -1.0627 | -1.1960 |
| 0.4916 | 0.7590 | 2900 | 0.4800 | -2.6711 | -3.9165 | 0.7455 | 1.2454 | -636.2195 | -531.7142 | -1.0032 | -1.1418 |
| 0.4708 | 0.7851 | 3000 | 0.4797 | -2.6166 | -3.7883 | 0.7475 | 1.1717 | -623.4008 | -526.2621 | -0.9962 | -1.1355 |
| 0.4804 | 0.8113 | 3100 | 0.4807 | -2.8224 | -4.1220 | 0.7475 | 1.2996 | -656.7728 | -546.8435 | -0.9953 | -1.1341 |
| 0.4866 | 0.8375 | 3200 | 0.4777 | -2.5496 | -3.7894 | 0.7475 | 1.2398 | -623.5103 | -519.5614 | -1.0276 | -1.1641 |
| 0.4967 | 0.8636 | 3300 | 0.4786 | -2.5578 | -3.8108 | 0.7480 | 1.2530 | -625.6535 | -520.3804 | -1.0241 | -1.1608 |
| 0.4272 | 0.8898 | 3400 | 0.4797 | -2.7223 | -4.0287 | 0.7460 | 1.3065 | -647.4435 | -536.8282 | -1.0071 | -1.1445 |
| 0.5272 | 0.9160 | 3500 | 0.4797 | -2.7144 | -4.0320 | 0.7470 | 1.3176 | -647.7730 | -536.0449 | -1.0233 | -1.1601 |
| 0.4441 | 0.9422 | 3600 | 0.4790 | -2.6459 | -3.9513 | 0.7470 | 1.3054 | -639.7043 | -529.1944 | -1.0278 | -1.1641 |
| 0.4823 | 0.9683 | 3700 | 0.4789 | -2.6279 | -3.9262 | 0.7480 | 1.2982 | -637.1880 | -527.3952 | -1.0329 | -1.1687 |
| 0.4996 | 0.9945 | 3800 | 0.4788 | -2.6215 | -3.9183 | 0.7475 | 1.2968 | -636.4029 | -526.7561 | -1.0296 | -1.1658 |

### Framework versions

  • PEFT 0.13.2
  • Transformers 4.45.2
  • Pytorch 2.1.2+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.1
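Since this repository holds a PEFT (QLoRA) adapter rather than full model weights, loading it for inference could look roughly like the sketch below. This is untested and makes assumptions: that the adapter lives at `guoqiang-x/zephyr-7b-dpo-qlora`, that a GPU with enough memory for the 7B base model is available, and that the tokenizer ships a chat template.

```python
# Sketch: loading the adapter (plus its base model) for chat-style inference.
# Assumes GPU availability; the prompt is a placeholder.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "guoqiang-x/zephyr-7b-dpo-qlora"
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is direct preference optimization?"}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```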