zephyr-7b-dpo-qlora

This model is a QLoRA (PEFT) adapter for alignment-handbook/zephyr-7b-sft-full, trained with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5299
  • Rewards/chosen: -3.0720
  • Rewards/rejected: -4.6492
  • Rewards/accuracies: 0.7275
  • Rewards/margins: 1.5772
  • Logps/rejected: -728.1719
  • Logps/chosen: -592.3389
  • Logits/rejected: -1.2212
  • Logits/chosen: -1.3455
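The Rewards/* metrics come from DPO training: each reward is the β-scaled log-probability ratio of the policy against the SFT reference for that completion, and the per-example sigmoid DPO loss is -log σ of the chosen-minus-rejected margin. A minimal illustrative sketch (not the training code; note that the reported eval loss of 0.5299 is a mean over per-example losses, which differs from the loss evaluated at the mean margin because -log σ is nonlinear):

```python
import math

def per_example_dpo_loss(reward_chosen, reward_rejected):
    """Sigmoid DPO loss for one preference pair: -log(sigmoid(margin)),
    where margin = reward_chosen - reward_rejected and each reward is
    beta * (policy log-prob - reference log-prob) for that completion."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Final-checkpoint mean rewards from above: margin = -3.0720 - (-4.6492) = 1.5772
print(round(per_example_dpo_loss(-3.0720, -4.6492), 4))  # ~0.1878
```

The value at the mean margin (~0.19) is well below the reported mean loss (0.5299), which is expected: per-example margins are spread around the mean, and by Jensen's inequality the average of the convex loss exceeds the loss at the average margin.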

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
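The learning-rate schedule implied by the settings above (cosine decay with a 10% linear warmup) can be sketched as follows. This is a simplified reimplementation of what transformers' get_cosine_schedule_with_warmup computes, not the exact training code; total_steps here is a hypothetical value for illustration:

```python
import math

def learning_rate(step, total_steps, peak_lr=5e-06, warmup_ratio=0.1):
    # Linear warmup from 0 to peak_lr over the first warmup_ratio of
    # training, then cosine decay from peak_lr down to 0.
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With this shape, the rate reaches the 5e-06 peak exactly at the end of warmup and decays smoothly to zero by the final step.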

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6917 0.0262 400 0.6917 0.0029 0.0001 0.6350 0.0028 -263.2442 -284.8504 -2.9734 -3.0250
0.6787 0.0523 800 0.6800 0.0242 -0.0033 0.6860 0.0276 -263.5826 -282.7138 -2.9532 -3.0046
0.6348 0.0785 1200 0.6376 -0.0096 -0.1486 0.6755 0.1390 -278.1083 -286.0981 -2.8858 -2.9319
0.629 0.1047 1600 0.6087 -0.3993 -0.6875 0.6785 0.2882 -331.9969 -325.0683 -2.7749 -2.8148
0.5602 0.1309 2000 0.5979 -0.5708 -0.9723 0.6855 0.4014 -360.4759 -342.2224 -2.7488 -2.7916
0.5783 0.1570 2400 0.5952 -0.7444 -1.2632 0.6910 0.5188 -389.5722 -359.5799 -2.6852 -2.7273
0.6364 0.1832 2800 0.6014 -2.0557 -2.8123 0.6970 0.7566 -544.4844 -490.7089 -2.0799 -2.1273
0.6807 0.2094 3200 0.5654 -2.1440 -3.0639 0.7030 0.9199 -569.6410 -499.5395 -1.6977 -1.7604
0.6616 0.2355 3600 0.5712 -2.9371 -3.9619 0.7165 1.0247 -659.4373 -578.8513 -1.2775 -1.3472
0.4475 0.2617 4000 0.5522 -2.1606 -3.0883 0.7250 0.9277 -572.0762 -501.1973 -1.6222 -1.6801
0.5934 0.2879 4400 0.5452 -2.0993 -3.0686 0.7150 0.9693 -570.1054 -495.0656 -1.5863 -1.6559
0.5422 0.3141 4800 0.5520 -2.7041 -3.8442 0.7220 1.1401 -647.6720 -555.5510 -1.5167 -1.5930
0.6307 0.3402 5200 0.5378 -2.2755 -3.3838 0.7285 1.1083 -601.6280 -512.6918 -1.6752 -1.7599
0.7039 0.3664 5600 0.5306 -1.7946 -2.8494 0.7250 1.0548 -548.1910 -464.5987 -1.6121 -1.6982
0.6561 0.3926 6000 0.5516 -2.6777 -4.0196 0.7205 1.3418 -665.2089 -552.9131 -1.6257 -1.7129
0.5698 0.4188 6400 0.5181 -2.1847 -3.1985 0.7365 1.0138 -583.0958 -503.6094 -1.6584 -1.7391
0.5919 0.4449 6800 0.5219 -1.9491 -3.1280 0.7195 1.1790 -576.0514 -480.0444 -1.6888 -1.7826
0.6161 0.4711 7200 0.5417 -2.7779 -4.2107 0.7335 1.4328 -684.3200 -562.9326 -1.4277 -1.5325
0.4585 0.4973 7600 0.5326 -2.4424 -3.8173 0.7355 1.3748 -644.9775 -529.3820 -1.5104 -1.6091
0.7168 0.5234 8000 0.5298 -2.7451 -4.1021 0.7390 1.3569 -673.4548 -559.6511 -1.3613 -1.4625
0.7179 0.5496 8400 0.5450 -3.1455 -4.6991 0.7330 1.5536 -733.1592 -599.6882 -1.2796 -1.3950
0.4405 0.5758 8800 0.5088 -1.9634 -3.1323 0.7425 1.1689 -576.4830 -481.4787 -1.5418 -1.6311
0.4464 0.6020 9200 0.5306 -2.5354 -3.9140 0.7325 1.3786 -654.6471 -538.6789 -1.3558 -1.4605
0.43 0.6281 9600 0.5292 -2.7495 -4.1617 0.7335 1.4122 -679.4191 -560.0843 -1.2192 -1.3258
0.48 0.6543 10000 0.5317 -2.5185 -3.9464 0.7245 1.4279 -657.8862 -536.9896 -1.3340 -1.4473
0.7352 0.6805 10400 0.5257 -2.7204 -4.1745 0.7315 1.4541 -680.6992 -557.1738 -1.3220 -1.4356
0.6986 0.7066 10800 0.5242 -2.8515 -4.3094 0.7300 1.4580 -694.1929 -570.2861 -1.2609 -1.3721
0.4944 0.7328 11200 0.5282 -2.8438 -4.3275 0.7320 1.4837 -695.9977 -569.5184 -1.2780 -1.3930
0.3577 0.7590 11600 0.5159 -2.7874 -4.1731 0.7345 1.3857 -680.5639 -563.8783 -1.3489 -1.4592
0.602 0.7852 12000 0.5213 -2.9605 -4.3944 0.7315 1.4339 -702.6897 -581.1863 -1.2926 -1.4077
0.4698 0.8113 12400 0.5320 -3.2528 -4.8286 0.7300 1.5759 -746.1134 -610.4158 -1.1834 -1.3076
0.4796 0.8375 12800 0.5180 -2.7532 -4.1875 0.7325 1.4343 -681.9944 -560.4576 -1.2848 -1.3996
0.4354 0.8637 13200 0.5226 -2.8473 -4.3400 0.7335 1.4927 -697.2530 -569.8687 -1.2477 -1.3671
0.4068 0.8898 13600 0.5262 -3.0065 -4.5462 0.7310 1.5397 -717.8715 -585.7884 -1.2316 -1.3538
0.5134 0.9160 14000 0.5281 -2.9950 -4.5567 0.7300 1.5617 -718.9149 -584.6379 -1.2311 -1.3549
0.7272 0.9422 14400 0.5305 -3.0852 -4.6701 0.7275 1.5849 -730.2634 -593.6614 -1.2166 -1.3417
0.3916 0.9684 14800 0.5299 -3.0770 -4.6548 0.7265 1.5778 -728.7334 -592.8383 -1.2201 -1.3446
0.4814 0.9945 15200 0.5296 -3.0725 -4.6501 0.7280 1.5776 -728.2595 -592.3885 -1.2210 -1.3453

Framework versions

  • PEFT 0.7.1
  • Transformers 4.44.2
  • Pytorch 2.2.2+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.0
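Because this repository contains a PEFT (QLoRA) adapter rather than full model weights, inference requires loading the base model first and then attaching the adapter. A minimal sketch, untested here (it assumes Hub network access, sufficient GPU memory, and the repo ids as listed on this card):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"  # base model from this card
adapter_id = "daijiao/zephyr-7b-dpo-qlora"         # this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO LoRA weights

prompt = "Explain DPO in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

For a merged standalone model, `model.merge_and_unload()` folds the adapter weights into the base model after loading.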