
zephyr-7b-dpo-lora-pubmedqa-mix2

This model is a LoRA adapter fine-tuned from EllieS/zephyr-7b-sft-qlora with Direct Preference Optimization (DPO) on the EllieS/pubmedqa_dpo_mix_data dataset. It achieves the following results on the evaluation set (a note on how these metrics are defined follows the list):

  • Loss: 0.0013
  • Rewards/chosen: -1.8126
  • Rewards/rejected: -10.9731
  • Rewards/accuracies: 1.0
  • Rewards/margins: 9.1605
  • Logps/rejected: -1144.0397
  • Logps/chosen: -242.4412
  • Logits/rejected: -1.7638
  • Logits/chosen: -2.8841
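
These metric names follow TRL's DPOTrainer conventions. As a point of reference (an assumption on my part, since the card does not state the DPO β or the exact reward definition used), the standard DPO implicit reward from which the reward columns are derived is:

```latex
% Implicit DPO reward under the standard formulation (assumed;
% \beta is not stated in this card):
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

% Rewards/margins is the mean chosen-minus-rejected reward gap; here:
% -1.8126 - (-10.9731) = 9.1605, matching the reported value.
\mathrm{margin} = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})
```

Under this reading, a Rewards/accuracies of 1.0 means the chosen response received the higher implicit reward on every evaluation example.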

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative code sketch mapping them onto a DPOTrainer setup follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • total_eval_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
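
A minimal sketch of how these hyperparameters could map onto a TRL DPOTrainer run. This is a reconstruction, not the author's script: the DPO β, LoRA settings, precision, dataset split names, and column layout below are assumptions not stated in this card.

```python
# Hypothetical reconstruction of the training setup. Only the hyperparameters
# listed above come from this card; everything marked "assumed" does not.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# NOTE: EllieS/zephyr-7b-sft-qlora is itself a QLoRA adapter; in practice you
# would load its base model and apply/merge the adapter first (simplified here).
model = AutoModelForCausalLM.from_pretrained("EllieS/zephyr-7b-sft-qlora")
tokenizer = AutoTokenizer.from_pretrained("EllieS/zephyr-7b-sft-qlora")

# Dataset assumed to carry the usual "prompt"/"chosen"/"rejected" columns,
# with "train" and "test" split names (assumed).
data = load_dataset("EllieS/pubmedqa_dpo_mix_data")

peft_config = LoraConfig(  # assumed LoRA settings; not stated in the card
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
)

# total_train_batch_size 4 = 2 GPUs x per-device batch 1 x grad accumulation 2
args = TrainingArguments(
    output_dir="zephyr-7b-dpo-lora-pubmedqa-mix2",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumed precision
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # with peft_config set, TRL uses the adapter-disabled base as reference
    args=args,
    beta=0.1,         # assumed; the DPO beta is not stated in the card
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```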

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2697 | 0.04 | 3000 | 0.3396 | 0.2213 | -0.6386 | 1.0 | 0.8599 | -110.5876 | -39.0518 | -3.0278 | -3.0862 |
| 0.1599 | 0.07 | 6000 | 0.0750 | -0.5884 | -3.6673 | 1.0 | 3.0789 | -413.4546 | -120.0204 | -2.9055 | -3.0346 |
| 0.0563 | 0.11 | 9000 | 0.0204 | -0.6260 | -5.6712 | 1.0 | 5.0452 | -613.8441 | -123.7819 | -3.0269 | -3.1136 |
| 0.0463 | 0.14 | 12000 | 0.0287 | -0.7209 | -7.9224 | 1.0 | 7.2014 | -838.9609 | -133.2740 | -3.0642 | -3.1628 |
| 0.1206 | 0.18 | 15000 | 0.0030 | -0.9209 | -8.8089 | 1.0 | 7.8880 | -927.6118 | -153.2670 | -3.0802 | -3.1766 |
| 0.0508 | 0.22 | 18000 | 0.4964 | -0.4026 | -8.0330 | 1.0 | 7.6304 | -850.0245 | -101.4397 | -3.1314 | -3.2075 |
| 0.0323 | 0.25 | 21000 | 0.0872 | -1.4713 | -10.3437 | 1.0 | 8.8723 | -1081.0913 | -208.3129 | -2.6496 | -3.1189 |
| 0.4534 | 0.29 | 24000 | 0.0077 | -2.3507 | -12.1827 | 1.0 | 9.8320 | -1264.9957 | -296.2491 | -1.6282 | -2.8665 |
| 0.0013 | 0.32 | 27000 | 0.0019 | -2.1480 | -10.6645 | 1.0 | 8.5166 | -1113.1797 | -275.9768 | -1.7614 | -2.8604 |
| 0.1404 | 0.36 | 30000 | 0.0002 | -2.4964 | -12.4101 | 1.0 | 9.9138 | -1287.7384 | -310.8155 | -1.5907 | -2.8352 |
| 0.0198 | 0.4 | 33000 | 0.0009 | -3.0802 | -13.3347 | 1.0 | 10.2545 | -1380.1964 | -369.1991 | -1.6628 | -2.8372 |
| 0.0041 | 0.43 | 36000 | 0.0004 | -2.7800 | -12.5815 | 1.0 | 9.8014 | -1304.8732 | -339.1852 | -1.6282 | -2.8242 |
| 0.0007 | 0.47 | 39000 | 0.0007 | -2.9921 | -13.2089 | 1.0 | 10.2168 | -1367.6129 | -360.3922 | -1.6672 | -2.8403 |
| 0.0008 | 0.5 | 42000 | 0.0013 | -2.3107 | -11.8754 | 1.0 | 9.5647 | -1234.2609 | -292.2454 | -1.6475 | -2.8400 |
| 0.0024 | 0.54 | 45000 | 0.0010 | -3.3769 | -13.2333 | 1.0 | 9.8564 | -1370.0538 | -398.8731 | -1.6937 | -2.8403 |
| 0.0019 | 0.57 | 48000 | 0.0013 | -2.8151 | -12.4427 | 1.0 | 9.6277 | -1290.9999 | -342.6892 | -1.7047 | -2.8503 |
| 0.2266 | 0.61 | 51000 | 0.0014 | -1.9532 | -11.0212 | 1.0 | 9.0680 | -1148.8468 | -256.4992 | -1.6745 | -2.8650 |
| 0.0016 | 0.65 | 54000 | 0.0014 | -1.8077 | -10.7512 | 1.0 | 8.9435 | -1121.8423 | -241.9466 | -1.8328 | -2.8946 |
| 0.0019 | 0.68 | 57000 | 0.0013 | -1.8159 | -10.8808 | 1.0 | 9.0649 | -1134.8024 | -242.7715 | -1.7644 | -2.8860 |
| 0.0013 | 0.72 | 60000 | 0.0013 | -1.7356 | -10.8007 | 1.0 | 9.0651 | -1126.8002 | -234.7419 | -1.7574 | -2.8871 |
| 0.0014 | 0.75 | 63000 | 0.0013 | -1.8249 | -10.9773 | 1.0 | 9.1524 | -1144.4586 | -243.6743 | -1.7699 | -2.8867 |
| 0.0014 | 0.79 | 66000 | 0.0013 | -1.8308 | -10.9698 | 1.0 | 9.1389 | -1143.7017 | -244.2651 | -1.7597 | -2.8841 |
| 0.0011 | 0.83 | 69000 | 0.0013 | -1.8034 | -10.9390 | 1.0 | 9.1356 | -1140.6276 | -241.5220 | -1.7619 | -2.8858 |
| 0.0016 | 0.86 | 72000 | 0.0013 | -1.7971 | -10.9097 | 1.0 | 9.1126 | -1137.6914 | -240.8868 | -1.7608 | -2.8852 |
| 0.0239 | 0.9 | 75000 | 0.0013 | -1.7976 | -10.9400 | 1.0 | 9.1424 | -1140.7238 | -240.9355 | -1.7773 | -2.8872 |
| 0.0024 | 0.93 | 78000 | 0.0013 | -1.7862 | -10.9196 | 1.0 | 9.1334 | -1138.6901 | -239.8036 | -1.7733 | -2.8861 |
| 0.0018 | 0.97 | 81000 | 0.0013 | -1.8228 | -10.9802 | 1.0 | 9.1574 | -1144.7491 | -243.4639 | -1.7594 | -2.8860 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
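
Since this repository is a PEFT (LoRA) adapter rather than a full model, one straightforward way to load it for inference is via peft's AutoPeftModelForCausalLM, which resolves the base model automatically. The prompt and generation settings below are illustrative only.

```python
# Illustrative loading/inference example; generation settings are arbitrary.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "EllieS/zephyr-7b-dpo-lora-pubmedqa-mix2"
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Question: Does aspirin use reduce the risk of colorectal cancer?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```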