---
license: apache-2.0
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
  - name: zephyr-7b-dpo-qlora
    results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset (no dataset name was recorded by the trainer). It achieves the following results on the evaluation set (the DPO quantities behind these metrics are sketched after the list):

- Loss: 0.4894
- Rewards/chosen: -2.7838
- Rewards/rejected: -3.9130
- Rewards/accuracies: 0.7445
- Rewards/margins: 1.1292
- Logps/rejected: -635.9062
- Logps/chosen: -543.0328
- Logits/rejected: -1.2074
- Logits/chosen: -1.3340
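
These quantities follow the standard DPO formulation as logged by `trl`'s `DPOTrainer` (a sketch for context; β is a training hyperparameter that is not recorded in this card):

```latex
% Implicit DPO reward of a response y for prompt x, relative to the frozen reference policy:
%   r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
% The DPO objective over preference pairs (chosen y_w, rejected y_l):
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right) \right]
```

Roughly, `Rewards/chosen` and `Rewards/rejected` are the mean implicit rewards for the preferred and dispreferred responses, `Rewards/margins` is their difference, `Rewards/accuracies` is the fraction of pairs where the chosen reward exceeds the rejected one, and `Logps/*` are the mean summed log-probabilities of each response under the policy.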

## Model description

More information needed

## Intended uses & limitations

More information needed
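
No official usage instructions are provided. As a minimal, non-authoritative sketch, the LoRA adapter can be attached to the base model with `peft`; the adapter repository id below is a placeholder, not confirmed by this card:

```python
# Minimal inference sketch. Assumptions: the adapter repo id is a placeholder,
# and bfloat16 weights fit on the available GPU(s).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "<namespace>/zephyr-7b-dpo-qlora"  # placeholder; replace with the actual adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO-trained LoRA adapter
model.eval()

prompt = "Explain direct preference optimization in one sentence."
device = next(model.parameters()).device
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As with any preference-tuned adapter trained on an undocumented dataset, outputs should be reviewed before downstream use.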

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of an equivalent `DPOTrainer` setup follows the list):

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
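
For orientation only, here is a hedged sketch of how these settings might map onto a `trl` `DPOTrainer` run with a QLoRA adapter; the β value, LoRA configuration, and preference dataset are assumptions, since none of them are recorded in this card:

```python
# Hedged training sketch matching the hyperparameters above.
# Assumptions (not stated in the card): beta=0.1, the LoRA rank/alpha/dropout,
# 4-bit NF4 quantization for QLoRA, and the preference dataset.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

bnb_config = BitsAndBytesConfig(            # QLoRA: load the base model in 4-bit NF4
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

peft_config = LoraConfig(                   # illustrative adapter settings, not from the card
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
)

args = TrainingArguments(                   # mirrors the list above; Adam(0.9, 0.999, eps=1e-8)
    output_dir="zephyr-7b-dpo-qlora",       # is the TrainingArguments default optimizer setup
    num_train_epochs=1,
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,          # effective train batch size of 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

dataset = load_dataset("your/preference-dataset")  # placeholder: needs prompt/chosen/rejected columns

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # with a PEFT adapter, the frozen base model serves as the reference
    args=args,
    beta=0.1,                # assumed default; not recorded in this card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```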

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6823 | 0.03 | 100  | 0.6822 | 0.0498  | 0.0268  | 0.6610 | 0.0230 | -241.9323 | -259.6750 | -1.9559 | -2.0953 |
| 0.6492 | 0.05 | 200  | 0.6491 | -0.0479 | -0.1535 | 0.6815 | 0.1056 | -259.9606 | -269.4400 | -1.9338 | -2.0706 |
| 0.6101 | 0.08 | 300  | 0.6217 | -0.3407 | -0.5476 | 0.6770 | 0.2069 | -299.3728 | -298.7252 | -1.8680 | -2.0021 |
| 0.6173 | 0.1  | 400  | 0.5952 | -0.5027 | -0.8331 | 0.6835 | 0.3304 | -327.9222 | -314.9250 | -1.6582 | -1.7878 |
| 0.5435 | 0.13 | 500  | 0.5754 | -1.1151 | -1.6071 | 0.6890 | 0.4920 | -405.3195 | -376.1609 | -1.4273 | -1.5544 |
| 0.5547 | 0.16 | 600  | 0.5695 | -0.7600 | -1.2661 | 0.6985 | 0.5061 | -371.2198 | -340.6527 | -1.4396 | -1.5726 |
| 0.5282 | 0.18 | 700  | 0.5560 | -2.0627 | -2.9172 | 0.7165 | 0.8545 | -536.3329 | -470.9231 | -1.2515 | -1.3804 |
| 0.5205 | 0.21 | 800  | 0.5364 | -1.6968 | -2.4004 | 0.7265 | 0.7036 | -484.6470 | -434.3307 | -1.2756 | -1.4041 |
| 0.4983 | 0.24 | 900  | 0.5329 | -1.6798 | -2.4538 | 0.7205 | 0.7740 | -489.9910 | -432.6339 | -1.0956 | -1.2161 |
| 0.5443 | 0.26 | 1000 | 0.5279 | -1.8981 | -2.7666 | 0.7240 | 0.8684 | -521.2657 | -454.4658 | -1.1264 | -1.2533 |
| 0.565  | 0.29 | 1100 | 0.5207 | -1.5130 | -2.3368 | 0.7290 | 0.8238 | -478.2849 | -415.9483 | -1.1445 | -1.2715 |
| 0.5837 | 0.31 | 1200 | 0.5104 | -1.6729 | -2.5375 | 0.7355 | 0.8645 | -498.3547 | -431.9437 | -1.1065 | -1.2314 |
| 0.5342 | 0.34 | 1300 | 0.5146 | -2.7684 | -3.8446 | 0.7240 | 1.0762 | -629.0701 | -541.4911 | -1.0656 | -1.1852 |
| 0.5287 | 0.37 | 1400 | 0.5197 | -1.9068 | -2.8614 | 0.7235 | 0.9546 | -530.7440 | -455.3286 | -1.1253 | -1.2506 |
| 0.4634 | 0.39 | 1500 | 0.5165 | -2.1400 | -3.2391 | 0.7295 | 1.0991 | -568.5231 | -478.6544 | -1.1408 | -1.2696 |
| 0.5551 | 0.42 | 1600 | 0.5057 | -2.4748 | -3.5466 | 0.7310 | 1.0718 | -599.2672 | -512.1343 | -1.1162 | -1.2402 |
| 0.5183 | 0.44 | 1700 | 0.4993 | -2.7856 | -3.8497 | 0.7390 | 1.0641 | -629.5833 | -543.2154 | -1.1493 | -1.2784 |
| 0.478  | 0.47 | 1800 | 0.5060 | -2.6855 | -3.7424 | 0.7390 | 1.0569 | -618.8510 | -533.2012 | -1.1180 | -1.2419 |
| 0.4325 | 0.5  | 1900 | 0.4996 | -3.0306 | -4.2124 | 0.7370 | 1.1818 | -665.8478 | -567.7128 | -1.1245 | -1.2515 |
| 0.4926 | 0.52 | 2000 | 0.4934 | -2.6648 | -3.6771 | 0.7405 | 1.0123 | -612.3228 | -531.1354 | -1.1607 | -1.2879 |
| 0.5009 | 0.55 | 2100 | 0.4915 | -2.8243 | -3.8594 | 0.7510 | 1.0351 | -630.5530 | -547.0867 | -1.1825 | -1.3099 |
| 0.4777 | 0.58 | 2200 | 0.4914 | -2.3357 | -3.3121 | 0.7475 | 0.9764 | -575.8183 | -498.2264 | -1.2484 | -1.3780 |
| 0.4655 | 0.6  | 2300 | 0.4928 | -3.0709 | -4.2756 | 0.7450 | 1.2047 | -672.1651 | -571.7407 | -1.1628 | -1.2897 |
| 0.47   | 0.63 | 2400 | 0.4909 | -2.9333 | -4.0701 | 0.7410 | 1.1368 | -651.6222 | -557.9854 | -1.1517 | -1.2773 |
| 0.4963 | 0.65 | 2500 | 0.4933 | -2.6058 | -3.7730 | 0.7390 | 1.1672 | -621.9061 | -525.2288 | -1.1945 | -1.3239 |
| 0.4663 | 0.68 | 2600 | 0.4950 | -2.6796 | -3.8395 | 0.7450 | 1.1599 | -628.5566 | -532.6130 | -1.1991 | -1.3264 |
| 0.5286 | 0.71 | 2700 | 0.4961 | -2.6413 | -3.7802 | 0.7380 | 1.1389 | -622.6273 | -528.7829 | -1.2033 | -1.3309 |
| 0.4564 | 0.73 | 2800 | 0.4925 | -2.6808 | -3.8257 | 0.7405 | 1.1448 | -627.1752 | -532.7354 | -1.2038 | -1.3305 |
| 0.5166 | 0.76 | 2900 | 0.4904 | -2.7803 | -3.8999 | 0.7415 | 1.1197 | -634.5994 | -542.6777 | -1.2046 | -1.3310 |
| 0.4653 | 0.79 | 3000 | 0.4896 | -2.7971 | -3.8847 | 0.7425 | 1.0877 | -633.0811 | -544.3574 | -1.2067 | -1.3333 |
| 0.4808 | 0.81 | 3100 | 0.4901 | -2.8200 | -3.9473 | 0.7410 | 1.1273 | -639.3414 | -546.6562 | -1.2009 | -1.3278 |
| 0.4882 | 0.84 | 3200 | 0.4896 | -2.7656 | -3.8890 | 0.7440 | 1.1234 | -633.5068 | -541.2137 | -1.2088 | -1.3355 |
| 0.5123 | 0.86 | 3300 | 0.4895 | -2.7745 | -3.8976 | 0.7435 | 1.1231 | -634.3662 | -542.1025 | -1.2083 | -1.3352 |
| 0.4526 | 0.89 | 3400 | 0.4896 | -2.7856 | -3.9136 | 0.7445 | 1.1280 | -635.9655 | -543.2083 | -1.2051 | -1.3319 |
| 0.5432 | 0.92 | 3500 | 0.4896 | -2.7837 | -3.9130 | 0.7440 | 1.1292 | -635.9039 | -543.0231 | -1.2045 | -1.3314 |
| 0.4617 | 0.94 | 3600 | 0.4895 | -2.7857 | -3.9150 | 0.7435 | 1.1294 | -636.1135 | -543.2186 | -1.2104 | -1.3374 |
| 0.4797 | 0.97 | 3700 | 0.4896 | -2.7842 | -3.9131 | 0.7435 | 1.1289 | -635.9192 | -543.0764 | -1.2075 | -1.3343 |
| 0.5092 | 0.99 | 3800 | 0.4894 | -2.7838 | -3.9130 | 0.7445 | 1.1292 | -635.9062 | -543.0328 | -1.2074 | -1.3340 |

## Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2