zephyr-7b-dpo-full

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full (the training dataset is not specified in the card metadata). It achieves the following results on the evaluation set; the note after this list explains how the DPO reward metrics relate to one another:

  • Loss: 0.7656
  • Rewards/chosen: -3.8106
  • Rewards/rejected: -6.8888
  • Rewards/accuracies: 0.7405
  • Rewards/margins: 3.0782
  • Logps/rejected: -327.0232
  • Logps/chosen: -316.3568
  • Logits/rejected: -2.4373
  • Logits/chosen: -2.3640
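
The Rewards/* metrics above follow the implicit-reward convention of DPO training (which the model name indicates). As a brief, hedged sketch, with β the DPO temperature and π_ref the frozen SFT reference model:

```latex
% Implicit DPO rewards for the chosen (y_w) and rejected (y_l) responses:
%   Rewards/chosen   = r_\theta(x, y_w)
%   Rewards/rejected = r_\theta(x, y_l)
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

% Rewards/margins is the gap between the two, Rewards/accuracies is the
% fraction of evaluation pairs with a positive gap, and the per-pair loss is:
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma \bigl( r_\theta(x, y_w) - r_\theta(x, y_l) \bigr)
```

For example, the reported evaluation margin of 3.0782 is the chosen reward minus the rejected reward: (-3.8106) - (-6.8888).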

Model description

More information needed

Intended uses & limitations

More information needed
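
Pending fuller documentation, the sketch below shows one plausible way to run inference with the Transformers text-generation pipeline. The model id and BF16 weights come from this card; the chat-style prompt, sampling settings, and hardware assumptions (a single GPU with enough memory for a 7B model) are illustrative.

```python
# Minimal inference sketch (assumes transformers >= 4.35 and a CUDA-capable GPU;
# the prompt format relies on the tokenizer's built-in chat template).
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="yimingzhang/zephyr-7b-dpo-full",
    torch_dtype=torch.bfloat16,  # the published checkpoint is stored in BF16
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO fine-tuning in two sentences."},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

output = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```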

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
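
As a rough guide, the sketch below maps the values above onto transformers.TrainingArguments. It is illustrative only: the output directory is a placeholder, and the multi-GPU launch (4 devices, hence the total train batch size of 8 × 4 = 32) is handled by the launcher rather than by these arguments.

```python
# Sketch of the reported hyperparameters as transformers.TrainingArguments.
# Illustrative only; distributed training across 4 GPUs (per-device batch 8,
# so total train batch size 8 * 4 = 32) is configured outside these arguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",  # placeholder
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,                   # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                # and epsilon=1e-08, as listed above
    bf16=True,                        # checkpoint is published in BF16
)
```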

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6612 | 0.05 | 100 | 0.6640 | 0.0464 | -0.0349 | 0.6565 | 0.0813 | -258.4842 | -277.7868 | -2.8352 | -2.7411 |
| 0.5924 | 0.1 | 200 | 0.6068 | 0.0778 | -0.2525 | 0.6927 | 0.3302 | -260.6598 | -277.4728 | -2.8153 | -2.7265 |
| 0.5488 | 0.15 | 300 | 0.5772 | 0.1688 | -0.4844 | 0.7385 | 0.6531 | -262.9787 | -276.5630 | -2.8364 | -2.7548 |
| 0.5144 | 0.2 | 400 | 0.5635 | 0.0609 | -0.7604 | 0.7347 | 0.8213 | -265.7392 | -277.6411 | -2.7890 | -2.7072 |
| 0.5399 | 0.25 | 500 | 0.5393 | 0.0316 | -0.9906 | 0.75 | 1.0221 | -268.0409 | -277.9347 | -2.8372 | -2.7565 |
| 0.5776 | 0.31 | 600 | 0.5706 | 0.0425 | -0.9799 | 0.7424 | 1.0224 | -267.9345 | -277.8257 | -2.8388 | -2.7569 |
| 0.5834 | 0.36 | 700 | 0.5596 | 0.0454 | -1.0216 | 0.7424 | 1.0670 | -268.3513 | -277.7964 | -2.7830 | -2.6941 |
| 0.5394 | 0.41 | 800 | 0.5358 | 0.0804 | -0.9536 | 0.7481 | 1.0341 | -267.6714 | -277.4460 | -2.6313 | -2.5480 |
| 0.5141 | 0.46 | 900 | 0.5412 | -0.2704 | -1.4309 | 0.7443 | 1.1605 | -272.4444 | -280.9546 | -2.6662 | -2.5832 |
| 0.51 | 0.51 | 1000 | 0.5350 | -0.2070 | -1.4043 | 0.7366 | 1.1973 | -272.1781 | -280.3206 | -2.7118 | -2.6217 |
| 0.5219 | 0.56 | 1100 | 0.5405 | -0.1673 | -1.3152 | 0.7290 | 1.1479 | -271.2871 | -279.9233 | -2.7451 | -2.6605 |
| 0.5391 | 0.61 | 1200 | 0.5320 | -0.2460 | -1.4452 | 0.7405 | 1.1992 | -272.5871 | -280.7106 | -2.7552 | -2.6692 |
| 0.536 | 0.66 | 1300 | 0.5502 | -0.1919 | -1.3564 | 0.7271 | 1.1645 | -271.6995 | -280.1697 | -2.7006 | -2.6126 |
| 0.6544 | 0.71 | 1400 | 0.5309 | -0.3757 | -1.6757 | 0.7080 | 1.3000 | -274.8926 | -282.0077 | -2.6970 | -2.6046 |
| 0.5697 | 0.76 | 1500 | 0.5662 | -0.2493 | -1.4791 | 0.7156 | 1.2297 | -272.9258 | -280.7440 | -2.7656 | -2.6730 |
| 0.5538 | 0.81 | 1600 | 0.5326 | -0.4658 | -1.6791 | 0.7214 | 1.2134 | -274.9264 | -282.9080 | -2.6934 | -2.5946 |
| 0.551 | 0.86 | 1700 | 0.5258 | -0.6217 | -1.8893 | 0.7137 | 1.2676 | -277.0278 | -284.4673 | -2.6535 | -2.5567 |
| 0.5708 | 0.92 | 1800 | 0.5639 | -0.5168 | -1.8962 | 0.7214 | 1.3794 | -277.0974 | -283.4186 | -2.6279 | -2.5564 |
| 0.5344 | 0.97 | 1900 | 0.5603 | -0.3788 | -1.8158 | 0.7271 | 1.4370 | -276.2931 | -282.0388 | -2.6680 | -2.5998 |
| 0.0925 | 1.02 | 2000 | 0.5587 | -0.4628 | -2.1277 | 0.7405 | 1.6648 | -279.4120 | -282.8788 | -2.6520 | -2.5825 |
| 0.112 | 1.07 | 2100 | 0.5731 | -0.6788 | -2.5908 | 0.7481 | 1.9120 | -284.0433 | -285.0383 | -2.5722 | -2.5094 |
| 0.0539 | 1.12 | 2200 | 0.5869 | -1.0820 | -2.9310 | 0.7366 | 1.8489 | -287.4448 | -289.0707 | -2.5937 | -2.5303 |
| 0.0811 | 1.17 | 2300 | 0.6306 | -0.8332 | -2.7204 | 0.7424 | 1.8872 | -285.3392 | -286.5822 | -2.5137 | -2.4560 |
| 0.0877 | 1.22 | 2400 | 0.5963 | -1.3075 | -3.3622 | 0.7481 | 2.0548 | -291.7576 | -291.3254 | -2.5925 | -2.5291 |
| 0.1114 | 1.27 | 2500 | 0.6126 | -1.3609 | -3.5524 | 0.7462 | 2.1915 | -293.6587 | -291.8594 | -2.4792 | -2.4142 |
| 0.0864 | 1.32 | 2600 | 0.6457 | -1.6093 | -3.7584 | 0.75 | 2.1491 | -295.7195 | -294.3440 | -2.5710 | -2.5058 |
| 0.0708 | 1.37 | 2700 | 0.6080 | -1.8094 | -3.7042 | 0.7462 | 1.8948 | -295.1769 | -296.3445 | -2.5394 | -2.4684 |
| 0.0794 | 1.42 | 2800 | 0.6010 | -1.7685 | -3.8603 | 0.7538 | 2.0918 | -296.7380 | -295.9354 | -2.5369 | -2.4663 |
| 0.1009 | 1.48 | 2900 | 0.6102 | -1.6050 | -3.5962 | 0.7347 | 1.9912 | -294.0973 | -294.3007 | -2.4834 | -2.4073 |
| 0.083 | 1.53 | 3000 | 0.6125 | -1.6395 | -3.6683 | 0.7424 | 2.0288 | -294.8184 | -294.6455 | -2.5306 | -2.4521 |
| 0.0871 | 1.58 | 3100 | 0.6392 | -1.7447 | -3.8250 | 0.75 | 2.0802 | -296.3850 | -295.6979 | -2.5032 | -2.4279 |
| 0.1168 | 1.63 | 3200 | 0.5973 | -1.6226 | -3.5602 | 0.7443 | 1.9376 | -293.7374 | -294.4764 | -2.5372 | -2.4606 |
| 0.0699 | 1.68 | 3300 | 0.5816 | -1.6383 | -3.5364 | 0.7424 | 1.8982 | -293.4994 | -294.6331 | -2.5287 | -2.4527 |
| 0.1082 | 1.73 | 3400 | 0.5895 | -1.8055 | -3.7976 | 0.7424 | 1.9920 | -296.1109 | -296.3059 | -2.5178 | -2.4442 |
| 0.09 | 1.78 | 3500 | 0.6231 | -1.8455 | -4.0234 | 0.75 | 2.1779 | -298.3694 | -296.7055 | -2.5261 | -2.4561 |
| 0.1238 | 1.83 | 3600 | 0.6047 | -1.6771 | -3.5997 | 0.7424 | 1.9226 | -294.1321 | -295.0213 | -2.6294 | -2.5512 |
| 0.0847 | 1.88 | 3700 | 0.5898 | -1.6725 | -3.5743 | 0.7347 | 1.9018 | -293.8779 | -294.9758 | -2.6224 | -2.5471 |
| 0.0908 | 1.93 | 3800 | 0.5817 | -1.6076 | -3.5381 | 0.7366 | 1.9304 | -293.5158 | -294.3269 | -2.5778 | -2.5047 |
| 0.0666 | 1.98 | 3900 | 0.6063 | -1.6950 | -3.7437 | 0.7309 | 2.0487 | -295.5718 | -295.2004 | -2.5784 | -2.5061 |
| 0.0173 | 2.03 | 4000 | 0.6213 | -2.1227 | -4.3451 | 0.7309 | 2.2224 | -301.5862 | -299.4778 | -2.6197 | -2.5495 |
| 0.0213 | 2.09 | 4100 | 0.6529 | -2.4461 | -4.9221 | 0.7366 | 2.4759 | -307.3557 | -302.7117 | -2.6029 | -2.5335 |
| 0.0149 | 2.14 | 4200 | 0.6934 | -3.0653 | -5.7847 | 0.7347 | 2.7194 | -315.9821 | -308.9039 | -2.5938 | -2.5272 |
| 0.0084 | 2.19 | 4300 | 0.7083 | -3.1845 | -6.0188 | 0.7405 | 2.8343 | -318.3230 | -310.0955 | -2.5088 | -2.4404 |
| 0.0059 | 2.24 | 4400 | 0.7193 | -3.3983 | -6.2807 | 0.7405 | 2.8824 | -320.9418 | -312.2334 | -2.5109 | -2.4479 |
| 0.0116 | 2.29 | 4500 | 0.7128 | -3.3425 | -6.1944 | 0.7462 | 2.8519 | -320.0795 | -311.6758 | -2.4787 | -2.4132 |
| 0.0077 | 2.34 | 4600 | 0.7219 | -3.2306 | -6.1475 | 0.7481 | 2.9169 | -319.6102 | -310.5562 | -2.4449 | -2.3779 |
| 0.0177 | 2.39 | 4700 | 0.7451 | -3.5469 | -6.5210 | 0.75 | 2.9742 | -323.3456 | -313.7194 | -2.3861 | -2.3174 |
| 0.0112 | 2.44 | 4800 | 0.7547 | -3.4801 | -6.4397 | 0.7424 | 2.9595 | -322.5316 | -313.0519 | -2.3939 | -2.3242 |
| 0.0071 | 2.49 | 4900 | 0.7691 | -3.8596 | -6.8490 | 0.7443 | 2.9895 | -326.6253 | -316.8460 | -2.3524 | -2.2834 |
| 0.0118 | 2.54 | 5000 | 0.7717 | -3.8862 | -6.8731 | 0.7462 | 2.9868 | -326.8659 | -317.1129 | -2.3347 | -2.2658 |
| 0.014 | 2.59 | 5100 | 0.7685 | -3.5970 | -6.5998 | 0.7481 | 3.0028 | -324.1335 | -314.2205 | -2.3512 | -2.2783 |
| 0.0208 | 2.64 | 5200 | 0.7741 | -3.9029 | -6.8895 | 0.7443 | 2.9866 | -327.0299 | -317.2794 | -2.3875 | -2.3143 |
| 0.0076 | 2.7 | 5300 | 0.7600 | -3.6159 | -6.5800 | 0.7424 | 2.9641 | -323.9353 | -314.4092 | -2.4331 | -2.3592 |
| 0.0146 | 2.75 | 5400 | 0.7768 | -3.7657 | -6.8555 | 0.7424 | 3.0898 | -326.6905 | -315.9074 | -2.4475 | -2.3751 |
| 0.0161 | 2.8 | 5500 | 0.7902 | -3.9170 | -7.0635 | 0.7481 | 3.1465 | -328.7701 | -317.4208 | -2.4332 | -2.3620 |
| 0.0056 | 2.85 | 5600 | 0.7827 | -3.9513 | -7.0687 | 0.7424 | 3.1174 | -328.8217 | -317.7632 | -2.4313 | -2.3599 |
| 0.0083 | 2.9 | 5700 | 0.7741 | -3.8805 | -6.9708 | 0.7443 | 3.0903 | -327.8432 | -317.0560 | -2.4324 | -2.3598 |
| 0.0243 | 2.95 | 5800 | 0.7657 | -3.8176 | -6.8913 | 0.7405 | 3.0737 | -327.0486 | -316.4268 | -2.4355 | -2.3620 |

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1