---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified dataset (the trainer did not record a dataset name). It achieves the following results on the evaluation set; a minimal loading sketch follows the metrics:

- Loss: 0.6874
- Rewards/chosen: -4.3150
- Rewards/rejected: -8.0704
- Rewards/accuracies: 0.7857
- Rewards/margins: 3.7554
- Logps/rejected: -325.6119
- Logps/chosen: -339.6828
- Logits/rejected: -2.6781
- Logits/chosen: -2.7397
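
The sketch below shows one way to load and query the model; the repo id `dongwang218/zephyr-7b-dpo-full` and the chat template (inherited from the Zephyr SFT base model) are assumptions, not details confirmed by this card:

```python
# Minimal loading/inference sketch, not an official usage example.
# Assumed: the repo id below, and that the tokenizer ships the Zephyr
# chat template inherited from the SFT base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dongwang218/zephyr-7b-dpo-full"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```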

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
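
To make the list concrete, here is a sketch of these values expressed as `transformers.TrainingArguments`. The `output_dir` is hypothetical, and the DPO-specific pieces (e.g. trl's `DPOTrainer` and its `beta`) are omitted because this card does not report them:

```python
# Sketch of the reported hyperparameters as TrainingArguments.
# Per-device batch sizes times 8 GPUs give the reported totals
# (8 x 8 = 64 train, 4 x 8 = 32 eval).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",  # hypothetical
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
)
```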

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5397 | 0.1 | 100 | 0.5211 | 0.1287 | -0.6851 | 0.7579 | 0.8138 | -251.7586 | -295.2458 | -2.9742 | -3.0033 |
| 0.4919 | 0.21 | 200 | 0.4873 | 0.0278 | -1.1599 | 0.7897 | 1.1876 | -256.5061 | -296.2552 | -3.0688 | -3.0898 |
| 0.4802 | 0.31 | 300 | 0.5027 | -0.2234 | -1.3257 | 0.7540 | 1.1023 | -258.1646 | -298.7669 | -3.0494 | -3.0828 |
| 0.5134 | 0.41 | 400 | 0.5098 | -0.2878 | -1.6709 | 0.7698 | 1.3832 | -261.6169 | -299.4102 | -2.8843 | -2.9179 |
| 0.4534 | 0.52 | 500 | 0.4905 | -0.1808 | -1.6336 | 0.7698 | 1.4528 | -261.2433 | -298.3406 | -2.9804 | -3.0182 |
| 0.4976 | 0.62 | 600 | 0.4872 | -0.2273 | -1.5386 | 0.7659 | 1.3112 | -260.2931 | -298.8059 | -2.9266 | -2.9730 |
| 0.5452 | 0.72 | 700 | 0.4888 | -0.4813 | -1.6851 | 0.7341 | 1.2039 | -261.7586 | -301.3452 | -2.9377 | -2.9686 |
| 0.5342 | 0.83 | 800 | 0.4774 | -0.3705 | -1.9222 | 0.7857 | 1.5517 | -264.1292 | -300.2377 | -2.8434 | -2.8821 |
| 0.5014 | 0.93 | 900 | 0.4814 | -0.2397 | -1.6794 | 0.7619 | 1.4397 | -261.7013 | -298.9296 | -2.8339 | -2.8781 |
| 0.0785 | 1.03 | 1000 | 0.4821 | -0.6486 | -2.5221 | 0.7659 | 1.8735 | -270.1282 | -303.0184 | -2.7561 | -2.8068 |
| 0.0883 | 1.14 | 1100 | 0.5074 | -1.3177 | -3.3355 | 0.7540 | 2.0178 | -278.2621 | -309.7097 | -2.7831 | -2.8337 |
| 0.086 | 1.24 | 1200 | 0.5001 | -1.1250 | -3.2622 | 0.7540 | 2.1372 | -277.5298 | -307.7827 | -2.7876 | -2.8347 |
| 0.0919 | 1.34 | 1300 | 0.5054 | -1.3872 | -3.5531 | 0.8016 | 2.1659 | -280.4383 | -310.4045 | -2.7662 | -2.8076 |
| 0.105 | 1.44 | 1400 | 0.5085 | -1.5140 | -3.6281 | 0.7817 | 2.1141 | -281.1881 | -311.6723 | -2.7877 | -2.8291 |
| 0.0714 | 1.55 | 1500 | 0.5216 | -1.8642 | -4.0538 | 0.7460 | 2.1896 | -285.4451 | -315.1745 | -2.7888 | -2.8331 |
| 0.0874 | 1.65 | 1600 | 0.5050 | -1.5077 | -3.7276 | 0.7421 | 2.2199 | -282.1837 | -311.6096 | -2.7751 | -2.8315 |
| 0.063 | 1.75 | 1700 | 0.5350 | -1.9441 | -4.4422 | 0.7857 | 2.4980 | -289.3290 | -315.9738 | -2.7470 | -2.8054 |
| 0.0786 | 1.86 | 1800 | 0.5376 | -2.0344 | -4.4236 | 0.7698 | 2.3892 | -289.1434 | -316.8769 | -2.7544 | -2.8120 |
| 0.1117 | 1.96 | 1900 | 0.5335 | -1.9236 | -4.0369 | 0.7817 | 2.1133 | -285.2767 | -315.7684 | -2.8365 | -2.8858 |
| 0.0175 | 2.06 | 2000 | 0.5882 | -2.8256 | -5.7651 | 0.7619 | 2.9396 | -302.5587 | -324.7882 | -2.7736 | -2.8336 |
| 0.0145 | 2.17 | 2100 | 0.6160 | -3.1789 | -6.2515 | 0.7659 | 3.0725 | -307.4222 | -328.3220 | -2.7453 | -2.8019 |
| 0.0109 | 2.27 | 2200 | 0.6675 | -3.8634 | -7.3412 | 0.7659 | 3.4777 | -318.3191 | -335.1671 | -2.7136 | -2.7758 |
| 0.0144 | 2.37 | 2300 | 0.6555 | -3.6832 | -7.0603 | 0.7738 | 3.3770 | -315.5101 | -333.3649 | -2.6841 | -2.7460 |
| 0.0103 | 2.48 | 2400 | 0.6598 | -3.7543 | -7.1773 | 0.7579 | 3.4230 | -316.6805 | -334.0755 | -2.6255 | -2.6922 |
| 0.0085 | 2.58 | 2500 | 0.7044 | -4.5468 | -8.3313 | 0.7659 | 3.7845 | -328.2202 | -342.0003 | -2.6245 | -2.6937 |
| 0.0077 | 2.68 | 2600 | 0.6755 | -3.9908 | -7.6767 | 0.7857 | 3.6859 | -321.6741 | -336.4403 | -2.6716 | -2.7350 |
| 0.0098 | 2.79 | 2700 | 0.6890 | -4.1853 | -7.8875 | 0.7778 | 3.7022 | -323.7826 | -338.3858 | -2.6895 | -2.7518 |
| 0.0126 | 2.89 | 2800 | 0.6889 | -4.2792 | -8.0158 | 0.7778 | 3.7366 | -325.0659 | -339.3250 | -2.6752 | -2.7376 |
| 0.0078 | 2.99 | 2900 | 0.6886 | -4.3139 | -8.0732 | 0.7738 | 3.7593 | -325.6390 | -339.6714 | -2.6788 | -2.7404 |
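
Assuming the standard DPO formulation (as implemented in trl), the reward columns are beta-scaled log-probability ratios between the policy and the frozen SFT reference model, and the loss is the negative log-sigmoid of the chosen/rejected margin:

$$
r(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\text{ref}}(y \mid x) \right],
\qquad
\mathcal{L}_{\text{DPO}} = -\log \sigma\big( r(x, y_w) - r(x, y_l) \big)
$$

Accordingly, `Rewards/margins` equals `Rewards/chosen` minus `Rewards/rejected` (e.g. -4.3139 - (-8.0732) = 3.7593 in the final row), and `Rewards/accuracies` is the fraction of evaluation pairs whose chosen reward exceeds the rejected one. Note that validation loss bottoms out around epoch 0.8 (0.4774) and climbs through epochs 2-3 even as margins keep growing, a pattern consistent with over-optimization on multi-epoch DPO runs.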

### Framework versions

- Transformers 4.35.0
- PyTorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1