---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---
# zephyr-7b-dpo-full
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full); the fine-tuning dataset is not recorded in this card. It achieves the following results on the evaluation set (see the note after the list for what the reward metrics mean):
- Loss: 0.6874
- Rewards/chosen: -4.3150
- Rewards/rejected: -8.0704
- Rewards/accuracies: 0.7857
- Rewards/margins: 3.7554
- Logps/rejected: -325.6119
- Logps/chosen: -339.6828
- Logits/rejected: -2.6781
- Logits/chosen: -2.7397
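
For context on the reward columns (not documented in the original card): the metric names match trl's `DPOTrainer`, whose logged rewards are the implicit rewards of the DPO objective. Assuming that standard implementation, with policy $\pi_\theta$, frozen reference $\pi_{\mathrm{ref}}$, and temperature $\beta$:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Under this reading, `Rewards/chosen` is the mean of $\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}$ over the evaluation set, `Rewards/rejected` is the same quantity for the rejected completion $y_l$, `Rewards/margins` is their mean difference, and `Rewards/accuracies` is the fraction of pairs where the chosen reward exceeds the rejected one.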
## Model description
More information needed
## Intended uses & limitations
More information needed
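
The card provides no usage guidance, but since this is a standard `transformers` causal LM, a minimal inference sketch would look like the following. The repo id and generation settings are assumptions, not documented values; the chat template is assumed to be inherited from the SFT base model's tokenizer.

```python
# Minimal inference sketch. Assumptions: the model is published under a repo
# id like the one below, and the tokenizer ships the Zephyr chat template.
# Neither is stated in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zephyr-7b-dpo-full"  # placeholder: substitute the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=128, do_sample=True, temperature=0.7
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```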
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a hedged reconstruction of the corresponding trainer setup follows the list:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
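
These hyperparameters map naturally onto a trl `DPOTrainer` run. The sketch below is a hypothetical reconstruction, not the actual training script: the preference dataset, the DPO `beta`, the precision, and the length limits are not recorded in this card, so the values used for them here are illustrative placeholders.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters
# above, using trl's DPOTrainer (whose metric names this card matches).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Toy stand-in for the (undocumented) preference data: DPOTrainer expects
# "prompt", "chosen", and "rejected" text columns.
train_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": ["DPO fine-tunes a model directly on preference pairs."],
    "rejected": ["No idea."],
})

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 8 GPUs = total train batch size 64
    per_device_eval_batch_size=4,    # x 8 GPUs = total eval batch size 32
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    remove_unused_columns=False,     # DPOTrainer needs the raw text columns
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # trl builds a frozen reference copy
    args=training_args,
    beta=0.1,                        # assumption: DPO temperature, not recorded
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=1024,                 # assumption: sequence length limits
    max_prompt_length=512,
)
trainer.train()
```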
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5397 | 0.1 | 100 | 0.5211 | 0.1287 | -0.6851 | 0.7579 | 0.8138 | -251.7586 | -295.2458 | -2.9742 | -3.0033 |
| 0.4919 | 0.21 | 200 | 0.4873 | 0.0278 | -1.1599 | 0.7897 | 1.1876 | -256.5061 | -296.2552 | -3.0688 | -3.0898 |
| 0.4802 | 0.31 | 300 | 0.5027 | -0.2234 | -1.3257 | 0.7540 | 1.1023 | -258.1646 | -298.7669 | -3.0494 | -3.0828 |
| 0.5134 | 0.41 | 400 | 0.5098 | -0.2878 | -1.6709 | 0.7698 | 1.3832 | -261.6169 | -299.4102 | -2.8843 | -2.9179 |
| 0.4534 | 0.52 | 500 | 0.4905 | -0.1808 | -1.6336 | 0.7698 | 1.4528 | -261.2433 | -298.3406 | -2.9804 | -3.0182 |
| 0.4976 | 0.62 | 600 | 0.4872 | -0.2273 | -1.5386 | 0.7659 | 1.3112 | -260.2931 | -298.8059 | -2.9266 | -2.9730 |
| 0.5452 | 0.72 | 700 | 0.4888 | -0.4813 | -1.6851 | 0.7341 | 1.2039 | -261.7586 | -301.3452 | -2.9377 | -2.9686 |
| 0.5342 | 0.83 | 800 | 0.4774 | -0.3705 | -1.9222 | 0.7857 | 1.5517 | -264.1292 | -300.2377 | -2.8434 | -2.8821 |
| 0.5014 | 0.93 | 900 | 0.4814 | -0.2397 | -1.6794 | 0.7619 | 1.4397 | -261.7013 | -298.9296 | -2.8339 | -2.8781 |
| 0.0785 | 1.03 | 1000 | 0.4821 | -0.6486 | -2.5221 | 0.7659 | 1.8735 | -270.1282 | -303.0184 | -2.7561 | -2.8068 |
| 0.0883 | 1.14 | 1100 | 0.5074 | -1.3177 | -3.3355 | 0.7540 | 2.0178 | -278.2621 | -309.7097 | -2.7831 | -2.8337 |
| 0.086 | 1.24 | 1200 | 0.5001 | -1.1250 | -3.2622 | 0.7540 | 2.1372 | -277.5298 | -307.7827 | -2.7876 | -2.8347 |
| 0.0919 | 1.34 | 1300 | 0.5054 | -1.3872 | -3.5531 | 0.8016 | 2.1659 | -280.4383 | -310.4045 | -2.7662 | -2.8076 |
| 0.105 | 1.44 | 1400 | 0.5085 | -1.5140 | -3.6281 | 0.7817 | 2.1141 | -281.1881 | -311.6723 | -2.7877 | -2.8291 |
| 0.0714 | 1.55 | 1500 | 0.5216 | -1.8642 | -4.0538 | 0.7460 | 2.1896 | -285.4451 | -315.1745 | -2.7888 | -2.8331 |
| 0.0874 | 1.65 | 1600 | 0.5050 | -1.5077 | -3.7276 | 0.7421 | 2.2199 | -282.1837 | -311.6096 | -2.7751 | -2.8315 |
| 0.063 | 1.75 | 1700 | 0.5350 | -1.9441 | -4.4422 | 0.7857 | 2.4980 | -289.3290 | -315.9738 | -2.7470 | -2.8054 |
| 0.0786 | 1.86 | 1800 | 0.5376 | -2.0344 | -4.4236 | 0.7698 | 2.3892 | -289.1434 | -316.8769 | -2.7544 | -2.8120 |
| 0.1117 | 1.96 | 1900 | 0.5335 | -1.9236 | -4.0369 | 0.7817 | 2.1133 | -285.2767 | -315.7684 | -2.8365 | -2.8858 |
| 0.0175 | 2.06 | 2000 | 0.5882 | -2.8256 | -5.7651 | 0.7619 | 2.9396 | -302.5587 | -324.7882 | -2.7736 | -2.8336 |
| 0.0145 | 2.17 | 2100 | 0.6160 | -3.1789 | -6.2515 | 0.7659 | 3.0725 | -307.4222 | -328.3220 | -2.7453 | -2.8019 |
| 0.0109 | 2.27 | 2200 | 0.6675 | -3.8634 | -7.3412 | 0.7659 | 3.4777 | -318.3191 | -335.1671 | -2.7136 | -2.7758 |
| 0.0144 | 2.37 | 2300 | 0.6555 | -3.6832 | -7.0603 | 0.7738 | 3.3770 | -315.5101 | -333.3649 | -2.6841 | -2.7460 |
| 0.0103 | 2.48 | 2400 | 0.6598 | -3.7543 | -7.1773 | 0.7579 | 3.4230 | -316.6805 | -334.0755 | -2.6255 | -2.6922 |
| 0.0085 | 2.58 | 2500 | 0.7044 | -4.5468 | -8.3313 | 0.7659 | 3.7845 | -328.2202 | -342.0003 | -2.6245 | -2.6937 |
| 0.0077 | 2.68 | 2600 | 0.6755 | -3.9908 | -7.6767 | 0.7857 | 3.6859 | -321.6741 | -336.4403 | -2.6716 | -2.7350 |
| 0.0098 | 2.79 | 2700 | 0.6890 | -4.1853 | -7.8875 | 0.7778 | 3.7022 | -323.7826 | -338.3858 | -2.6895 | -2.7518 |
| 0.0126 | 2.89 | 2800 | 0.6889 | -4.2792 | -8.0158 | 0.7778 | 3.7366 | -325.0659 | -339.3250 | -2.6752 | -2.7376 |
| 0.0078 | 2.99 | 2900 | 0.6886 | -4.3139 | -8.0732 | 0.7738 | 3.7593 | -325.6390 | -339.6714 | -2.6788 | -2.7404 |
### Framework versions
- Transformers 4.35.0
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1