metadata
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-full
  results: []
zephyr-7b-dpo-full
This model is a version of alignment-handbook/zephyr-7b-sft-full fine-tuned with DPO (using TRL) on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.5590
- Rewards/chosen: -0.7818
- Rewards/rejected: -2.7115
- Rewards/accuracies: 0.7857
- Rewards/margins: 1.9297
- Logps/rejected: -287.3273
- Logps/chosen: -289.7805
- Logits/rejected: -2.4561
- Logits/chosen: -2.5007
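The reward figures above can be cross-checked against each other. The sketch below assumes the logging convention used by TRL's DPO trainer, where the margin is the mean chosen reward minus the mean rejected reward and the accuracy is the fraction of pairs whose chosen reward exceeds the rejected one. The variable names are illustrative and do not come from the training code.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -0.7818
rewards_rejected = -2.7115
reported_margin = 1.9297

# Assumed convention: margin = mean(chosen reward - rejected reward),
# which equals the difference of the two reported means.
computed_margin = rewards_chosen - rewards_rejected

# The reported values are rounded to four decimals, so compare with a tolerance.
margin_matches = abs(computed_margin - reported_margin) < 1e-4

print(f"computed margin: {computed_margin:.4f}, matches report: {margin_matches}")
```

Both rewards being negative means the policy assigns lower log-probability than the reference model to chosen and rejected responses alike; the positive margin shows it penalizes the rejected ones more.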
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
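The total batch sizes in the list follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces that arithmetic. The `cosine_lr` helper and the `total_steps` value are assumptions for illustration: the card does not state the exact number of optimizer steps, and the scheduler actually used may round the warmup length differently.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-07
train_batch_size = 8  # per device
eval_batch_size = 8   # per device
num_devices = 4
gradient_accumulation_steps = 2
warmup_ratio = 0.1

# Effective batch sizes; gradient accumulation applies to training only.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: hypothetical step count, chosen near the last logged step.
total_steps = 1900

print(total_train_batch_size, total_eval_batch_size)
for step in (0, 95, 190, 1045, 1900):
    print(step, f"{cosine_lr(step, total_steps):.3e}")
```

With a peak rate of 5e-07, most of the second epoch runs at a small fraction of that peak under this schedule.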
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6075 | 0.1 | 100 | 0.5945 | 0.3241 | -0.1206 | 0.7163 | 0.4447 | -261.4175 | -278.7209 | -2.6324 | -2.6651 |
0.5341 | 0.21 | 200 | 0.5471 | -0.0734 | -1.0103 | 0.7639 | 0.9369 | -270.3152 | -282.6963 | -2.5394 | -2.5779 |
0.5315 | 0.31 | 300 | 0.5258 | 0.1435 | -0.9757 | 0.7619 | 1.1192 | -269.9694 | -280.5274 | -2.5337 | -2.5711 |
0.4978 | 0.42 | 400 | 0.5366 | -0.2177 | -1.2826 | 0.7579 | 1.0649 | -273.0383 | -284.1391 | -2.5667 | -2.6011 |
0.5134 | 0.52 | 500 | 0.5340 | -0.4713 | -1.5140 | 0.7460 | 1.0427 | -275.3516 | -286.6748 | -2.4488 | -2.4836 |
0.5404 | 0.63 | 600 | 0.5188 | -0.0534 | -1.2981 | 0.7480 | 1.2447 | -273.1928 | -282.4962 | -2.3631 | -2.4039 |
0.5256 | 0.73 | 700 | 0.5270 | -0.2533 | -1.5704 | 0.7639 | 1.3172 | -275.9163 | -284.4948 | -2.3224 | -2.3640 |
0.4991 | 0.84 | 800 | 0.5278 | -0.2394 | -1.5276 | 0.7639 | 1.2882 | -275.4879 | -284.3556 | -2.3730 | -2.4144 |
0.5084 | 0.94 | 900 | 0.5457 | 0.2664 | -0.9546 | 0.7619 | 1.2210 | -269.7581 | -279.2981 | -2.4875 | -2.5254 |
0.1011 | 1.05 | 1000 | 0.5361 | -0.5236 | -2.1364 | 0.7877 | 1.6129 | -281.5762 | -287.1976 | -2.4389 | -2.4774 |
0.0942 | 1.15 | 1100 | 0.5454 | -0.4356 | -2.2047 | 0.7897 | 1.7691 | -282.2592 | -286.3182 | -2.4515 | -2.4926 |
0.0817 | 1.26 | 1200 | 0.5530 | -0.7588 | -2.5855 | 0.7857 | 1.8268 | -286.0674 | -289.5495 | -2.4441 | -2.4863 |
0.0697 | 1.36 | 1300 | 0.5549 | -0.5919 | -2.4690 | 0.7798 | 1.8771 | -284.9021 | -287.8810 | -2.4474 | -2.4910 |
0.0842 | 1.47 | 1400 | 0.5575 | -0.7425 | -2.6443 | 0.7917 | 1.9018 | -286.6550 | -289.3871 | -2.4669 | -2.5100 |
0.075 | 1.57 | 1500 | 0.5590 | -0.5382 | -2.4532 | 0.7956 | 1.9150 | -284.7438 | -287.3436 | -2.4699 | -2.5133 |
0.098 | 1.67 | 1600 | 0.5583 | -0.7761 | -2.6741 | 0.7877 | 1.8980 | -286.9528 | -289.7227 | -2.4652 | -2.5092 |
0.0718 | 1.78 | 1700 | 0.5593 | -0.7532 | -2.6704 | 0.7877 | 1.9172 | -286.9160 | -289.4940 | -2.4592 | -2.5036 |
0.0828 | 1.88 | 1800 | 0.5606 | -0.7985 | -2.7306 | 0.7897 | 1.9321 | -287.5178 | -289.9467 | -2.4560 | -2.5007 |
0.103 | 1.99 | 1900 | 0.5601 | -0.7805 | -2.7113 | 0.7857 | 1.9309 | -287.3255 | -289.7666 | -2.4554 | -2.5002 |
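The table shows validation loss and reward accuracy peaking at different points in training. As a minimal sketch, the snippet below copies three columns from the table and finds the step with the lowest validation loss and the step with the highest reward accuracy. The `eval_log` name and tuple layout are illustrative only.

```python
# (step, validation_loss, rewards_accuracy) copied from the table above.
eval_log = [
    (100, 0.5945, 0.7163), (200, 0.5471, 0.7639), (300, 0.5258, 0.7619),
    (400, 0.5366, 0.7579), (500, 0.5340, 0.7460), (600, 0.5188, 0.7480),
    (700, 0.5270, 0.7639), (800, 0.5278, 0.7639), (900, 0.5457, 0.7619),
    (1000, 0.5361, 0.7877), (1100, 0.5454, 0.7897), (1200, 0.5530, 0.7857),
    (1300, 0.5549, 0.7798), (1400, 0.5575, 0.7917), (1500, 0.5590, 0.7956),
    (1600, 0.5583, 0.7877), (1700, 0.5593, 0.7877), (1800, 0.5606, 0.7897),
    (1900, 0.5601, 0.7857),
]

# Step with the lowest validation loss.
best_loss_step, best_loss, _ = min(eval_log, key=lambda row: row[1])

# Step with the highest reward accuracy.
best_acc_step, _, best_acc = max(eval_log, key=lambda row: row[2])

print(f"lowest validation loss {best_loss} at step {best_loss_step}")
print(f"highest reward accuracy {best_acc} at step {best_acc_step}")
```

Validation loss is lowest in the first epoch (step 600), while accuracy and margins keep improving into the second epoch even as training loss falls to roughly 0.1. Which checkpoint counts as best therefore depends on the metric used to select it.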
Framework versions
- Transformers 4.36.2
- PyTorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2