---
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---
# zephyr-7b-dpo-full

This model was trained from scratch on the None dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5418
- Rewards/chosen: -3.1726
- Rewards/rejected: -4.7390
- Rewards/accuracies: 0.7539
- Rewards/margins: 1.5664
- Logps/rejected: -761.6608
- Logps/chosen: -598.8974
- Logits/rejected: 0.2389
- Logits/chosen: -0.0634
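These reward metrics follow the DPO formulation: the implicit reward for a completion is β times the log-probability ratio between the policy and the reference model, and `Rewards/margins` is simply the chosen reward minus the rejected reward. A minimal pure-Python sketch (the β value and per-example log-probabilities below are illustrative, not from this run):

```python
import math

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Per-example DPO loss and implicit rewards (sketch; beta=0.1 is an assumption)."""
    # Implicit rewards: beta * log-prob ratio of policy vs. reference model
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): shrinks as the chosen completion is preferred more strongly
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, chosen_reward, rejected_reward, margin

# The final eval rewards reported above imply the reported margin:
margin = -3.1726 - (-4.7390)
print(round(margin, 4))  # 1.5664
```

Note that the reported `Rewards/accuracies` is the fraction of evaluation pairs where this margin is positive.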
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
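The effective batch sizes listed above follow from the per-device settings: `total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps`, and `total_eval_batch_size = eval_batch_size × num_devices` (no gradient accumulation during evaluation). A quick check:

```python
train_batch_size = 8
eval_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 128 64
```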
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6142 | 0.07 | 100 | 0.6372 | -0.2112 | -0.4255 | 0.6992 | 0.2143 | -330.3116 | -302.7545 | -1.7521 | -1.7871 |
| 0.4726 | 0.15 | 200 | 0.5516 | -1.3441 | -2.1046 | 0.75 | 0.7605 | -498.2208 | -416.0410 | -2.0018 | -2.0471 |
| 0.4421 | 0.22 | 300 | 0.5335 | -1.1470 | -2.0463 | 0.7539 | 0.8992 | -492.3901 | -396.3379 | -1.7522 | -1.8325 |
| 0.3828 | 0.3 | 400 | 0.5238 | -1.6652 | -2.7617 | 0.7695 | 1.0965 | -563.9280 | -448.1488 | -0.9530 | -1.1204 |
| 0.3576 | 0.37 | 500 | 0.5184 | -1.6238 | -2.7277 | 0.7695 | 1.1039 | -560.5328 | -444.0173 | -0.8922 | -1.1202 |
| 0.3328 | 0.45 | 600 | 0.5151 | -2.1202 | -3.4092 | 0.7656 | 1.2890 | -628.6859 | -493.6552 | 0.2423 | -0.0694 |
| 0.3131 | 0.52 | 700 | 0.5153 | -1.7034 | -2.9038 | 0.7656 | 1.2004 | -578.1398 | -451.9696 | 0.1729 | -0.1656 |
| 0.2547 | 0.59 | 800 | 0.5256 | -2.5366 | -3.8570 | 0.7617 | 1.3204 | -673.4565 | -535.2915 | 0.4476 | 0.1270 |
| 0.2764 | 0.67 | 900 | 0.5221 | -2.5675 | -3.9457 | 0.7773 | 1.3782 | -682.3342 | -538.3813 | 0.0520 | -0.2431 |
| 0.2261 | 0.74 | 1000 | 0.5298 | -2.7657 | -4.2499 | 0.7695 | 1.4842 | -712.7483 | -558.2006 | 0.2023 | -0.1104 |
| 0.2219 | 0.82 | 1100 | 0.5380 | -3.0986 | -4.6646 | 0.7695 | 1.5660 | -754.2211 | -591.4904 | 0.3078 | -0.0067 |
| 0.2165 | 0.89 | 1200 | 0.5336 | -2.9855 | -4.5026 | 0.7617 | 1.5170 | -738.0179 | -580.1855 | 0.2015 | -0.0980 |
| 0.1728 | 0.97 | 1300 | 0.5418 | -3.1726 | -4.7390 | 0.7539 | 1.5664 | -761.6608 | -598.8974 | 0.2389 | -0.0634 |
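The learning rate for this run follows a cosine decay with a 10% linear warmup. The schedule's multiplier on the base rate can be sketched as below (modeled after the usual cosine-with-warmup schedule; `steps` is illustrative, since the card does not state the total step count):

```python
import math

def lr_lambda(step, num_training_steps, warmup_ratio=0.1):
    """Multiplier on the base LR: linear warmup, then cosine decay to zero."""
    num_warmup_steps = int(num_training_steps * warmup_ratio)
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

base_lr = 5e-7
steps = 1000  # illustrative total; not stated in this card
print(base_lr * lr_lambda(0, steps))      # 0.0 (start of warmup)
print(base_lr * lr_lambda(100, steps))    # peak LR of 5e-07 after warmup
print(base_lr * lr_lambda(steps, steps))  # ~0.0 at the end of decay
```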
### Framework versions
- Transformers 4.38.2
- Pytorch 2.1.2+cu118
- Datasets 2.16.1
- Tokenizers 0.15.2