---
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

# zephyr-7b-dpo-full

This model was fine-tuned with DPO (Direct Preference Optimization) using TRL. The base model and preference dataset are not recorded in this auto-generated card. It achieves the following results on the evaluation set:
- Loss: 0.5418
- Rewards/chosen: -3.1726
- Rewards/rejected: -4.7390
- Rewards/accuracies: 0.7539
- Rewards/margins: 1.5664
- Logps/rejected: -761.6608
- Logps/chosen: -598.8974
- Logits/rejected: 0.2389
- Logits/chosen: -0.0634

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8 (per device)
- eval_batch_size: 8 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6142 | 0.07 | 100 | 0.6372 | -0.2112 | -0.4255 | 0.6992 | 0.2143 | -330.3116 | -302.7545 | -1.7521 | -1.7871 |
| 0.4726 | 0.15 | 200 | 0.5516 | -1.3441 | -2.1046 | 0.75 | 0.7605 | -498.2208 | -416.0410 | -2.0018 | -2.0471 |
| 0.4421 | 0.22 | 300 | 0.5335 | -1.1470 | -2.0463 | 0.7539 | 0.8992 | -492.3901 | -396.3379 | -1.7522 | -1.8325 |
| 0.3828 | 0.3 | 400 | 0.5238 | -1.6652 | -2.7617 | 0.7695 | 1.0965 | -563.9280 | -448.1488 | -0.9530 | -1.1204 |
| 0.3576 | 0.37 | 500 | 0.5184 | -1.6238 | -2.7277 | 0.7695 | 1.1039 | -560.5328 | -444.0173 | -0.8922 | -1.1202 |
| 0.3328 | 0.45 | 600 | 0.5151 | -2.1202 | -3.4092 | 0.7656 | 1.2890 | -628.6859 | -493.6552 | 0.2423 | -0.0694 |
| 0.3131 | 0.52 | 700 | 0.5153 | -1.7034 | -2.9038 | 0.7656 | 1.2004 | -578.1398 | -451.9696 | 0.1729 | -0.1656 |
| 0.2547 | 0.59 | 800 | 0.5256 | -2.5366 | -3.8570 | 0.7617 | 1.3204 | -673.4565 | -535.2915 | 0.4476 | 0.1270 |
| 0.2764 | 0.67 | 900 | 0.5221 | -2.5675 | -3.9457 | 0.7773 | 1.3782 | -682.3342 | -538.3813 | 0.0520 | -0.2431 |
| 0.2261 | 0.74 | 1000 | 0.5298 | -2.7657 | -4.2499 | 0.7695 | 1.4842 | -712.7483 | -558.2006 | 0.2023 | -0.1104 |
| 0.2219 | 0.82 | 1100 | 0.5380 | -3.0986 | -4.6646 | 0.7695 | 1.5660 | -754.2211 | -591.4904 | 0.3078 | -0.0067 |
| 0.2165 | 0.89 | 1200 | 0.5336 | -2.9855 | -4.5026 | 0.7617 | 1.5170 | -738.0179 | -580.1855 | 0.2015 | -0.0980 |
| 0.1728 | 0.97 | 1300 | 0.5418 | -3.1726 | -4.7390 | 0.7539 | 1.5664 | -761.6608 | -598.8974 | 0.2389 | -0.0634 |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.1.2+cu118
- Datasets 2.16.1
- Tokenizers 0.15.2
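
### Interpreting the reward metrics

For readers unfamiliar with the DPO-specific columns above: in DPO, the model's implicit reward for a completion is β times the log-ratio of the policy's likelihood to the frozen reference model's likelihood. Assuming the standard DPO objective (Rafailov et al., 2023) as implemented in TRL, the logged metrics map onto the loss terms below; the β used for this run is not recorded in this card.

```latex
% Standard DPO loss over a preference pair (x, y_w, y_l);
% braces mark the quantities logged in the table above.
\mathcal{L}_{\mathrm{DPO}}
  = -\log \sigma\Big(
      \underbrace{\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}}_{\text{Rewards/chosen}}
    - \underbrace{\beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}}_{\text{Rewards/rejected}}
    \Big)
```

Rewards/margins is the mean difference between the two reward terms (at the final checkpoint, -3.1726 - (-4.7390) = 1.5664), and Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward.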
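
### Reproduction sketch

The hyperparameters above can be mapped onto a TRL `DPOTrainer` run. The sketch below is a reconstruction, not the original training script: the base model id, dataset id, and `beta` value are assumptions (none are recorded in this card), and it targets a TRL release contemporary with Transformers 4.38 (trl 0.7.x/0.8.x), where `DPOTrainer` still accepted `beta` directly rather than via `DPOConfig`.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

BASE_MODEL = "base-model-id"       # assumption: not recorded in this card
DATASET = "preference-dataset-id"  # assumption: not recorded in this card

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
ref_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
dataset = load_dataset(DATASET)

# Mirrors the "Training hyperparameters" section. With 8 devices,
# per-device batch 8, and 2 accumulation steps, the effective train
# batch size is 8 * 8 * 2 = 128 (eval: 8 * 8 = 64). The default AdamW
# optimizer already uses betas=(0.9, 0.999) and epsilon=1e-08, matching
# the card.
args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.1,  # assumption: TRL's default; the actual value is not recorded
    args=args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
)
trainer.train()
```

Launched across 8 GPUs (e.g. with `accelerate launch`), this configuration reproduces the distributed setup and effective batch sizes listed above.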