---
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

# zephyr-7b-dpo-full

This model was fine-tuned with Direct Preference Optimization (DPO) using TRL. The base model and the training dataset are not specified in this card.
It achieves the following results on the evaluation set:
- Loss: 2.0232
- Rewards/chosen: -7.1946
- Rewards/rejected: -8.7238
- Rewards/accuracies: 0.6133
- Rewards/margins: 1.5292
- Logps/rejected: -1160.1461
- Logps/chosen: -1001.0963
- Logits/rejected: -0.4190
- Logits/chosen: -0.5892

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

A hedged sketch of how these settings might map onto a TRL training script is given after the results table below.

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.254         | 0.1   | 100  | 1.4761          | -5.3290        | -6.2112          | 0.5898             | 0.8822          | -908.8818      | -814.5385    | -1.4873         | -1.5203       |
| 0.1844        | 0.21  | 200  | 1.7253          | -6.1555        | -7.4481          | 0.6133             | 1.2926          | -1032.5726     | -897.1824    | -1.4103         | -1.4664       |
| 0.1635        | 0.31  | 300  | 1.6677          | -6.1768        | -7.3921          | 0.5938             | 1.2153          | -1026.9750     | -899.3143    | -0.6257         | -0.7515       |
| 0.1606        | 0.42  | 400  | 2.0307          | -7.0774        | -8.4601          | 0.6016             | 1.3827          | -1133.7700     | -989.3718    | -0.4798         | -0.6143       |
| 0.163         | 0.52  | 500  | 1.8216          | -6.5495        | -8.0368          | 0.5898             | 1.4873          | -1091.4379     | -936.5793    | -0.8136         | -0.9380       |
| 0.1656        | 0.63  | 600  | 1.8091          | -6.5309        | -7.9593          | 0.625              | 1.4284          | -1083.6920     | -934.7285    | -0.3700         | -0.5360       |
| 0.1552        | 0.73  | 700  | 2.0767          | -7.6318        | -9.1866          | 0.5977             | 1.5547          | -1206.4197     | -1044.8179   | -0.3588         | -0.5351       |
| 0.1377        | 0.84  | 800  | 2.0307          | -7.2870        | -8.8043          | 0.6055             | 1.5173          | -1168.1901     | -1010.3356   | -0.4156         | -0.5887       |
| 0.1462        | 0.94  | 900  | 2.0232          | -7.1946        | -8.7238          | 0.6133             | 1.5292          | -1160.1461     | -1001.0963   | -0.4190         | -0.5892       |

### Framework versions

- Transformers 4.38.2
- PyTorch 2.1.2+cu118
- Datasets 2.16.1
- Tokenizers 0.15.2
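
### Example training configuration (sketch)

The hyperparameters above are consistent with a TRL `DPOTrainer` run. The sketch below is a hypothetical reconstruction, not the actual training script: the base model, the preference dataset, and the DPO `beta` are not reported in this card and appear as labeled placeholders, and the `DPOTrainer(..., beta=...)` interface shown is the TRL 0.7-era API that was current alongside Transformers 4.38.

```python
# Hypothetical reconstruction of the training setup described in this card.
# BASE_MODEL, DATASET, and beta are NOT given in the card; they are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

BASE_MODEL = "<base-model>"        # not specified in the card
DATASET = "<preference-dataset>"   # must provide prompt/chosen/rejected columns

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Mirrors the "Training hyperparameters" list:
# 8 devices x per-device batch 8 x grad accumulation 2 = total train batch 128.
# Adam betas (0.9, 0.999) and epsilon 1e-08 are the optimizer defaults.
args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=100,  # matches the 100-step cadence in the results table
)

dataset = load_dataset(DATASET)

trainer = DPOTrainer(
    model,
    ref_model=None,  # TRL clones the policy as the frozen DPO reference model
    beta=0.1,        # assumption: beta is not reported in this card (TRL default)
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

To match `num_devices: 8` and `distributed_type: multi-GPU`, the script would be run under a distributed launcher on 8 GPUs (e.g. `accelerate launch train_dpo.py`, where the script name is likewise a placeholder).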
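
## Example inference (sketch)

A minimal generation sketch, assuming the checkpoint loads with standard `transformers` auto classes. The repository id is not given in this card, so `<repo-or-path>` is a placeholder.

```python
# Minimal sketch; <repo-or-path> is a placeholder for this checkpoint's location.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "<repo-or-path>"  # e.g. a local directory containing this model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

inputs = tokenizer("Write a short haiku about language models.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```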