---
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

# zephyr-7b-dpo-full

This model was trained with DPO via the [TRL](https://github.com/huggingface/trl) library; the base model and training dataset were not recorded in this auto-generated card.

It achieves the following results on the evaluation set:
- Loss: 1.4971
- Rewards/chosen: -4.5102
- Rewards/rejected: -4.6591
- Rewards/accuracies: 0.5156
- Rewards/margins: 0.1490
- Logps/rejected: -753.6738
- Logps/chosen: -732.6489
- Logits/rejected: 1.5926
- Logits/chosen: 1.5057

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2876        | 0.1   | 100  | 0.8317          | -1.0998        | -1.0687          | 0.4883             | -0.0311         | -394.6287      | -391.6134    | -2.3559         | -2.3965       |
| 0.1728        | 0.21  | 200  | 1.0232          | -1.8304        | -1.8408          | 0.4766             | 0.0103          | -471.8403      | -464.6779    | -0.1269         | -0.2344       |
| 0.1485        | 0.31  | 300  | 1.1085          | -2.2445        | -2.2039          | 0.4883             | -0.0406         | -508.1475      | -506.0840    | -0.2139         | -0.3320       |
| 0.1363        | 0.42  | 400  | 1.1616          | -1.9572        | -1.9878          | 0.4961             | 0.0306          | -486.5422      | -477.3530    | -0.1728         | -0.2901       |
| 0.1192        | 0.52  | 500  | 1.2602          | -2.7149        | -2.7458          | 0.4961             | 0.0308          | -562.3370      | -553.1240    | 0.8821          | 0.8077        |
| 0.1061        | 0.63  | 600  | 1.3570          | -3.5510        | -3.6801          | 0.5078             | 0.1291          | -655.7740      | -636.7335    | 1.4624          | 1.3499        |
| 0.0916        | 0.73  | 700  | 1.5923          | -4.9928        | -5.1535          | 0.5195             | 0.1607          | -803.1144      | -780.9122    | 1.8370          | 1.7244        |
| 0.0982        | 0.84  | 800  | 1.4367          | -4.1982        | -4.3446          | 0.5117             | 0.1464          | -722.2228      | -701.4560    | 1.5885          | 1.4960        |
| 0.0798        | 0.94  | 900  | 1.4971          | -4.5102        | -4.6591          | 0.5156             | 0.1490          | -753.6738      | -732.6489    | 1.5926          | 1.5057        |

### Framework versions

- Transformers 4.38.2
- PyTorch 2.1.2+cu118
- Datasets 2.16.1
- Tokenizers 0.15.2
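
For context on the metric names used above: in TRL's DPO implementation, the `Rewards/*` values are implicit rewards derived from the log-probability ratio between the trained policy and the frozen reference model. A minimal sketch of the convention, with illustrative numbers and an assumed `beta` of 0.1 (the actual value is not recorded in this card):

```python
import torch

def implicit_dpo_reward(policy_logps, ref_logps, beta=0.1):
    # TRL's DPO "reward" is the scaled log-prob ratio between the trained
    # policy and the frozen reference model, summed over completion tokens.
    # beta=0.1 is an assumption; the actual value is not recorded in the card.
    return beta * (policy_logps - ref_logps)

# Toy batch of 3 preference pairs (sequence log-probs, illustrative values only).
policy_chosen   = torch.tensor([-410.0, -395.5, -402.3])
ref_chosen      = torch.tensor([-400.0, -396.0, -399.9])
policy_rejected = torch.tensor([-420.0, -401.2, -410.8])
ref_rejected    = torch.tensor([-405.0, -400.0, -404.1])

rewards_chosen   = implicit_dpo_reward(policy_chosen, ref_chosen)        # Rewards/chosen
rewards_rejected = implicit_dpo_reward(policy_rejected, ref_rejected)    # Rewards/rejected
margins  = rewards_chosen - rewards_rejected                             # Rewards/margins
accuracy = (margins > 0).float().mean()                                  # Rewards/accuracies
print(margins, accuracy)
```

`Logps/chosen` and `Logps/rejected` are the corresponding mean sequence log-probabilities under the policy, which is why they track the reward trends in the table.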
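
The hyperparameters above map directly onto a `transformers.TrainingArguments` plus TRL's `DPOTrainer`. Below is a minimal sketch of what such a run could have looked like, assuming a TRL release contemporary with Transformers 4.38 (where `DPOTrainer` still accepts `beta` as a keyword argument); the base model path, the dataset, and the `beta` value are hypothetical placeholders, since the card records none of them.

```python
# Hypothetical reconstruction of this run from the listed hyperparameters.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

BASE_MODEL = "path/to/base-model"                  # placeholder: not recorded in the card
dataset = load_dataset("path/to/preference-data")  # placeholder: needs prompt/chosen/rejected columns

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 8 GPUs x 2 accumulation steps = 128 total
    per_device_eval_batch_size=8,    # x 8 GPUs = 64 total
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=100,                  # matches the 100-step eval cadence in the table
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # None: TRL copies the model as the frozen reference
    args=args,
    beta=0.1,                        # assumption: beta is not recorded in the card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

Launched across 8 GPUs (for example with `accelerate launch`), the per-device batch size of 8 with 2 gradient-accumulation steps reproduces the reported total train batch size of 128.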
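
Finally, a hypothetical inference snippet; the repository id is a placeholder, since the card does not state where the weights are published, and a chat template may be needed depending on how the base model formatted its prompts.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "your-org/zephyr-7b-dpo-full"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

inputs = tokenizer("What is Direct Preference Optimization?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```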