---
license: apache-2.0
base_model: nnheui/pythia-1.4b-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: pythia-1.4b-dpo-full
  results: []
---

# pythia-1.4b-dpo-full

This model is a fine-tuned version of [nnheui/pythia-1.4b-sft-full](https://huggingface.co/nnheui/pythia-1.4b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5991
- Rewards/chosen: -1.875
- Rewards/rejected: -2.6406
- Rewards/accuracies: 0.7164
- Rewards/margins: 0.7734
- Logps/rejected: -604.0
- Logps/chosen: -580.0
- Logits/rejected: -1.4297
- Logits/chosen: -1.4062
- Logps/chosen Top Tokens: -0.0009
- Logps/rejected Top Tokens: -0.0009
- Logps/chosen Bottom Tokens: -13.9375
- Logps/rejected Bottom Tokens: -13.8125

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged reproduction sketch follows this list):
- learning_rate: 5e-07
- train_batch_size: 5
- eval_batch_size: 5
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 4
- total_train_batch_size: 120
- total_eval_batch_size: 30
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
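For reference, here is a minimal sketch of how this configuration could be reproduced with TRL's `DPOTrainer`, assuming a trl release contemporary with Transformers 4.40 (where `DPOTrainer` still accepts `beta` and a plain `TrainingArguments` object directly). The `beta` value and the mapping of the raw dataset records into `prompt`/`chosen`/`rejected` text fields are assumptions; neither is recorded in this card, so this is not the exact training script for this model.

```python
# Hedged reproduction sketch; beta and dataset preprocessing are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "nnheui/pythia-1.4b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Split names follow the HuggingFaceH4/ultrafeedback_binarized dataset card;
# converting its message lists into prompt/chosen/rejected text is omitted here.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = TrainingArguments(
    output_dir="pythia-1.4b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=5,  # train_batch_size above
    per_device_eval_batch_size=5,   # eval_batch_size above
    gradient_accumulation_steps=4,  # 5 x 6 GPUs x 4 = 120 effective batch
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    beta=0.1,  # assumed DPO temperature; not recorded in this card
    args=args,
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```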
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Logps/chosen Top Tokens | Logps/rejected Top Tokens | Logps/chosen Bottom Tokens | Logps/rejected Bottom Tokens |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.678 | 0.1963 | 100 | 0.6789 | -0.0275 | -0.0608 | 0.5881 | 0.0332 | -344.0 | -396.0 | -1.1562 | -1.0938 | -0.0009 | -0.0009 | -14.0625 | -14.0 |
| 0.645 | 0.3925 | 200 | 0.6489 | -0.2871 | -0.4238 | 0.6448 | 0.1367 | -380.0 | -422.0 | -1.2031 | -1.1562 | -0.0009 | -0.0009 | -14.375 | -14.3125 |
| 0.6396 | 0.5888 | 300 | 0.6304 | -0.4512 | -0.6797 | 0.6627 | 0.2275 | -406.0 | -438.0 | -1.2344 | -1.1875 | -0.0007 | -0.0008 | -14.375 | -14.3125 |
| 0.6102 | 0.7851 | 400 | 0.6268 | -0.5039 | -0.7617 | 0.6567 | 0.2578 | -414.0 | -444.0 | -1.2344 | -1.1875 | -0.0007 | -0.0007 | -14.3125 | -14.25 |
| 0.6084 | 0.9814 | 500 | 0.6259 | -0.5234 | -0.7852 | 0.6567 | 0.2617 | -416.0 | -446.0 | -1.2422 | -1.1953 | -0.0007 | -0.0007 | -14.375 | -14.3125 |
| 0.6115 | 1.1776 | 600 | 0.6121 | -0.5547 | -0.8789 | 0.6806 | 0.3242 | -426.0 | -450.0 | -1.2578 | -1.2109 | -0.0006 | -0.0006 | -14.25 | -14.125 |
| 0.607 | 1.3739 | 700 | 0.6068 | -0.6641 | -1.0078 | 0.6985 | 0.3418 | -438.0 | -460.0 | -1.2812 | -1.2344 | -0.0006 | -0.0006 | -14.1875 | -14.125 |
| 0.5764 | 1.5702 | 800 | 0.5996 | -0.75 | -1.1406 | 0.6866 | 0.3887 | -452.0 | -468.0 | -1.3125 | -1.2656 | -0.0007 | -0.0007 | -14.25 | -14.125 |
| 0.5903 | 1.7664 | 900 | 0.5984 | -0.5898 | -0.9648 | 0.7045 | 0.3770 | -434.0 | -452.0 | -1.3125 | -1.2656 | -0.0006 | -0.0006 | -14.25 | -14.125 |
| 0.5697 | 1.9627 | 1000 | 0.5922 | -0.7383 | -1.1562 | 0.6866 | 0.4160 | -454.0 | -468.0 | -1.3125 | -1.2734 | -0.0007 | -0.0006 | -14.0625 | -14.0 |
| 0.5573 | 2.1590 | 1100 | 0.5854 | -0.8203 | -1.2812 | 0.6985 | 0.4570 | -466.0 | -476.0 | -1.3281 | -1.2891 | -0.0006 | -0.0006 | -14.125 | -14.0 |
| 0.5439 | 2.3553 | 1200 | 0.5845 | -1.1016 | -1.6172 | 0.6866 | 0.5078 | -498.0 | -504.0 | -1.3672 | -1.3281 | -0.0007 | -0.0006 | -14.0625 | -13.9375 |
| 0.5487 | 2.5515 | 1300 | 0.5801 | -0.8906 | -1.3828 | 0.6925 | 0.4980 | -476.0 | -482.0 | -1.3828 | -1.3438 | -0.0007 | -0.0006 | -14.0625 | -14.0 |
| 0.543 | 2.7478 | 1400 | 0.5785 | -0.8672 | -1.3516 | 0.7134 | 0.4863 | -474.0 | -480.0 | -1.375 | -1.3359 | -0.0007 | -0.0006 | -14.0625 | -13.9375 |
| 0.5382 | 2.9441 | 1500 | 0.5711 | -1.1172 | -1.6641 | 0.6955 | 0.5508 | -504.0 | -506.0 | -1.3906 | -1.3516 | -0.0007 | -0.0006 | -14.125 | -14.0 |
| 0.5117 | 3.1403 | 1600 | 0.5712 | -1.25 | -1.8281 | 0.7045 | 0.5742 | -520.0 | -520.0 | -1.3984 | -1.3594 | -0.0007 | -0.0006 | -14.125 | -14.0 |
| 0.4983 | 3.3366 | 1700 | 0.5703 | -1.1641 | -1.75 | 0.7015 | 0.5859 | -512.0 | -510.0 | -1.4062 | -1.3672 | -0.0007 | -0.0007 | -14.125 | -14.0 |
| 0.4976 | 3.5329 | 1800 | 0.5709 | -1.2656 | -1.8828 | 0.7254 | 0.6133 | -524.0 | -520.0 | -1.4141 | -1.375 | -0.0007 | -0.0007 | -14.125 | -14.0625 |
| 0.4956 | 3.7291 | 1900 | 0.5754 | -1.2266 | -1.8047 | 0.7164 | 0.5781 | -516.0 | -516.0 | -1.4062 | -1.3672 | -0.0008 | -0.0008 | -14.0625 | -13.9375 |
| 0.4996 | 3.9254 | 2000 | 0.5722 | -1.2578 | -1.8516 | 0.7045 | 0.6016 | -524.0 | -520.0 | -1.4062 | -1.375 | -0.0008 | -0.0008 | -14.0625 | -13.9375 |
| 0.4588 | 4.1217 | 2100 | 0.5748 | -1.4141 | -2.0312 | 0.7343 | 0.6211 | -540.0 | -536.0 | -1.4062 | -1.375 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4555 | 4.3180 | 2200 | 0.5743 | -1.2969 | -1.9141 | 0.7164 | 0.6172 | -528.0 | -524.0 | -1.4219 | -1.3906 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.4625 | 4.5142 | 2300 | 0.5735 | -1.3047 | -1.9297 | 0.7134 | 0.625 | -532.0 | -524.0 | -1.4141 | -1.3828 | -0.0008 | -0.0008 | -14.0 | -13.875 |
| 0.469 | 4.7105 | 2400 | 0.5743 | -1.4766 | -2.1406 | 0.7194 | 0.6562 | -552.0 | -540.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4796 | 4.9068 | 2500 | 0.5750 | -1.3281 | -1.9766 | 0.7134 | 0.6484 | -536.0 | -528.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4082 | 5.1030 | 2600 | 0.5818 | -1.6016 | -2.2656 | 0.7194 | 0.6602 | -564.0 | -552.0 | -1.4453 | -1.4062 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4193 | 5.2993 | 2700 | 0.5803 | -1.4922 | -2.1406 | 0.7194 | 0.6523 | -552.0 | -544.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.8125 |
| 0.419 | 5.4956 | 2800 | 0.5795 | -1.625 | -2.3281 | 0.7194 | 0.7031 | -572.0 | -556.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4267 | 5.6919 | 2900 | 0.5780 | -1.6875 | -2.375 | 0.7134 | 0.6836 | -576.0 | -564.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.402 | 5.8881 | 3000 | 0.5828 | -1.6484 | -2.3594 | 0.7254 | 0.7109 | -572.0 | -560.0 | -1.4453 | -1.4062 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.3656 | 6.0844 | 3100 | 0.5844 | -1.6875 | -2.4062 | 0.7015 | 0.7227 | -580.0 | -564.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.3971 | 6.2807 | 3200 | 0.5873 | -1.6094 | -2.3281 | 0.7075 | 0.7148 | -572.0 | -556.0 | -1.4453 | -1.4141 | -0.0009 | -0.0009 | -14.0 | -13.8125 |
| 0.3923 | 6.4769 | 3300 | 0.5906 | -1.6875 | -2.4062 | 0.7075 | 0.7188 | -580.0 | -564.0 | -1.4453 | -1.4141 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4011 | 6.6732 | 3400 | 0.5848 | -1.7109 | -2.4375 | 0.7254 | 0.7344 | -584.0 | -564.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -14.0 | -13.875 |
| 0.3838 | 6.8695 | 3500 | 0.5897 | -1.75 | -2.4844 | 0.7164 | 0.7305 | -584.0 | -568.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3762 | 7.0658 | 3600 | 0.5910 | -1.7812 | -2.5312 | 0.7134 | 0.7422 | -592.0 | -572.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3591 | 7.2620 | 3700 | 0.5895 | -1.7812 | -2.5312 | 0.7075 | 0.7578 | -592.0 | -572.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.3713 | 7.4583 | 3800 | 0.5956 | -1.7734 | -2.5312 | 0.7164 | 0.75 | -592.0 | -572.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.381 | 7.6546 | 3900 | 0.5948 | -1.8672 | -2.625 | 0.7164 | 0.7695 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3639 | 7.8508 | 4000 | 0.5950 | -1.8672 | -2.625 | 0.7194 | 0.7578 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.3563 | 8.0471 | 4100 | 0.5939 | -1.8281 | -2.5781 | 0.7075 | 0.7539 | -596.0 | -576.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.3484 | 8.2434 | 4200 | 0.5969 | -1.875 | -2.6406 | 0.7045 | 0.7656 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -14.0 | -13.875 |
| 0.3359 | 8.4396 | 4300 | 0.5966 | -1.8828 | -2.6562 | 0.7045 | 0.7734 | -604.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.3639 | 8.6359 | 4400 | 0.5979 | -1.8516 | -2.5938 | 0.7075 | 0.7461 | -596.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.3563 | 8.8322 | 4500 | 0.5979 | -1.8594 | -2.625 | 0.7075 | 0.7617 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.353 | 9.0285 | 4600 | 0.5981 | -1.8672 | -2.625 | 0.6985 | 0.7617 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3514 | 9.2247 | 4700 | 0.5979 | -1.8594 | -2.625 | 0.6985 | 0.7656 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3434 | 9.4210 | 4800 | 0.5973 | -1.8672 | -2.6406 | 0.7015 | 0.7656 | -600.0 | -580.0 | -1.4297 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3492 | 9.6173 | 4900 | 0.5981 | -1.875 | -2.6406 | 0.7045 | 0.7578 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3487 | 9.8135 | 5000 | 0.5967 | -1.8672 | -2.6406 | 0.7134 | 0.7734 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
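The reward columns in this table are the implicit DPO rewards computed from the policy and the frozen reference model, not the outputs of a separate reward model. Following the standard DPO formulation (the metric definitions below match TRL's `DPOTrainer` conventions; $\beta$ is the DPO temperature, whose value this card does not record):

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)
$$

Rewards/chosen and Rewards/rejected are the mean of $r_\theta$ over the chosen and rejected completions, Rewards/margins is the mean difference between the two, and Rewards/accuracies is the fraction of pairs for which the chosen reward exceeds the rejected reward.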
### Framework versions

- Transformers 4.40.0
- Pytorch 2.2.2+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
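For a quick check of the final checkpoint, here is a minimal generation sketch with Transformers. The hub id `nnheui/pythia-1.4b-dpo-full` is an assumption inferred from the model name above and the base model's namespace; adjust it if the repository lives elsewhere.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nnheui/pythia-1.4b-dpo-full"  # assumed repo id (see note above)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```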