---
base_model: microsoft/phi-1_5
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
model-index:
- name: phi_1_5_dpo_ep6
  results: []
---

# phi_1_5_dpo_ep6

This model is a fine-tuned version of [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) on the argilla/ultrafeedback-binarized-preferences-cleaned dataset.
It achieves the following results on the evaluation set (these values match the step 5000 row, epoch 5.43, of the training results table):
- Loss: 0.4748
- Rewards/chosen: -0.9135
- Rewards/rejected: -1.9448
- Rewards/accuracies: 0.7937
- Rewards/margins: 1.0313
- Logps/rejected: -618.5530
- Logps/chosen: -634.6866
- Logits/rejected: 3.4318
- Logits/chosen: 3.4052

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 6

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6881 | 0.11 | 100 | 0.6856 | 0.0468 | 0.0298 | 0.7024 | 0.0170 | -421.0949 | -538.6564 | 4.8883 | 4.6646 |
| 0.6692 | 0.22 | 200 | 0.6642 | 0.1742 | 0.0988 | 0.7123 | 0.0754 | -414.1955 | -525.9189 | 4.8718 | 4.6370 |
| 0.6368 | 0.33 | 300 | 0.6442 | 0.2557 | 0.1261 | 0.7083 | 0.1296 | -411.4657 | -517.7680 | 4.8407 | 4.5968 |
| 0.6283 | 0.43 | 400 | 0.6283 | 0.2608 | 0.0812 | 0.7083 | 0.1795 | -415.9522 | -517.2609 | 4.7629 | 4.5156 |
| 0.6052 | 0.54 | 500 | 0.6132 | 0.1429 | -0.0998 | 0.7103 | 0.2427 | -434.0545 | -529.0491 | 4.5516 | 4.3153 |
| 0.5923 | 0.65 | 600 | 0.6008 | 0.1425 | -0.1628 | 0.7123 | 0.3053 | -440.3539 | -529.0887 | 4.4588 | 4.2289 |
| 0.5899 | 0.76 | 700 | 0.5880 | 0.0755 | -0.2915 | 0.7083 | 0.3670 | -453.2271 | -535.7857 | 4.3444 | 4.1349 |
| 0.558 | 0.87 | 800 | 0.5715 | -0.0965 | -0.5304 | 0.7262 | 0.4339 | -477.1144 | -552.9822 | 4.2704 | 4.0642 |
| 0.5495 | 0.98 | 900 | 0.5552 | -0.2658 | -0.7677 | 0.7341 | 0.5019 | -500.8484 | -569.9210 | 4.1976 | 4.0015 |
| 0.5124 | 1.09 | 1000 | 0.5473 | -0.3871 | -0.9394 | 0.7321 | 0.5523 | -518.0129 | -582.0427 | 4.0959 | 3.9125 |
| 0.5322 | 1.19 | 1100 | 0.5400 | -0.3641 | -0.9463 | 0.7579 | 0.5821 | -518.7011 | -579.7518 | 4.0436 | 3.8715 |
| 0.5281 | 1.3 | 1200 | 0.5344 | -0.5340 | -1.1498 | 0.7460 | 0.6158 | -539.0579 | -596.7365 | 3.9368 | 3.7842 |
| 0.5063 | 1.41 | 1300 | 0.5297 | -0.3754 | -0.9975 | 0.7579 | 0.6221 | -523.8221 | -580.8731 | 4.0135 | 3.8499 |
| 0.5073 | 1.52 | 1400 | 0.5216 | -0.3819 | -1.0300 | 0.7758 | 0.6481 | -527.0738 | -581.5236 | 3.9401 | 3.7846 |
| 0.5156 | 1.63 | 1500 | 0.5177 | -0.5748 | -1.2824 | 0.7560 | 0.7077 | -552.3166 | -600.8123 | 3.7868 | 3.6678 |
| 0.5072 | 1.74 | 1600 | 0.5138 | -0.4973 | -1.2122 | 0.7798 | 0.7149 | -545.2914 | -593.0637 | 3.7791 | 3.6614 |
| 0.4908 | 1.85 | 1700 | 0.5077 | -0.5479 | -1.2972 | 0.7798 | 0.7493 | -553.7918 | -598.1292 | 3.7893 | 3.6696 |
| 0.5109 | 1.95 | 1800 | 0.5068 | -0.6157 | -1.3930 | 0.7758 | 0.7773 | -563.3733 | -604.9089 | 3.7679 | 3.6556 |
| 0.4779 | 2.06 | 1900 | 0.5005 | -0.6247 | -1.4169 | 0.7738 | 0.7922 | -565.7673 | -605.8088 | 3.7118 | 3.6062 |
| 0.4833 | 2.17 | 2000 | 0.4992 | -0.6841 | -1.5026 | 0.7698 | 0.8185 | -574.3334 | -611.7432 | 3.6739 | 3.5849 |
| 0.4879 | 2.28 | 2100 | 0.4967 | -0.8128 | -1.6654 | 0.7698 | 0.8526 | -590.6146 | -624.6127 | 3.5692 | 3.5030 |
| 0.4645 | 2.39 | 2200 | 0.4927 | -0.6969 | -1.5365 | 0.7857 | 0.8396 | -577.7230 | -613.0289 | 3.6647 | 3.5772 |
| 0.4587 | 2.5 | 2300 | 0.4936 | -0.6024 | -1.4533 | 0.7778 | 0.8509 | -569.4068 | -603.5743 | 3.6615 | 3.5790 |
| 0.437 | 2.61 | 2400 | 0.4921 | -0.8826 | -1.7724 | 0.7738 | 0.8897 | -601.3099 | -631.5984 | 3.4903 | 3.4343 |
| 0.4204 | 2.71 | 2500 | 0.4890 | -0.8338 | -1.7338 | 0.7758 | 0.8999 | -597.4498 | -626.7175 | 3.5447 | 3.4804 |
| 0.467 | 2.82 | 2600 | 0.4865 | -0.5910 | -1.4516 | 0.7877 | 0.8606 | -569.2333 | -602.4326 | 3.5690 | 3.5000 |
| 0.458 | 2.93 | 2700 | 0.4861 | -0.7666 | -1.6726 | 0.7837 | 0.9059 | -591.3298 | -620.0014 | 3.5208 | 3.4579 |
| 0.462 | 3.04 | 2800 | 0.4844 | -0.7109 | -1.6145 | 0.7917 | 0.9037 | -585.5269 | -614.4227 | 3.5553 | 3.4954 |
| 0.4258 | 3.15 | 2900 | 0.4888 | -0.9814 | -1.9414 | 0.7817 | 0.9600 | -618.2142 | -641.4772 | 3.4761 | 3.4227 |
| 0.4219 | 3.26 | 3000 | 0.4856 | -0.8858 | -1.8323 | 0.7937 | 0.9465 | -607.3071 | -631.9181 | 3.4895 | 3.4362 |
| 0.4295 | 3.37 | 3100 | 0.4823 | -0.8140 | -1.7651 | 0.7976 | 0.9511 | -600.5797 | -624.7327 | 3.4880 | 3.4357 |
| 0.4268 | 3.47 | 3200 | 0.4800 | -0.8592 | -1.8282 | 0.7976 | 0.9690 | -606.8929 | -629.2567 | 3.4536 | 3.4126 |
| 0.4338 | 3.58 | 3300 | 0.4785 | -0.8784 | -1.8458 | 0.7956 | 0.9674 | -608.6551 | -631.1731 | 3.4471 | 3.4096 |
| 0.4297 | 3.69 | 3400 | 0.4774 | -0.9026 | -1.8929 | 0.7956 | 0.9903 | -613.3634 | -633.5962 | 3.4710 | 3.4326 |
| 0.4133 | 3.8 | 3500 | 0.4785 | -0.9173 | -1.9072 | 0.7937 | 0.9899 | -614.7964 | -635.0674 | 3.4610 | 3.4232 |
| 0.4275 | 3.91 | 3600 | 0.4794 | -1.0209 | -2.0380 | 0.7837 | 1.0171 | -627.8748 | -645.4227 | 3.4635 | 3.4227 |
| 0.4224 | 4.02 | 3700 | 0.4784 | -0.9130 | -1.9086 | 0.7937 | 0.9955 | -614.9320 | -634.6396 | 3.4812 | 3.4400 |
| 0.4101 | 4.13 | 3800 | 0.4773 | -0.9474 | -1.9571 | 0.7877 | 1.0097 | -619.7819 | -638.0772 | 3.4569 | 3.4225 |
| 0.4295 | 4.23 | 3900 | 0.4790 | -0.9893 | -2.0096 | 0.7956 | 1.0203 | -625.0361 | -642.2666 | 3.4290 | 3.3998 |
| 0.4162 | 4.34 | 4000 | 0.4769 | -0.9682 | -1.9897 | 0.7956 | 1.0215 | -623.0465 | -640.1562 | 3.4342 | 3.4040 |
| 0.425 | 4.45 | 4100 | 0.4759 | -0.9553 | -1.9788 | 0.7917 | 1.0236 | -621.9555 | -638.8621 | 3.4580 | 3.4237 |
| 0.4155 | 4.56 | 4200 | 0.4778 | -1.0183 | -2.0573 | 0.7917 | 1.0390 | -629.8077 | -645.1696 | 3.4277 | 3.3981 |
| 0.4311 | 4.67 | 4300 | 0.4765 | -0.9712 | -2.0065 | 0.7897 | 1.0353 | -624.7266 | -640.4598 | 3.4413 | 3.4107 |
| 0.41 | 4.78 | 4400 | 0.4768 | -0.9764 | -2.0101 | 0.7917 | 1.0337 | -625.0818 | -640.9733 | 3.4387 | 3.4081 |
| 0.4127 | 4.89 | 4500 | 0.4749 | -0.9599 | -1.9994 | 0.7937 | 1.0395 | -624.0168 | -639.3277 | 3.4453 | 3.4160 |
| 0.453 | 4.99 | 4600 | 0.4748 | -0.9231 | -1.9528 | 0.7917 | 1.0297 | -619.3519 | -635.6462 | 3.4444 | 3.4142 |
| 0.4035 | 5.1 | 4700 | 0.4754 | -0.9561 | -1.9965 | 0.7897 | 1.0403 | -623.7211 | -638.9504 | 3.4293 | 3.4019 |
| 0.4225 | 5.21 | 4800 | 0.4753 | -0.9471 | -1.9855 | 0.7877 | 1.0384 | -622.6226 | -638.0461 | 3.4359 | 3.4077 |
| 0.3941 | 5.32 | 4900 | 0.4754 | -0.9579 | -1.9978 | 0.7897 | 1.0400 | -623.8593 | -639.1230 | 3.4282 | 3.4012 |
| 0.4093 | 5.43 | 5000 | 0.4748 | -0.9135 | -1.9448 | 0.7937 | 1.0313 | -618.5530 | -634.6866 | 3.4318 | 3.4052 |
| 0.3902 | 5.54 | 5100 | 0.4754 | -0.9457 | -1.9815 | 0.7956 | 1.0358 | -622.2274 | -637.9056 | 3.4281 | 3.4014 |
| 0.3795 | 5.65 | 5200 | 0.4753 | -0.9484 | -1.9852 | 0.7897 | 1.0368 | -622.5895 | -638.1724 | 3.4253 | 3.3988 |
| 0.3915 | 5.75 | 5300 | 0.4754 | -0.9571 | -1.9957 | 0.7956 | 1.0386 | -623.6450 | -639.0427 | 3.4242 | 3.3979 |
| 0.4075 | 5.86 | 5400 | 0.4756 | -0.9566 | -1.9949 | 0.7877 | 1.0383 | -623.5674 | -638.9974 | 3.4221 | 3.3962 |
| 0.4293 | 5.97 | 5500 | 0.4756 | -0.9571 | -1.9948 | 0.7897 | 1.0377 | -623.5548 | -639.0446 | 3.4230 | 3.3964 |

### Framework versions

- Transformers 4.38.0
- Pytorch 2.1.2+cu118
- Datasets 2.17.1
- Tokenizers 0.15.0
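
### Sanity-checking the reported values

The derived quantities in this card can be reproduced from the listed values. The snippet below is a minimal sketch, not part of the original training code. It assumes the total batch sizes are the per-device batch size multiplied by the number of devices (and, for training, by the gradient accumulation steps), and that Rewards/margins is the difference between Rewards/chosen and Rewards/rejected. The helper name `effective_batch_size` is hypothetical.

```python
# Values copied from the hyperparameter list in this card.
train_batch_size = 8
eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2


def effective_batch_size(per_device, devices, grad_accum=1):
    """Hypothetical helper: examples consumed per optimizer (or eval) step."""
    return per_device * devices * grad_accum


total_train = effective_batch_size(
    train_batch_size, num_devices, gradient_accumulation_steps
)
total_eval = effective_batch_size(eval_batch_size, num_devices)

# Evaluation rewards reported at the top of this card.
rewards_chosen = -0.9135
rewards_rejected = -1.9448

# Assumption: the margin is chosen minus rejected, rounded to the
# four decimal places used in the card.
margin = round(rewards_chosen - rewards_rejected, 4)

print(f"total_train_batch_size: {total_train}")
print(f"total_eval_batch_size: {total_eval}")
print(f"rewards/margins: {margin}")
```

Under these assumptions the results agree with the listed total_train_batch_size (64), total_eval_batch_size (32), and Rewards/margins (1.0313). Because every column is rounded to four decimals, other rows of the training results table may differ from the recomputed margin in the last digit.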