---
license: mit
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
base_model: microsoft/phi-2
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: phi-2-ipo-renew1
  results: []
---

# phi-2-ipo-renew1

This model is a fine-tuned version of [lole25/phi-2-sft-ultrachat-lora](https://huggingface.co/lole25/phi-2-sft-ultrachat-lora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 2028.0933
- Rewards/chosen: -0.1243
- Rewards/rejected: -0.2158
- Rewards/accuracies: 0.6900
- Rewards/margins: 0.0915
- Logps/rejected: -255.1287
- Logps/chosen: -269.0499
- Logits/rejected: 0.5909
- Logits/chosen: 0.5352

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2496.843      | 0.05  | 100  | 2502.2668       | -0.0003        | -0.0002          | 0.5005             | -0.0002         | -233.5649      | -256.6506    | 0.8888          | 0.8318        |
| 2499.2807     | 0.1   | 200  | 2494.8354       | 0.0001         | -0.0005          | 0.5190             | 0.0006          | -233.5995      | -256.6106    | 0.8882          | 0.8310        |
| 2477.7609     | 0.16  | 300  | 2481.5015       | -0.0011        | -0.0031          | 0.5595             | 0.0019          | -233.8548      | -256.7285    | 0.8892          | 0.8319        |
| 2428.4195     | 0.21  | 400  | 2419.1045       | -0.0068        | -0.0156          | 0.6495             | 0.0089          | -235.1127      | -257.2951    | 0.8983          | 0.8404        |
| 2296.8842     | 0.26  | 500  | 2349.4358       | -0.0240        | -0.0419          | 0.6565             | 0.0179          | -237.7379      | -259.0124    | 0.8806          | 0.8214        |
| 2254.5846     | 0.31  | 600  | 2273.4993       | -0.0525        | -0.0829          | 0.6570             | 0.0304          | -241.8383      | -261.8659    | 0.8478          | 0.7868        |
| 2330.7787     | 0.37  | 700  | 2224.3350       | -0.0819        | -0.1221          | 0.6630             | 0.0402          | -245.7631      | -264.8093    | 0.8128          | 0.7517        |
| 2223.6863     | 0.42  | 800  | 2196.0991       | -0.1009        | -0.1487          | 0.6675             | 0.0478          | -248.4222      | -266.7057    | 0.7611          | 0.6992        |
| 2066.7418     | 0.47  | 900  | 2166.0732       | -0.1112        | -0.1658          | 0.6700             | 0.0546          | -250.1319      | -267.7397    | 0.7518          | 0.6917        |
| 2119.2691     | 0.52  | 1000 | 2138.9312       | -0.1215        | -0.1821          | 0.6715             | 0.0606          | -251.7610      | -268.7693    | 0.7213          | 0.6619        |
| 2191.7109     | 0.58  | 1100 | 2121.8115       | -0.1257        | -0.1906          | 0.6695             | 0.0648          | -252.6059      | -269.1910    | 0.7176          | 0.6584        |
| 2308.1883     | 0.63  | 1200 | 2110.3069       | -0.1409        | -0.2123          | 0.6665             | 0.0715          | -254.7812      | -270.7044    | 0.6920          | 0.6330        |
| 1996.7178     | 0.68  | 1300 | 2095.3130       | -0.1314        | -0.2042          | 0.6755             | 0.0728          | -253.9726      | -269.7621    | 0.6722          | 0.6141        |
| 2038.3844     | 0.73  | 1400 | 2085.0852       | -0.1383        | -0.2140          | 0.6800             | 0.0756          | -254.9441      | -270.4488    | 0.6513          | 0.5933        |
| 2094.2182     | 0.79  | 1500 | 2076.3042       | -0.1390        | -0.2166          | 0.6790             | 0.0777          | -255.2133      | -270.5129    | 0.6474          | 0.5898        |
| 2171.3457     | 0.84  | 1600 | 2069.3757       | -0.1374        | -0.2166          | 0.6810             | 0.0792          | -255.2130      | -270.3595    | 0.6392          | 0.5818        |
| 2189.3863     | 0.89  | 1700 | 2062.1995       | -0.1386        | -0.2192          | 0.6780             | 0.0806          | -255.4675      | -270.4739    | 0.6291          | 0.5723        |
| 2292.8938     | 0.94  | 1800 | 2053.1299       | -0.1196        | -0.2005          | 0.6830             | 0.0809          | -253.6025      | -268.5789    | 0.6275          | 0.5703        |
| 2085.5805     | 0.99  | 1900 | 2052.3237       | -0.1086        | -0.1906          | 0.6900             | 0.0821          | -252.6131      | -267.4730    | 0.6319          | 0.5747        |
| 1847.759      | 1.05  | 2000 | 2050.4177       | -0.1118        | -0.1953          | 0.6850             | 0.0836          | -253.0827      | -267.7950    | 0.6333          | 0.5763        |
| 2024.9559     | 1.1   | 2100 | 2046.7593       | -0.1219        | -0.2083          | 0.6900             | 0.0864          | -254.3799      | -268.8073    | 0.6157          | 0.5590        |
| 2038.6354     | 1.15  | 2200 | 2043.5728       | -0.1205        | -0.2072          | 0.6880             | 0.0867          | -254.2731      | -268.6722    | 0.6083          | 0.5518        |
| 2022.9617     | 1.2   | 2300 | 2035.5857       | -0.1173        | -0.2041          | 0.6895             | 0.0868          | -253.9597      | -268.3491    | 0.6101          | 0.5535        |
| 1871.641      | 1.26  | 2400 | 2036.3373       | -0.1190        | -0.2073          | 0.6895             | 0.0884          | -254.2831      | -268.5161    | 0.6046          | 0.5482        |
| 1907.3463     | 1.31  | 2500 | 2034.7010       | -0.1216        | -0.2108          | 0.6880             | 0.0892          | -254.6297      | -268.7765    | 0.6022          | 0.5460        |
| 1884.6086     | 1.36  | 2600 | 2033.7977       | -0.1215        | -0.2105          | 0.6910             | 0.0890          | -254.6014      | -268.7708    | 0.6013          | 0.5451        |
| 2034.9129     | 1.41  | 2700 | 2032.5447       | -0.1235        | -0.2140          | 0.6900             | 0.0905          | -254.9471      | -268.9633    | 0.5987          | 0.5426        |
| 2068.2822     | 1.47  | 2800 | 2030.8698       | -0.1251        | -0.2162          | 0.6900             | 0.0911          | -255.1671      | -269.1270    | 0.5943          | 0.5383        |
| 1977.4029     | 1.52  | 2900 | 2030.6033       | -0.1251        | -0.2162          | 0.6895             | 0.0911          | -255.1690      | -269.1252    | 0.5941          | 0.5381        |
| 2110.2887     | 1.57  | 3000 | 2030.5707       | -0.1259        | -0.2173          | 0.6905             | 0.0915          | -255.2821      | -269.2050    | 0.5908          | 0.5348        |
| 2068.2863     | 1.62  | 3100 | 2029.4174       | -0.1242        | -0.2156          | 0.6935             | 0.0914          | -255.1087      | -269.0390    | 0.5913          | 0.5357        |
| 1977.8852     | 1.67  | 3200 | 2026.1289       | -0.1249        | -0.2165          | 0.6960             | 0.0916          | -255.2016      | -269.1071    | 0.5920          | 0.5364        |
| 2123.3787     | 1.73  | 3300 | 2027.3552       | -0.1248        | -0.2162          | 0.6930             | 0.0914          | -255.1666      | -269.0933    | 0.5926          | 0.5370        |
| 1945.4934     | 1.78  | 3400 | 2025.7804       | -0.1248        | -0.2164          | 0.6935             | 0.0916          | -255.1899      | -269.1010    | 0.5909          | 0.5353        |
| 1937.2627     | 1.83  | 3500 | 2027.8240       | -0.1247        | -0.2163          | 0.6930             | 0.0916          | -255.1750      | -269.0878    | 0.5903          | 0.5347        |
| 2007.2062     | 1.88  | 3600 | 2025.3228       | -0.1244        | -0.2164          | 0.6895             | 0.0919          | -255.1843      | -269.0623    | 0.5910          | 0.5352        |
| 2076.715      | 1.94  | 3700 | 2027.4857       | -0.1243        | -0.2159          | 0.6920             | 0.0916          | -255.1383      | -269.0487    | 0.5913          | 0.5358        |
| 2055.2201     | 1.99  | 3800 | 2027.8082       | -0.1244        | -0.2160          | 0.6920             | 0.0916          | -255.1455      | -269.0543    | 0.5902          | 0.5347        |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2
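The `Rewards/*` and loss columns in the training results follow the IPO objective (the `trl`/`dpo` tags above indicate TRL's `DPOTrainer` with `loss_type="ipo"`). A minimal pure-Python sketch of the per-pair computation is below; the `beta` value is illustrative only, not the one used for this run:

```python
def ipo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """Per-pair IPO loss (Azar et al.), as in TRL's DPOTrainer with
    loss_type="ipo". beta is a placeholder, not this run's setting."""
    # Log-ratios of the policy vs. the frozen reference model
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = chosen_logratio - rejected_logratio
    # IPO regresses the margin toward 1 / (2 * beta); with a small beta
    # the target is large, which is why raw losses sit in the thousands
    loss = (margin - 1.0 / (2.0 * beta)) ** 2
    # The Rewards/chosen and Rewards/rejected columns are the
    # beta-scaled log-ratios
    return loss, beta * chosen_logratio, beta * rejected_logratio
```

Note that, unlike DPO's sigmoid loss, the IPO loss is a squared regression term, so its raw magnitude (here around 2000-2500) is not comparable to typical DPO loss values.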
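The learning-rate schedule listed in the hyperparameters (cosine with `lr_scheduler_warmup_ratio: 0.1`) can be sketched as follows; this mirrors the behavior of `transformers`' cosine-with-warmup scheduler, with the step counts here chosen for illustration:

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=5e-06, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup over the first
    warmup_ratio of training, then cosine decay from base_lr to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The peak of 5e-06 is reached at 10% of total steps, then decays smoothly to zero by the end of epoch 2.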