---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- openai/summarize_from_feedback
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR3e-8_BS32_3epochs_old
  results: []
---

# tinyllama-1.1b-sum-dpo-full_LR3e-8_BS32_3epochs_old

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old) on the openai/summarize_from_feedback dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6867
- Rewards/chosen: -0.0478
- Rewards/rejected: -0.0620
- Rewards/accuracies: 0.5936
- Rewards/margins: 0.0142
- Logps/rejected: -69.3779
- Logps/chosen: -63.4876
- Logits/rejected: -3.0580
- Logits/chosen: -3.0637

## Model description

More information needed

## Intended uses & limitations

More information needed
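Pending proper documentation, a minimal inference sketch: the checkpoint should load as a standard causal LM. The repo id and the `TL;DR:` prompt format below are assumptions (the prompt style is borrowed from the summarize_from_feedback convention, not a documented template):

```python
# Minimal inference sketch. Assumptions: the checkpoint loads as a plain
# causal LM, and prompts follow the "post ... TL;DR:" convention of the
# summarize_from_feedback dataset; the exact SFT template is not documented.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR3e-8_BS32_3epochs_old"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

post = "I adopted two kittens last month and my studio apartment is now a maze of cat trees and toys."  # example input
prompt = f"{post}\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens after the prompt.
summary = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary)
```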
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
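For reference, a hedged sketch of how these hyperparameters map onto TRL's `DPOConfig` (assumes TRL >= 0.9, where `DPOConfig` exists; `beta` and `output_dir` are assumptions, since neither is recorded in this card):

```python
# Hedged mapping of the listed hyperparameters onto a TRL DPOConfig.
from trl import DPOConfig

training_args = DPOConfig(
    learning_rate=3e-8,
    per_device_train_batch_size=8,   # x 4 accumulation steps = 32 effective
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    beta=0.1,                        # assumption: TRL's default DPO beta
    output_dir="tinyllama-1.1b-sum-dpo",  # assumption: not recorded in this card
)
```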
### Training results
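The reward columns below are DPO's implicit rewards. In the standard DPO formulation (a hedged note: the trainer's temperature β is not recorded in this card), the implicit reward of a completion y given a prompt x is

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)
$$

so `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected` (final row: -0.0478 - (-0.0620) = 0.0142), and `Rewards/accuracies` is the fraction of evaluation pairs whose chosen summary receives the higher reward. The `Logps` columns are the policy's summed log-likelihoods of the rejected and chosen summaries.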
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931 | 0.0345 | 100 | 0.6932 | 0.0001 | 0.0001 | 0.4930 | -0.0000 | -63.1672 | -58.7024 | -3.1577 | -3.1633 |
| 0.6931 | 0.0689 | 200 | 0.6932 | 0.0001 | 0.0001 | 0.4888 | -0.0001 | -63.1661 | -58.7066 | -3.1577 | -3.1634 |
| 0.6931 | 0.1034 | 300 | 0.6932 | 0.0000 | 0.0001 | 0.4933 | -0.0001 | -63.1693 | -58.7071 | -3.1578 | -3.1634 |
| 0.6931 | 0.1378 | 400 | 0.6932 | 0.0001 | 0.0001 | 0.4809 | -0.0000 | -63.1727 | -58.7061 | -3.1575 | -3.1632 |
| 0.6931 | 0.1723 | 500 | 0.6931 | 0.0002 | 0.0002 | 0.5098 | 0.0000 | -63.1633 | -58.6928 | -3.1577 | -3.1634 |
| 0.6931 | 0.2068 | 600 | 0.6932 | 0.0002 | 0.0002 | 0.4937 | -0.0000 | -63.1596 | -58.6920 | -3.1574 | -3.1630 |
| 0.6929 | 0.2412 | 700 | 0.6931 | 0.0003 | 0.0002 | 0.4905 | 0.0001 | -63.1582 | -58.6817 | -3.1572 | -3.1629 |
| 0.6929 | 0.2757 | 800 | 0.6931 | 0.0004 | 0.0003 | 0.5237 | 0.0001 | -63.1485 | -58.6703 | -3.1566 | -3.1622 |
| 0.6927 | 0.3101 | 900 | 0.6931 | 0.0006 | 0.0004 | 0.5186 | 0.0001 | -63.1378 | -58.6559 | -3.1564 | -3.1620 |
| 0.6925 | 0.3446 | 1000 | 0.6930 | 0.0008 | 0.0004 | 0.5279 | 0.0003 | -63.1375 | -58.6361 | -3.1554 | -3.1610 |
| 0.6924 | 0.3790 | 1100 | 0.6930 | 0.0009 | 0.0005 | 0.5560 | 0.0004 | -63.1285 | -58.6220 | -3.1548 | -3.1604 |
| 0.692 | 0.4135 | 1200 | 0.6929 | 0.0011 | 0.0006 | 0.5407 | 0.0005 | -63.1206 | -58.5973 | -3.1539 | -3.1595 |
| 0.6914 | 0.4480 | 1300 | 0.6928 | 0.0013 | 0.0007 | 0.5383 | 0.0006 | -63.1120 | -58.5819 | -3.1528 | -3.1584 |
| 0.6917 | 0.4824 | 1400 | 0.6927 | 0.0016 | 0.0006 | 0.5648 | 0.0009 | -63.1160 | -58.5533 | -3.1518 | -3.1574 |
| 0.6914 | 0.5169 | 1500 | 0.6926 | 0.0016 | 0.0006 | 0.5574 | 0.0010 | -63.1243 | -58.5539 | -3.1505 | -3.1561 |
| 0.6916 | 0.5513 | 1600 | 0.6926 | 0.0018 | 0.0007 | 0.5576 | 0.0012 | -63.1145 | -58.5288 | -3.1493 | -3.1549 |
| 0.6906 | 0.5858 | 1700 | 0.6925 | 0.0019 | 0.0004 | 0.5625 | 0.0014 | -63.1358 | -58.5250 | -3.1471 | -3.1527 |
| 0.6908 | 0.6203 | 1800 | 0.6923 | 0.0019 | 0.0002 | 0.5551 | 0.0017 | -63.1602 | -58.5198 | -3.1456 | -3.1513 |
| 0.6903 | 0.6547 | 1900 | 0.6922 | 0.0019 | -0.0001 | 0.5720 | 0.0020 | -63.1895 | -58.5253 | -3.1437 | -3.1493 |
| 0.6895 | 0.6892 | 2000 | 0.6920 | 0.0016 | -0.0007 | 0.5795 | 0.0023 | -63.2502 | -58.5471 | -3.1418 | -3.1475 |
| 0.6891 | 0.7236 | 2100 | 0.6919 | 0.0017 | -0.0009 | 0.5818 | 0.0026 | -63.2700 | -58.5423 | -3.1394 | -3.1450 |
| 0.6906 | 0.7581 | 2200 | 0.6918 | 0.0013 | -0.0016 | 0.5737 | 0.0028 | -63.3380 | -58.5865 | -3.1376 | -3.1432 |
| 0.6893 | 0.7926 | 2300 | 0.6917 | 0.0011 | -0.0020 | 0.5730 | 0.0031 | -63.3761 | -58.6009 | -3.1358 | -3.1414 |
| 0.6899 | 0.8270 | 2400 | 0.6915 | 0.0006 | -0.0028 | 0.5764 | 0.0034 | -63.4591 | -58.6538 | -3.1338 | -3.1394 |
| 0.6894 | 0.8615 | 2500 | 0.6914 | 0.0002 | -0.0034 | 0.5743 | 0.0036 | -63.5245 | -58.6934 | -3.1315 | -3.1372 |
| 0.6883 | 0.8959 | 2600 | 0.6912 | -0.0003 | -0.0043 | 0.5764 | 0.0040 | -63.6123 | -58.7457 | -3.1297 | -3.1354 |
| 0.6875 | 0.9304 | 2700 | 0.6911 | -0.0010 | -0.0053 | 0.5781 | 0.0043 | -63.7097 | -58.8142 | -3.1282 | -3.1338 |
| 0.6871 | 0.9649 | 2800 | 0.6910 | -0.0016 | -0.0061 | 0.5760 | 0.0045 | -63.7868 | -58.8701 | -3.1261 | -3.1317 |
| 0.6871 | 0.9993 | 2900 | 0.6909 | -0.0024 | -0.0072 | 0.5762 | 0.0048 | -63.8972 | -58.9496 | -3.1231 | -3.1287 |
| 0.6874 | 1.0338 | 3000 | 0.6907 | -0.0032 | -0.0084 | 0.5834 | 0.0051 | -64.0164 | -59.0348 | -3.1212 | -3.1268 |
| 0.6859 | 1.0682 | 3100 | 0.6906 | -0.0042 | -0.0096 | 0.5806 | 0.0054 | -64.1398 | -59.1344 | -3.1190 | -3.1247 |
| 0.6842 | 1.1027 | 3200 | 0.6904 | -0.0051 | -0.0109 | 0.5839 | 0.0058 | -64.2725 | -59.2256 | -3.1161 | -3.1218 |
| 0.6884 | 1.1371 | 3300 | 0.6903 | -0.0066 | -0.0127 | 0.5874 | 0.0061 | -64.4506 | -59.3731 | -3.1139 | -3.1196 |
| 0.6858 | 1.1716 | 3400 | 0.6902 | -0.0080 | -0.0142 | 0.5785 | 0.0062 | -64.5965 | -59.5071 | -3.1116 | -3.1173 |
| 0.6859 | 1.2061 | 3500 | 0.6900 | -0.0099 | -0.0166 | 0.5832 | 0.0066 | -64.8362 | -59.7041 | -3.1101 | -3.1158 |
| 0.685 | 1.2405 | 3600 | 0.6899 | -0.0115 | -0.0185 | 0.5783 | 0.0069 | -65.0265 | -59.8637 | -3.1069 | -3.1126 |
| 0.6839 | 1.2750 | 3700 | 0.6898 | -0.0129 | -0.0202 | 0.5820 | 0.0072 | -65.1978 | -60.0064 | -3.1049 | -3.1106 |
| 0.6824 | 1.3094 | 3800 | 0.6896 | -0.0145 | -0.0220 | 0.5832 | 0.0076 | -65.3850 | -60.1580 | -3.1023 | -3.1080 |
| 0.6847 | 1.3439 | 3900 | 0.6895 | -0.0161 | -0.0240 | 0.5834 | 0.0078 | -65.5760 | -60.3265 | -3.1007 | -3.1064 |
| 0.6865 | 1.3784 | 4000 | 0.6894 | -0.0179 | -0.0261 | 0.5876 | 0.0081 | -65.7873 | -60.5061 | -3.0990 | -3.1047 |
| 0.6826 | 1.4128 | 4100 | 0.6892 | -0.0197 | -0.0282 | 0.5899 | 0.0085 | -65.9972 | -60.6782 | -3.0968 | -3.1025 |
| 0.6801 | 1.4473 | 4200 | 0.6890 | -0.0209 | -0.0299 | 0.5922 | 0.0090 | -66.1658 | -60.8002 | -3.0952 | -3.1009 |
| 0.6814 | 1.4817 | 4300 | 0.6890 | -0.0227 | -0.0318 | 0.5878 | 0.0091 | -66.3577 | -60.9789 | -3.0926 | -3.0983 |
| 0.683 | 1.5162 | 4400 | 0.6888 | -0.0239 | -0.0334 | 0.5913 | 0.0094 | -66.5158 | -61.1062 | -3.0910 | -3.0967 |
| 0.679 | 1.5507 | 4500 | 0.6887 | -0.0255 | -0.0352 | 0.5948 | 0.0097 | -66.7038 | -61.2636 | -3.0892 | -3.0949 |
| 0.6834 | 1.5851 | 4600 | 0.6886 | -0.0275 | -0.0375 | 0.5934 | 0.0100 | -66.9283 | -61.4618 | -3.0871 | -3.0928 |
| 0.685 | 1.6196 | 4700 | 0.6884 | -0.0284 | -0.0387 | 0.5929 | 0.0103 | -67.0469 | -61.5498 | -3.0853 | -3.0910 |
| 0.683 | 1.6540 | 4800 | 0.6883 | -0.0294 | -0.0400 | 0.5960 | 0.0106 | -67.1815 | -61.6491 | -3.0831 | -3.0889 |
| 0.6781 | 1.6885 | 4900 | 0.6882 | -0.0307 | -0.0416 | 0.5950 | 0.0109 | -67.3424 | -61.7858 | -3.0820 | -3.0877 |
| 0.6813 | 1.7229 | 5000 | 0.6881 | -0.0317 | -0.0426 | 0.5943 | 0.0110 | -67.4448 | -61.8785 | -3.0805 | -3.0863 |
| 0.6823 | 1.7574 | 5100 | 0.6880 | -0.0328 | -0.0440 | 0.5950 | 0.0112 | -67.5799 | -61.9921 | -3.0789 | -3.0846 |
| 0.6798 | 1.7919 | 5200 | 0.6879 | -0.0341 | -0.0457 | 0.5987 | 0.0116 | -67.7483 | -62.1205 | -3.0772 | -3.0829 |
| 0.6798 | 1.8263 | 5300 | 0.6877 | -0.0353 | -0.0472 | 0.5953 | 0.0119 | -67.8958 | -62.2422 | -3.0757 | -3.0814 |
| 0.6784 | 1.8608 | 5400 | 0.6876 | -0.0368 | -0.0489 | 0.5969 | 0.0122 | -68.0724 | -62.3875 | -3.0742 | -3.0798 |
| 0.6853 | 1.8952 | 5500 | 0.6876 | -0.0377 | -0.0500 | 0.5946 | 0.0123 | -68.1765 | -62.4820 | -3.0735 | -3.0792 |
| 0.6769 | 1.9297 | 5600 | 0.6875 | -0.0392 | -0.0517 | 0.5941 | 0.0125 | -68.3471 | -62.6278 | -3.0713 | -3.0771 |
| 0.6788 | 1.9642 | 5700 | 0.6874 | -0.0399 | -0.0526 | 0.5941 | 0.0127 | -68.4439 | -62.7029 | -3.0701 | -3.0759 |
| 0.6798 | 1.9986 | 5800 | 0.6873 | -0.0410 | -0.0538 | 0.5925 | 0.0128 | -68.5632 | -62.8140 | -3.0694 | -3.0752 |
| 0.683 | 2.0331 | 5900 | 0.6872 | -0.0418 | -0.0549 | 0.5934 | 0.0131 | -68.6699 | -62.8917 | -3.0677 | -3.0735 |
| 0.6766 | 2.0675 | 6000 | 0.6872 | -0.0425 | -0.0555 | 0.5918 | 0.0130 | -68.7314 | -62.9600 | -3.0675 | -3.0732 |
| 0.6756 | 2.1020 | 6100 | 0.6871 | -0.0428 | -0.0561 | 0.5922 | 0.0133 | -68.7950 | -62.9959 | -3.0660 | -3.0717 |
| 0.6805 | 2.1365 | 6200 | 0.6871 | -0.0435 | -0.0568 | 0.5904 | 0.0133 | -68.8622 | -63.0611 | -3.0654 | -3.0711 |
| 0.6797 | 2.1709 | 6300 | 0.6871 | -0.0443 | -0.0577 | 0.5929 | 0.0134 | -68.9493 | -63.1378 | -3.0645 | -3.0703 |
| 0.6802 | 2.2054 | 6400 | 0.6870 | -0.0442 | -0.0577 | 0.5913 | 0.0135 | -68.9530 | -63.1312 | -3.0641 | -3.0698 |
| 0.6802 | 2.2398 | 6500 | 0.6870 | -0.0445 | -0.0581 | 0.5934 | 0.0136 | -68.9891 | -63.1579 | -3.0633 | -3.0690 |
| 0.6806 | 2.2743 | 6600 | 0.6870 | -0.0448 | -0.0585 | 0.5925 | 0.0136 | -69.0289 | -63.1964 | -3.0624 | -3.0682 |
| 0.6755 | 2.3088 | 6700 | 0.6869 | -0.0453 | -0.0590 | 0.5918 | 0.0137 | -69.0814 | -63.2383 | -3.0618 | -3.0675 |
| 0.6826 | 2.3432 | 6800 | 0.6869 | -0.0455 | -0.0593 | 0.5962 | 0.0138 | -69.1095 | -63.2637 | -3.0612 | -3.0669 |
| 0.6786 | 2.3777 | 6900 | 0.6869 | -0.0459 | -0.0598 | 0.5892 | 0.0139 | -69.1580 | -63.3046 | -3.0607 | -3.0664 |
| 0.6798 | 2.4121 | 7000 | 0.6868 | -0.0463 | -0.0602 | 0.5934 | 0.0139 | -69.2011 | -63.3391 | -3.0601 | -3.0658 |
| 0.6762 | 2.4466 | 7100 | 0.6868 | -0.0466 | -0.0606 | 0.5936 | 0.0140 | -69.2414 | -63.3699 | -3.0598 | -3.0656 |
| 0.6782 | 2.4810 | 7200 | 0.6868 | -0.0470 | -0.0611 | 0.5918 | 0.0141 | -69.2927 | -63.4167 | -3.0595 | -3.0652 |
| 0.6821 | 2.5155 | 7300 | 0.6868 | -0.0472 | -0.0612 | 0.5943 | 0.0140 | -69.3050 | -63.4345 | -3.0589 | -3.0647 |
| 0.6806 | 2.5500 | 7400 | 0.6868 | -0.0473 | -0.0614 | 0.5908 | 0.0141 | -69.3214 | -63.4432 | -3.0588 | -3.0646 |
| 0.6824 | 2.5844 | 7500 | 0.6867 | -0.0475 | -0.0616 | 0.5918 | 0.0142 | -69.3426 | -63.4585 | -3.0589 | -3.0647 |
| 0.6789 | 2.6189 | 7600 | 0.6868 | -0.0477 | -0.0618 | 0.5915 | 0.0141 | -69.3578 | -63.4788 | -3.0584 | -3.0642 |
| 0.6768 | 2.6533 | 7700 | 0.6867 | -0.0475 | -0.0618 | 0.5946 | 0.0144 | -69.3650 | -63.4617 | -3.0582 | -3.0640 |
| 0.6808 | 2.6878 | 7800 | 0.6867 | -0.0477 | -0.0619 | 0.5918 | 0.0142 | -69.3712 | -63.4863 | -3.0584 | -3.0642 |
| 0.6782 | 2.7223 | 7900 | 0.6867 | -0.0478 | -0.0621 | 0.5925 | 0.0143 | -69.3874 | -63.4902 | -3.0581 | -3.0639 |
| 0.6794 | 2.7567 | 8000 | 0.6867 | -0.0479 | -0.0621 | 0.5897 | 0.0142 | -69.3922 | -63.5035 | -3.0580 | -3.0638 |
| 0.674 | 2.7912 | 8100 | 0.6867 | -0.0479 | -0.0621 | 0.5911 | 0.0142 | -69.3883 | -63.4992 | -3.0580 | -3.0638 |
| 0.6766 | 2.8256 | 8200 | 0.6866 | -0.0478 | -0.0622 | 0.5899 | 0.0144 | -69.4003 | -63.4938 | -3.0581 | -3.0639 |
| 0.6821 | 2.8601 | 8300 | 0.6867 | -0.0479 | -0.0622 | 0.5890 | 0.0143 | -69.3970 | -63.4998 | -3.0579 | -3.0637 |
| 0.6795 | 2.8946 | 8400 | 0.6867 | -0.0478 | -0.0621 | 0.5904 | 0.0142 | -69.3868 | -63.4954 | -3.0580 | -3.0637 |
| 0.679 | 2.9290 | 8500 | 0.6867 | -0.0479 | -0.0622 | 0.5925 | 0.0143 | -69.3981 | -63.4995 | -3.0579 | -3.0637 |
| 0.6816 | 2.9635 | 8600 | 0.6867 | -0.0478 | -0.0621 | 0.5922 | 0.0144 | -69.3946 | -63.4907 | -3.0579 | -3.0637 |
| 0.6751 | 2.9979 | 8700 | 0.6867 | -0.0478 | -0.0620 | 0.5936 | 0.0142 | -69.3779 | -63.4876 | -3.0580 | -3.0637 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1
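To approximate this environment, the listed versions can be pinned directly (a sketch; note that `torch` is the PyPI distribution name for Pytorch):

```bash
pip install transformers==4.41.2 torch==2.1.2 datasets==2.19.2 tokenizers==0.19.1
```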