---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- openai/summarize_from_feedback
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs
  results: []
---

# tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full) on the openai/summarize_from_feedback dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6501
- Rewards/chosen: -1.0591
- Rewards/rejected: -1.2329
- Rewards/accuracies: 0.6032
- Rewards/margins: 0.1739
- Logps/rejected: -186.0431
- Logps/chosen: -164.9210
- Logits/rejected: -2.3430
- Logits/chosen: -2.3551

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
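The card records the hyperparameters but not the training script. For orientation, the snippet below is a minimal, untested sketch of how they might map onto trl's `DPOTrainer`, which the `alignment-handbook`/`trl` tags suggest was used; it assumes a trl release contemporary with the Transformers 4.41 series. The DPO `beta` and the sequence-length limits are not recorded in this card, so those values are placeholders, and the step that flattens `openai/summarize_from_feedback` comparison pairs into the `prompt`/`chosen`/`rejected` columns `DPOTrainer` expects is omitted.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters
# above; not the actual script used to produce this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Comparison pairs must first be mapped to prompt/chosen/rejected text
# columns; that preprocessing is omitted here for brevity.
dataset = load_dataset("openai/summarize_from_feedback", "comparisons")

args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs",
    learning_rate=1e-7,
    per_device_train_batch_size=8,   # x2 accumulation = total batch 16, as listed
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    eval_strategy="steps",
    eval_steps=400,                  # matches the evaluation cadence in the table below
)

trainer = DPOTrainer(
    model,
    ref_model=None,      # trl clones the policy as the frozen reference model
    args=args,
    beta=0.1,            # placeholder: beta is not recorded in this card
    train_dataset=dataset["train"],        # after mapping to prompt/chosen/rejected
    eval_dataset=dataset["validation"],    # likewise
    tokenizer=tokenizer,
)
trainer.train()
```

Under this formulation, the `Rewards/chosen` and `Rewards/rejected` columns below are the implicit DPO rewards, i.e. `beta` times the log-probability ratio between the policy and the reference model on the chosen and rejected summaries, and `Rewards/margins` is their difference.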
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.693 | 0.0689 | 400 | 0.6931 | 0.0003 | 0.0002 | 0.5112 | 0.0001 | -62.7270 | -58.9858 | -2.9691 | -2.9727 |
| 0.6923 | 0.1378 | 800 | 0.6926 | 0.0024 | 0.0012 | 0.5493 | 0.0011 | -62.6258 | -58.7797 | -2.9667 | -2.9701 |
| 0.6901 | 0.2068 | 1200 | 0.6907 | -0.0080 | -0.0133 | 0.5697 | 0.0053 | -64.0827 | -59.8146 | -2.9579 | -2.9613 |
| 0.6835 | 0.2757 | 1600 | 0.6880 | -0.0321 | -0.0436 | 0.5764 | 0.0114 | -67.1050 | -62.2266 | -2.9410 | -2.9442 |
| 0.6865 | 0.3446 | 2000 | 0.6852 | -0.0690 | -0.0874 | 0.5713 | 0.0184 | -71.4878 | -65.9158 | -2.9158 | -2.9192 |
| 0.6767 | 0.4135 | 2400 | 0.6817 | -0.1086 | -0.1352 | 0.5816 | 0.0265 | -76.2651 | -69.8803 | -2.8906 | -2.8938 |
| 0.6726 | 0.4824 | 2800 | 0.6792 | -0.1614 | -0.1943 | 0.5767 | 0.0328 | -82.1753 | -75.1597 | -2.8617 | -2.8651 |
| 0.6643 | 0.5513 | 3200 | 0.6729 | -0.2581 | -0.3074 | 0.5948 | 0.0493 | -93.4915 | -84.8225 | -2.8387 | -2.8420 |
| 0.6614 | 0.6203 | 3600 | 0.6740 | -0.2589 | -0.3059 | 0.5904 | 0.0470 | -93.3416 | -84.9094 | -2.8113 | -2.8144 |
| 0.6609 | 0.6892 | 4000 | 0.6696 | -0.3009 | -0.3603 | 0.6053 | 0.0594 | -98.7785 | -89.1073 | -2.7879 | -2.7912 |
| 0.6562 | 0.7581 | 4400 | 0.6667 | -0.4072 | -0.4790 | 0.5983 | 0.0718 | -110.6499 | -99.7330 | -2.7515 | -2.7548 |
| 0.6569 | 0.8270 | 4800 | 0.6637 | -0.4951 | -0.5782 | 0.6059 | 0.0831 | -120.5742 | -108.5273 | -2.7283 | -2.7316 |
| 0.6383 | 0.8959 | 5200 | 0.6621 | -0.5180 | -0.6112 | 0.6055 | 0.0932 | -123.8654 | -110.8119 | -2.7112 | -2.7149 |
| 0.6411 | 0.9649 | 5600 | 0.6623 | -0.5228 | -0.6134 | 0.6055 | 0.0906 | -124.0929 | -111.2965 | -2.6869 | -2.6910 |
| 0.6293 | 1.0338 | 6000 | 0.6618 | -0.6210 | -0.7260 | 0.6064 | 0.1049 | -135.3463 | -121.1192 | -2.6526 | -2.6573 |
| 0.6247 | 1.1027 | 6400 | 0.6587 | -0.7088 | -0.8268 | 0.5990 | 0.1180 | -145.4310 | -129.8984 | -2.6201 | -2.6254 |
| 0.6194 | 1.1716 | 6800 | 0.6580 | -0.7955 | -0.9191 | 0.5980 | 0.1236 | -154.6599 | -138.5692 | -2.5858 | -2.5912 |
| 0.6127 | 1.2405 | 7200 | 0.6558 | -0.6612 | -0.7815 | 0.6039 | 0.1203 | -140.8955 | -125.1357 | -2.5822 | -2.5877 |
| 0.6531 | 1.3094 | 7600 | 0.6534 | -0.7460 | -0.8804 | 0.6041 | 0.1344 | -150.7862 | -133.6133 | -2.5502 | -2.5564 |
| 0.5995 | 1.3784 | 8000 | 0.6528 | -0.8128 | -0.9555 | 0.6006 | 0.1427 | -158.2948 | -140.2942 | -2.5195 | -2.5267 |
| 0.61 | 1.4473 | 8400 | 0.6540 | -0.7310 | -0.8603 | 0.5980 | 0.1293 | -148.7821 | -132.1185 | -2.5198 | -2.5268 |
| 0.6575 | 1.5162 | 8800 | 0.6527 | -0.8369 | -0.9764 | 0.5997 | 0.1395 | -160.3900 | -142.7025 | -2.4947 | -2.5022 |
| 0.5969 | 1.5851 | 9200 | 0.6516 | -0.8922 | -1.0366 | 0.6101 | 0.1444 | -166.4089 | -148.2315 | -2.4661 | -2.4746 |
| 0.6211 | 1.6540 | 9600 | 0.6526 | -0.7875 | -0.9248 | 0.6094 | 0.1373 | -155.2340 | -137.7698 | -2.4725 | -2.4804 |
| 0.6011 | 1.7229 | 10000 | 0.6517 | -0.8912 | -1.0379 | 0.6099 | 0.1467 | -166.5410 | -148.1359 | -2.4396 | -2.4489 |
| 0.571 | 1.7919 | 10400 | 0.6514 | -0.8234 | -0.9653 | 0.6122 | 0.1419 | -159.2782 | -141.3557 | -2.4401 | -2.4489 |
| 0.5889 | 1.8608 | 10800 | 0.6506 | -1.0172 | -1.1751 | 0.6055 | 0.1579 | -180.2568 | -160.7332 | -2.3932 | -2.4039 |
| 0.5685 | 1.9297 | 11200 | 0.6486 | -1.0256 | -1.1907 | 0.5992 | 0.1651 | -181.8200 | -161.5783 | -2.3887 | -2.3992 |
| 0.63 | 1.9986 | 11600 | 0.6502 | -0.8869 | -1.0380 | 0.6004 | 0.1511 | -166.5461 | -147.7054 | -2.4012 | -2.4108 |
| 0.5891 | 2.0675 | 12000 | 0.6490 | -1.0453 | -1.2122 | 0.6046 | 0.1670 | -183.9714 | -163.5418 | -2.3713 | -2.3825 |
| 0.5808 | 2.1365 | 12400 | 0.6490 | -1.1906 | -1.3718 | 0.6039 | 0.1811 | -199.9255 | -178.0778 | -2.3382 | -2.3508 |
| 0.6051 | 2.2054 | 12800 | 0.6496 | -1.0959 | -1.2648 | 0.6053 | 0.1689 | -189.2301 | -168.6040 | -2.3542 | -2.3658 |
| 0.6223 | 2.2743 | 13200 | 0.6502 | -1.0865 | -1.2588 | 0.6069 | 0.1723 | -188.6267 | -167.6660 | -2.3460 | -2.3579 |
| 0.6245 | 2.3432 | 13600 | 0.6506 | -1.0806 | -1.2530 | 0.5983 | 0.1724 | -188.0497 | -167.0715 | -2.3462 | -2.3583 |
| 0.5716 | 2.4121 | 14000 | 0.6511 | -1.0306 | -1.1979 | 0.5941 | 0.1672 | -182.5368 | -162.0786 | -2.3533 | -2.3651 |
| 0.6078 | 2.4810 | 14400 | 0.6506 | -1.0889 | -1.2642 | 0.6004 | 0.1753 | -189.1684 | -167.9059 | -2.3417 | -2.3540 |
| 0.6112 | 2.5500 | 14800 | 0.6500 | -1.1067 | -1.2865 | 0.5971 | 0.1798 | -191.4036 | -169.6898 | -2.3390 | -2.3514 |
| 0.5773 | 2.6189 | 15200 | 0.6508 | -1.0435 | -1.2146 | 0.6025 | 0.1712 | -184.2123 | -163.3605 | -2.3468 | -2.3588 |
| 0.5983 | 2.6878 | 15600 | 0.6505 | -1.0660 | -1.2397 | 0.6018 | 0.1737 | -186.7185 | -165.6157 | -2.3419 | -2.3540 |
| 0.5983 | 2.7567 | 16000 | 0.6501 | -1.0707 | -1.2465 | 0.6029 | 0.1758 | -187.3989 | -166.0839 | -2.3408 | -2.3530 |
| 0.5956 | 2.8256 | 16400 | 0.6500 | -1.0594 | -1.2333 | 0.6008 | 0.1739 | -186.0803 | -164.9520 | -2.3429 | -2.3550 |
| 0.6221 | 2.8946 | 16800 | 0.6499 | -1.0592 | -1.2333 | 0.6041 | 0.1742 | -186.0846 | -164.9336 | -2.3430 | -2.3551 |
| 0.6096 | 2.9635 | 17200 | 0.6500 | -1.0595 | -1.2334 | 0.6046 | 0.1739 | -186.0905 | -164.9614 | -2.3429 | -2.3549 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1
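## Example usage

The card does not document an inference recipe, so the snippet below is a minimal, untested sketch of loading the model with the Transformers version listed above. Both the repository id (inferred from the model name and the base model's namespace) and the `TL;DR:` prompt format (the Reddit TL;DR convention behind summarize_from_feedback) are assumptions; check the SFT model for the exact template used during training.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; not confirmed by this card.
model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

post = "..."  # the text to summarize
prompt = f"{post}\n\nTL;DR:"  # assumed prompt format, not confirmed by this card

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens after the prompt.
summary = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```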