tinyllama-1.1b-sum-dpo-full_LR3e-8_BS32_3epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6867
  • Rewards/chosen: -0.0478
  • Rewards/rejected: -0.0620
  • Rewards/accuracies: 0.5936
  • Rewards/margins: 0.0142
  • Logps/rejected: -69.3779
  • Logps/chosen: -63.4876
  • Logits/rejected: -3.0580
  • Logits/chosen: -3.0637
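
The reward terms above follow the standard DPO convention: the implicit reward of a completion is the beta-scaled log-probability ratio between the policy and the frozen reference model, and the margin is the gap between the chosen and rejected rewards. A minimal sketch of those definitions is below; the variable names and the beta value are illustrative only, since beta is not reported in this card.

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative DPO reward/loss math; beta is a placeholder, the value
    used for this run is not stated in the card."""
    # Implicit rewards: beta-scaled log-prob ratio of policy vs. reference
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected        # "Rewards/margins"
    accuracies = (margins > 0).float().mean()          # "Rewards/accuracies"
    loss = -F.logsigmoid(margins).mean()               # DPO loss
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), accuracies

# Example with dummy per-sequence log-probabilities
loss, r_chosen, r_rejected, acc = dpo_metrics(
    torch.tensor([-63.5]), torch.tensor([-69.4]),
    torch.tensor([-63.0]), torch.tensor([-68.9]))
```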

Model description

More information needed

Intended uses & limitations

More information needed
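
The author provides no usage notes. As a hedged illustration only: the checkpoint is a causal language model trained on TL;DR-style summarization preferences, so it can be loaded with the standard transformers API. The prompt format below is an assumption, not something documented in this card; check the SFT recipe of the base model before relying on it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR3e-8_BS32_3epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumed prompt format (post followed by "TL;DR:"); not documented in the card.
prompt = "POST: <your reddit post here>\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
summary = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True)
print(summary)
```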

Training and evaluation data

More information needed
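
The card only names the dataset. For reference, openai/summarize_from_feedback is available on the Hugging Face Hub; the sketch below loads its pairwise "comparisons" configuration (the config name comes from the Hub dataset card, and whether this exact split or any additional preprocessing was used for this run is not stated).

```python
from datasets import load_dataset

# "comparisons" contains human preference pairs: two candidate summaries per
# post plus the annotator's choice. Any filtering or reformatting applied for
# this particular run is not documented in the card.
ds = load_dataset("openai/summarize_from_feedback", "comparisons")
print(ds)                      # train / validation splits
example = ds["train"][0]
print(example["summaries"])    # the two candidate summaries
print(example["choice"])       # index of the human-preferred summary
```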

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-08
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
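
For readers who want to reproduce a comparable setup, the hyperparameters above map onto a standard transformers training configuration roughly as sketched below. This is not the author's training script: DPO-specific settings such as beta and maximum sequence lengths are not reported in this card, and the exact trainer API depends on the TRL version used.

```python
from transformers import TrainingArguments

# Effective batch size: 8 per device x 4 accumulation steps (x GPU count) = 32 here.
training_args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo",   # placeholder output path
    learning_rate=3e-8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# With TRL, arguments like these would typically be passed to a DPOTrainer
# together with the policy model, a frozen reference model, the tokenizer and
# the preference dataset; beta and other DPO options are not documented here.
```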

Training results

Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen
0.6931 0.0345 100 0.6932 0.0001 0.0001 0.4930 -0.0000 -63.1672 -58.7024 -3.1577 -3.1633
0.6931 0.0689 200 0.6932 0.0001 0.0001 0.4888 -0.0001 -63.1661 -58.7066 -3.1577 -3.1634
0.6931 0.1034 300 0.6932 0.0000 0.0001 0.4933 -0.0001 -63.1693 -58.7071 -3.1578 -3.1634
0.6931 0.1378 400 0.6932 0.0001 0.0001 0.4809 -0.0000 -63.1727 -58.7061 -3.1575 -3.1632
0.6931 0.1723 500 0.6931 0.0002 0.0002 0.5098 0.0000 -63.1633 -58.6928 -3.1577 -3.1634
0.6931 0.2068 600 0.6932 0.0002 0.0002 0.4937 -0.0000 -63.1596 -58.6920 -3.1574 -3.1630
0.6929 0.2412 700 0.6931 0.0003 0.0002 0.4905 0.0001 -63.1582 -58.6817 -3.1572 -3.1629
0.6929 0.2757 800 0.6931 0.0004 0.0003 0.5237 0.0001 -63.1485 -58.6703 -3.1566 -3.1622
0.6927 0.3101 900 0.6931 0.0006 0.0004 0.5186 0.0001 -63.1378 -58.6559 -3.1564 -3.1620
0.6925 0.3446 1000 0.6930 0.0008 0.0004 0.5279 0.0003 -63.1375 -58.6361 -3.1554 -3.1610
0.6924 0.3790 1100 0.6930 0.0009 0.0005 0.5560 0.0004 -63.1285 -58.6220 -3.1548 -3.1604
0.692 0.4135 1200 0.6929 0.0011 0.0006 0.5407 0.0005 -63.1206 -58.5973 -3.1539 -3.1595
0.6914 0.4480 1300 0.6928 0.0013 0.0007 0.5383 0.0006 -63.1120 -58.5819 -3.1528 -3.1584
0.6917 0.4824 1400 0.6927 0.0016 0.0006 0.5648 0.0009 -63.1160 -58.5533 -3.1518 -3.1574
0.6914 0.5169 1500 0.6926 0.0016 0.0006 0.5574 0.0010 -63.1243 -58.5539 -3.1505 -3.1561
0.6916 0.5513 1600 0.6926 0.0018 0.0007 0.5576 0.0012 -63.1145 -58.5288 -3.1493 -3.1549
0.6906 0.5858 1700 0.6925 0.0019 0.0004 0.5625 0.0014 -63.1358 -58.5250 -3.1471 -3.1527
0.6908 0.6203 1800 0.6923 0.0019 0.0002 0.5551 0.0017 -63.1602 -58.5198 -3.1456 -3.1513
0.6903 0.6547 1900 0.6922 0.0019 -0.0001 0.5720 0.0020 -63.1895 -58.5253 -3.1437 -3.1493
0.6895 0.6892 2000 0.6920 0.0016 -0.0007 0.5795 0.0023 -63.2502 -58.5471 -3.1418 -3.1475
0.6891 0.7236 2100 0.6919 0.0017 -0.0009 0.5818 0.0026 -63.2700 -58.5423 -3.1394 -3.1450
0.6906 0.7581 2200 0.6918 0.0013 -0.0016 0.5737 0.0028 -63.3380 -58.5865 -3.1376 -3.1432
0.6893 0.7926 2300 0.6917 0.0011 -0.0020 0.5730 0.0031 -63.3761 -58.6009 -3.1358 -3.1414
0.6899 0.8270 2400 0.6915 0.0006 -0.0028 0.5764 0.0034 -63.4591 -58.6538 -3.1338 -3.1394
0.6894 0.8615 2500 0.6914 0.0002 -0.0034 0.5743 0.0036 -63.5245 -58.6934 -3.1315 -3.1372
0.6883 0.8959 2600 0.6912 -0.0003 -0.0043 0.5764 0.0040 -63.6123 -58.7457 -3.1297 -3.1354
0.6875 0.9304 2700 0.6911 -0.0010 -0.0053 0.5781 0.0043 -63.7097 -58.8142 -3.1282 -3.1338
0.6871 0.9649 2800 0.6910 -0.0016 -0.0061 0.5760 0.0045 -63.7868 -58.8701 -3.1261 -3.1317
0.6871 0.9993 2900 0.6909 -0.0024 -0.0072 0.5762 0.0048 -63.8972 -58.9496 -3.1231 -3.1287
0.6874 1.0338 3000 0.6907 -0.0032 -0.0084 0.5834 0.0051 -64.0164 -59.0348 -3.1212 -3.1268
0.6859 1.0682 3100 0.6906 -0.0042 -0.0096 0.5806 0.0054 -64.1398 -59.1344 -3.1190 -3.1247
0.6842 1.1027 3200 0.6904 -0.0051 -0.0109 0.5839 0.0058 -64.2725 -59.2256 -3.1161 -3.1218
0.6884 1.1371 3300 0.6903 -0.0066 -0.0127 0.5874 0.0061 -64.4506 -59.3731 -3.1139 -3.1196
0.6858 1.1716 3400 0.6902 -0.0080 -0.0142 0.5785 0.0062 -64.5965 -59.5071 -3.1116 -3.1173
0.6859 1.2061 3500 0.6900 -0.0099 -0.0166 0.5832 0.0066 -64.8362 -59.7041 -3.1101 -3.1158
0.685 1.2405 3600 0.6899 -0.0115 -0.0185 0.5783 0.0069 -65.0265 -59.8637 -3.1069 -3.1126
0.6839 1.2750 3700 0.6898 -0.0129 -0.0202 0.5820 0.0072 -65.1978 -60.0064 -3.1049 -3.1106
0.6824 1.3094 3800 0.6896 -0.0145 -0.0220 0.5832 0.0076 -65.3850 -60.1580 -3.1023 -3.1080
0.6847 1.3439 3900 0.6895 -0.0161 -0.0240 0.5834 0.0078 -65.5760 -60.3265 -3.1007 -3.1064
0.6865 1.3784 4000 0.6894 -0.0179 -0.0261 0.5876 0.0081 -65.7873 -60.5061 -3.0990 -3.1047
0.6826 1.4128 4100 0.6892 -0.0197 -0.0282 0.5899 0.0085 -65.9972 -60.6782 -3.0968 -3.1025
0.6801 1.4473 4200 0.6890 -0.0209 -0.0299 0.5922 0.0090 -66.1658 -60.8002 -3.0952 -3.1009
0.6814 1.4817 4300 0.6890 -0.0227 -0.0318 0.5878 0.0091 -66.3577 -60.9789 -3.0926 -3.0983
0.683 1.5162 4400 0.6888 -0.0239 -0.0334 0.5913 0.0094 -66.5158 -61.1062 -3.0910 -3.0967
0.679 1.5507 4500 0.6887 -0.0255 -0.0352 0.5948 0.0097 -66.7038 -61.2636 -3.0892 -3.0949
0.6834 1.5851 4600 0.6886 -0.0275 -0.0375 0.5934 0.0100 -66.9283 -61.4618 -3.0871 -3.0928
0.685 1.6196 4700 0.6884 -0.0284 -0.0387 0.5929 0.0103 -67.0469 -61.5498 -3.0853 -3.0910
0.683 1.6540 4800 0.6883 -0.0294 -0.0400 0.5960 0.0106 -67.1815 -61.6491 -3.0831 -3.0889
0.6781 1.6885 4900 0.6882 -0.0307 -0.0416 0.5950 0.0109 -67.3424 -61.7858 -3.0820 -3.0877
0.6813 1.7229 5000 0.6881 -0.0317 -0.0426 0.5943 0.0110 -67.4448 -61.8785 -3.0805 -3.0863
0.6823 1.7574 5100 0.6880 -0.0328 -0.0440 0.5950 0.0112 -67.5799 -61.9921 -3.0789 -3.0846
0.6798 1.7919 5200 0.6879 -0.0341 -0.0457 0.5987 0.0116 -67.7483 -62.1205 -3.0772 -3.0829
0.6798 1.8263 5300 0.6877 -0.0353 -0.0472 0.5953 0.0119 -67.8958 -62.2422 -3.0757 -3.0814
0.6784 1.8608 5400 0.6876 -0.0368 -0.0489 0.5969 0.0122 -68.0724 -62.3875 -3.0742 -3.0798
0.6853 1.8952 5500 0.6876 -0.0377 -0.0500 0.5946 0.0123 -68.1765 -62.4820 -3.0735 -3.0792
0.6769 1.9297 5600 0.6875 -0.0392 -0.0517 0.5941 0.0125 -68.3471 -62.6278 -3.0713 -3.0771
0.6788 1.9642 5700 0.6874 -0.0399 -0.0526 0.5941 0.0127 -68.4439 -62.7029 -3.0701 -3.0759
0.6798 1.9986 5800 0.6873 -0.0410 -0.0538 0.5925 0.0128 -68.5632 -62.8140 -3.0694 -3.0752
0.683 2.0331 5900 0.6872 -0.0418 -0.0549 0.5934 0.0131 -68.6699 -62.8917 -3.0677 -3.0735
0.6766 2.0675 6000 0.6872 -0.0425 -0.0555 0.5918 0.0130 -68.7314 -62.9600 -3.0675 -3.0732
0.6756 2.1020 6100 0.6871 -0.0428 -0.0561 0.5922 0.0133 -68.7950 -62.9959 -3.0660 -3.0717
0.6805 2.1365 6200 0.6871 -0.0435 -0.0568 0.5904 0.0133 -68.8622 -63.0611 -3.0654 -3.0711
0.6797 2.1709 6300 0.6871 -0.0443 -0.0577 0.5929 0.0134 -68.9493 -63.1378 -3.0645 -3.0703
0.6802 2.2054 6400 0.6870 -0.0442 -0.0577 0.5913 0.0135 -68.9530 -63.1312 -3.0641 -3.0698
0.6802 2.2398 6500 0.6870 -0.0445 -0.0581 0.5934 0.0136 -68.9891 -63.1579 -3.0633 -3.0690
0.6806 2.2743 6600 0.6870 -0.0448 -0.0585 0.5925 0.0136 -69.0289 -63.1964 -3.0624 -3.0682
0.6755 2.3088 6700 0.6869 -0.0453 -0.0590 0.5918 0.0137 -69.0814 -63.2383 -3.0618 -3.0675
0.6826 2.3432 6800 0.6869 -0.0455 -0.0593 0.5962 0.0138 -69.1095 -63.2637 -3.0612 -3.0669
0.6786 2.3777 6900 0.6869 -0.0459 -0.0598 0.5892 0.0139 -69.1580 -63.3046 -3.0607 -3.0664
0.6798 2.4121 7000 0.6868 -0.0463 -0.0602 0.5934 0.0139 -69.2011 -63.3391 -3.0601 -3.0658
0.6762 2.4466 7100 0.6868 -0.0466 -0.0606 0.5936 0.0140 -69.2414 -63.3699 -3.0598 -3.0656
0.6782 2.4810 7200 0.6868 -0.0470 -0.0611 0.5918 0.0141 -69.2927 -63.4167 -3.0595 -3.0652
0.6821 2.5155 7300 0.6868 -0.0472 -0.0612 0.5943 0.0140 -69.3050 -63.4345 -3.0589 -3.0647
0.6806 2.5500 7400 0.6868 -0.0473 -0.0614 0.5908 0.0141 -69.3214 -63.4432 -3.0588 -3.0646
0.6824 2.5844 7500 0.6867 -0.0475 -0.0616 0.5918 0.0142 -69.3426 -63.4585 -3.0589 -3.0647
0.6789 2.6189 7600 0.6868 -0.0477 -0.0618 0.5915 0.0141 -69.3578 -63.4788 -3.0584 -3.0642
0.6768 2.6533 7700 0.6867 -0.0475 -0.0618 0.5946 0.0144 -69.3650 -63.4617 -3.0582 -3.0640
0.6808 2.6878 7800 0.6867 -0.0477 -0.0619 0.5918 0.0142 -69.3712 -63.4863 -3.0584 -3.0642
0.6782 2.7223 7900 0.6867 -0.0478 -0.0621 0.5925 0.0143 -69.3874 -63.4902 -3.0581 -3.0639
0.6794 2.7567 8000 0.6867 -0.0479 -0.0621 0.5897 0.0142 -69.3922 -63.5035 -3.0580 -3.0638
0.674 2.7912 8100 0.6867 -0.0479 -0.0621 0.5911 0.0142 -69.3883 -63.4992 -3.0580 -3.0638
0.6766 2.8256 8200 0.6866 -0.0478 -0.0622 0.5899 0.0144 -69.4003 -63.4938 -3.0581 -3.0639
0.6821 2.8601 8300 0.6867 -0.0479 -0.0622 0.5890 0.0143 -69.3970 -63.4998 -3.0579 -3.0637
0.6795 2.8946 8400 0.6867 -0.0478 -0.0621 0.5904 0.0142 -69.3868 -63.4954 -3.0580 -3.0637
0.679 2.9290 8500 0.6867 -0.0479 -0.0622 0.5925 0.0143 -69.3981 -63.4995 -3.0579 -3.0637
0.6816 2.9635 8600 0.6867 -0.0478 -0.0621 0.5922 0.0144 -69.3946 -63.4907 -3.0579 -3.0637
0.6751 2.9979 8700 0.6867 -0.0478 -0.0620 0.5936 0.0142 -69.3779 -63.4876 -3.0580 -3.0637

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1