
tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_3epochs_old

This model is a DPO fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6785
  • Rewards/chosen: -0.1508
  • Rewards/rejected: -0.1845
  • Rewards/accuracies: 0.6085
  • Rewards/margins: 0.0338
  • Logps/rejected: -81.6350
  • Logps/chosen: -73.7914
  • Logits/rejected: -2.9190
  • Logits/chosen: -2.9249
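
A hedged note on the reward columns (not part of the original card): assuming the standard DPO objective as implemented in TRL, the per-example "reward" is the β-scaled log-probability ratio between the policy and the frozen reference (SFT) model, and Rewards/margins is the chosen-minus-rejected gap that the loss pushes apart:

```latex
% Sketch of the standard DPO objective; beta is the usual DPO temperature,
% which is not listed in this card.
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big( r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}) \big)
```

The Logps/chosen and Logps/rejected columns are the policy's log-probabilities of the chosen and rejected summaries, following the metric names emitted by TRL-style DPO training.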

Model description

More information needed

Intended uses & limitations

More information needed
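
No usage example is provided in the original card. As a minimal sketch, the model is a standard causal LM and can be loaded with transformers; the "TL;DR:" prompt format below is an assumption based on the openai/summarize_from_feedback (Reddit TL;DR) data, not a documented template.

```python
# Minimal usage sketch; the "TL;DR:" prompt format is an assumption,
# not a documented template for this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_3epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

post = "I adopted a three-legged cat last month and she has completely taken over the apartment..."
prompt = f"{post}\n\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Print only the newly generated summary, not the echoed prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```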

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-08
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
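
For reference, a minimal sketch of how the settings above map onto transformers.TrainingArguments; anything not listed in the card (output directory, precision, the DPO beta, sequence lengths) is a placeholder or omitted, and the reward/logps metrics suggest the run itself was driven by a DPO trainer such as TRL's.

```python
# Hedged sketch: maps the hyperparameters listed above onto TrainingArguments.
# Values not listed in the card (e.g. output_dir) are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_3epochs_old",  # placeholder
    learning_rate=5e-8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 8 per device x 4 steps = effective batch size 32
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), epsilon=1e-8 (the defaults)
)
```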

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6931 0.0345 100 0.6932 -0.0000 0.0001 0.4828 -0.0001 -63.1721 -58.7140 -3.1575 -3.1632
0.6932 0.0689 200 0.6932 0.0000 0.0001 0.4693 -0.0001 -63.1709 -58.7113 -3.1577 -3.1633
0.693 0.1034 300 0.6932 0.0000 0.0001 0.4761 -0.0001 -63.1730 -58.7112 -3.1574 -3.1630
0.693 0.1378 400 0.6932 0.0001 0.0002 0.4842 -0.0001 -63.1583 -58.6973 -3.1575 -3.1631
0.6931 0.1723 500 0.6931 0.0002 0.0002 0.4933 0.0000 -63.1594 -58.6877 -3.1575 -3.1631
0.6929 0.2068 600 0.6931 0.0004 0.0003 0.4988 0.0001 -63.1463 -58.6680 -3.1569 -3.1625
0.6926 0.2412 700 0.6931 0.0005 0.0004 0.5274 0.0002 -63.1449 -58.6601 -3.1561 -3.1617
0.6926 0.2757 800 0.6930 0.0008 0.0005 0.5286 0.0003 -63.1311 -58.6330 -3.1552 -3.1608
0.692 0.3101 900 0.6929 0.0010 0.0005 0.5437 0.0005 -63.1284 -58.6099 -3.1536 -3.1592
0.6915 0.3446 1000 0.6928 0.0015 0.0007 0.5497 0.0008 -63.1097 -58.5609 -3.1515 -3.1572
0.6914 0.3790 1100 0.6926 0.0018 0.0008 0.5602 0.0011 -63.1051 -58.5277 -3.1497 -3.1554
0.6905 0.4135 1200 0.6924 0.0018 0.0003 0.5702 0.0016 -63.1514 -58.5270 -3.1471 -3.1528
0.6889 0.4480 1300 0.6922 0.0020 -0.0001 0.5720 0.0020 -63.1881 -58.5158 -3.1441 -3.1497
0.6896 0.4824 1400 0.6920 0.0017 -0.0008 0.5685 0.0024 -63.2555 -58.5464 -3.1410 -3.1466
0.6894 0.5169 1500 0.6918 0.0012 -0.0016 0.5723 0.0028 -63.3410 -58.5945 -3.1375 -3.1432
0.6893 0.5513 1600 0.6915 0.0008 -0.0025 0.5741 0.0033 -63.4302 -58.6284 -3.1343 -3.1400
0.6871 0.5858 1700 0.6913 -0.0003 -0.0041 0.5725 0.0038 -63.5920 -58.7397 -3.1296 -3.1353
0.6879 0.6203 1800 0.6910 -0.0016 -0.0061 0.5764 0.0045 -63.7921 -58.8730 -3.1255 -3.1312
0.6869 0.6547 1900 0.6908 -0.0033 -0.0083 0.5804 0.0050 -64.0115 -59.0426 -3.1210 -3.1266
0.6863 0.6892 2000 0.6905 -0.0059 -0.0116 0.5799 0.0057 -64.3388 -59.3014 -3.1155 -3.1212
0.685 0.7236 2100 0.6901 -0.0086 -0.0150 0.5915 0.0064 -64.6834 -59.5751 -3.1097 -3.1154
0.6865 0.7581 2200 0.6899 -0.0116 -0.0186 0.5829 0.0070 -65.0448 -59.8767 -3.1053 -3.1110
0.6841 0.7926 2300 0.6896 -0.0155 -0.0232 0.5867 0.0077 -65.5006 -60.2607 -3.1009 -3.1066
0.6847 0.8270 2400 0.6892 -0.0205 -0.0291 0.5829 0.0085 -66.0859 -60.7633 -3.0966 -3.1023
0.6838 0.8615 2500 0.6888 -0.0258 -0.0352 0.5969 0.0095 -66.7026 -61.2875 -3.0907 -3.0964
0.6839 0.8959 2600 0.6884 -0.0304 -0.0408 0.5925 0.0103 -67.2565 -61.7539 -3.0868 -3.0925
0.6822 0.9304 2700 0.6880 -0.0353 -0.0466 0.5932 0.0113 -67.8404 -62.2428 -3.0819 -3.0877
0.6821 0.9649 2800 0.6877 -0.0370 -0.0490 0.5962 0.0119 -68.0766 -62.4140 -3.0775 -3.0832
0.6805 0.9993 2900 0.6874 -0.0412 -0.0537 0.5897 0.0126 -68.5544 -62.8283 -3.0727 -3.0784
0.6809 1.0338 3000 0.6872 -0.0422 -0.0553 0.5946 0.0132 -68.7141 -62.9285 -3.0668 -3.0725
0.6785 1.0682 3100 0.6869 -0.0451 -0.0589 0.5969 0.0139 -69.0748 -63.2200 -3.0610 -3.0668
0.6763 1.1027 3200 0.6866 -0.0484 -0.0628 0.5925 0.0144 -69.4644 -63.5534 -3.0568 -3.0626
0.681 1.1371 3300 0.6862 -0.0526 -0.0679 0.5922 0.0154 -69.9711 -63.9670 -3.0518 -3.0576
0.6767 1.1716 3400 0.6859 -0.0571 -0.0732 0.5939 0.0161 -70.5048 -64.4254 -3.0464 -3.0522
0.6781 1.2061 3500 0.6856 -0.0613 -0.0780 0.5964 0.0168 -70.9828 -64.8380 -3.0413 -3.0471
0.6774 1.2405 3600 0.6854 -0.0643 -0.0817 0.5983 0.0174 -71.3500 -65.1396 -3.0358 -3.0417
0.676 1.2750 3700 0.6851 -0.0670 -0.0851 0.5990 0.0181 -71.6879 -65.4141 -3.0314 -3.0372
0.675 1.3094 3800 0.6849 -0.0691 -0.0876 0.5969 0.0184 -71.9376 -65.6260 -3.0263 -3.0321
0.6748 1.3439 3900 0.6845 -0.0733 -0.0928 0.6036 0.0195 -72.4597 -66.0422 -3.0216 -3.0274
0.6769 1.3784 4000 0.6842 -0.0778 -0.0979 0.6050 0.0201 -72.9665 -66.4884 -3.0174 -3.0232
0.6739 1.4128 4100 0.6839 -0.0823 -0.1031 0.6057 0.0208 -73.4893 -66.9392 -3.0129 -3.0187
0.6668 1.4473 4200 0.6836 -0.0863 -0.1079 0.6034 0.0216 -73.9684 -67.3375 -3.0092 -3.0150
0.6729 1.4817 4300 0.6834 -0.0878 -0.1098 0.6039 0.0220 -74.1602 -67.4919 -3.0039 -3.0097
0.6748 1.5162 4400 0.6833 -0.0890 -0.1113 0.6046 0.0223 -74.3079 -67.6111 -3.0007 -3.0065
0.6678 1.5507 4500 0.6828 -0.0942 -0.1176 0.6020 0.0234 -74.9388 -68.1347 -2.9958 -3.0016
0.6735 1.5851 4600 0.6827 -0.0978 -0.1215 0.6015 0.0238 -75.3329 -68.4876 -2.9917 -2.9975
0.6742 1.6196 4700 0.6825 -0.0986 -0.1228 0.6050 0.0242 -75.4630 -68.5761 -2.9866 -2.9924
0.6741 1.6540 4800 0.6823 -0.1018 -0.1265 0.6018 0.0247 -75.8309 -68.8950 -2.9819 -2.9877
0.6637 1.6885 4900 0.6819 -0.1054 -0.1308 0.6039 0.0255 -76.2624 -69.2486 -2.9782 -2.9839
0.6702 1.7229 5000 0.6818 -0.1074 -0.1332 0.6046 0.0258 -76.5000 -69.4502 -2.9748 -2.9806
0.6694 1.7574 5100 0.6815 -0.1107 -0.1371 0.6032 0.0264 -76.8899 -69.7811 -2.9703 -2.9761
0.6654 1.7919 5200 0.6813 -0.1132 -0.1401 0.6048 0.0269 -77.1926 -70.0320 -2.9661 -2.9719
0.6698 1.8263 5300 0.6811 -0.1166 -0.1441 0.6066 0.0275 -77.5853 -70.3683 -2.9626 -2.9684
0.6644 1.8608 5400 0.6808 -0.1197 -0.1478 0.6036 0.0281 -77.9603 -70.6842 -2.9592 -2.9650
0.6735 1.8952 5500 0.6807 -0.1219 -0.1503 0.6018 0.0285 -78.2133 -70.8988 -2.9561 -2.9619
0.662 1.9297 5600 0.6805 -0.1258 -0.1548 0.6032 0.0290 -78.6641 -71.2920 -2.9526 -2.9585
0.6634 1.9642 5700 0.6803 -0.1274 -0.1568 0.6050 0.0294 -78.8583 -71.4504 -2.9495 -2.9554
0.6685 1.9986 5800 0.6802 -0.1293 -0.1591 0.6032 0.0298 -79.0912 -71.6448 -2.9473 -2.9532
0.6698 2.0331 5900 0.6800 -0.1323 -0.1626 0.6039 0.0303 -79.4426 -71.9459 -2.9444 -2.9503
0.6627 2.0675 6000 0.6798 -0.1342 -0.1649 0.6064 0.0307 -79.6712 -72.1328 -2.9419 -2.9477
0.6631 2.1020 6100 0.6796 -0.1352 -0.1662 0.6069 0.0310 -79.7986 -72.2308 -2.9397 -2.9456
0.6629 2.1365 6200 0.6796 -0.1373 -0.1685 0.6085 0.0312 -80.0281 -72.4374 -2.9374 -2.9433
0.6672 2.1709 6300 0.6794 -0.1393 -0.1709 0.6076 0.0316 -80.2661 -72.6388 -2.9347 -2.9405
0.6687 2.2054 6400 0.6794 -0.1401 -0.1719 0.6085 0.0317 -80.3653 -72.7241 -2.9322 -2.9380
0.6662 2.2398 6500 0.6793 -0.1415 -0.1735 0.6087 0.0320 -80.5257 -72.8570 -2.9306 -2.9364
0.6701 2.2743 6600 0.6792 -0.1423 -0.1744 0.6097 0.0321 -80.6223 -72.9458 -2.9287 -2.9345
0.6592 2.3088 6700 0.6791 -0.1429 -0.1753 0.6076 0.0323 -80.7084 -73.0069 -2.9274 -2.9333
0.668 2.3432 6800 0.6790 -0.1440 -0.1765 0.6080 0.0325 -80.8346 -73.1154 -2.9267 -2.9326
0.6637 2.3777 6900 0.6790 -0.1452 -0.1778 0.6064 0.0327 -80.9639 -73.2289 -2.9251 -2.9310
0.6645 2.4121 7000 0.6789 -0.1459 -0.1788 0.6090 0.0329 -81.0581 -73.3020 -2.9243 -2.9301
0.6589 2.4466 7100 0.6788 -0.1464 -0.1795 0.6099 0.0331 -81.1271 -73.3526 -2.9234 -2.9293
0.6636 2.4810 7200 0.6787 -0.1477 -0.1809 0.6087 0.0333 -81.2743 -73.4802 -2.9223 -2.9282
0.6679 2.5155 7300 0.6787 -0.1484 -0.1817 0.6101 0.0332 -81.3471 -73.5563 -2.9220 -2.9279
0.6679 2.5500 7400 0.6787 -0.1491 -0.1825 0.6094 0.0334 -81.4263 -73.6218 -2.9215 -2.9273
0.6657 2.5844 7500 0.6786 -0.1496 -0.1831 0.6080 0.0335 -81.4883 -73.6727 -2.9211 -2.9270
0.6638 2.6189 7600 0.6787 -0.1501 -0.1835 0.6078 0.0334 -81.5289 -73.7227 -2.9205 -2.9263
0.6638 2.6533 7700 0.6787 -0.1500 -0.1834 0.6106 0.0334 -81.5211 -73.7089 -2.9202 -2.9261
0.6664 2.6878 7800 0.6786 -0.1503 -0.1839 0.6090 0.0336 -81.5662 -73.7409 -2.9198 -2.9256
0.6631 2.7223 7900 0.6785 -0.1503 -0.1840 0.6080 0.0337 -81.5786 -73.7370 -2.9195 -2.9254
0.666 2.7567 8000 0.6786 -0.1506 -0.1843 0.6069 0.0337 -81.6062 -73.7714 -2.9191 -2.9250
0.6577 2.7912 8100 0.6786 -0.1507 -0.1843 0.6076 0.0336 -81.6118 -73.7826 -2.9193 -2.9252
0.6608 2.8256 8200 0.6786 -0.1507 -0.1844 0.6073 0.0337 -81.6240 -73.7849 -2.9191 -2.9250
0.6736 2.8601 8300 0.6785 -0.1505 -0.1844 0.6080 0.0338 -81.6154 -73.7657 -2.9191 -2.9250
0.6687 2.8946 8400 0.6785 -0.1507 -0.1844 0.6094 0.0337 -81.6251 -73.7842 -2.9192 -2.9251
0.6637 2.9290 8500 0.6785 -0.1505 -0.1843 0.6090 0.0338 -81.6091 -73.7641 -2.9192 -2.9251
0.6689 2.9635 8600 0.6786 -0.1508 -0.1844 0.6078 0.0336 -81.6197 -73.7927 -2.9189 -2.9248
0.6585 2.9979 8700 0.6785 -0.1508 -0.1845 0.6085 0.0338 -81.6350 -73.7914 -2.9190 -2.9249

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1