
tinyllama-1.1b-sum-dpo-full_LR1e-7_2epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set (the sketch after the list shows how these DPO metrics are computed):

  • Loss: 0.6549
  • Rewards/chosen: -0.4976
  • Rewards/rejected: -0.6011
  • Rewards/accuracies: 0.6194
  • Rewards/margins: 0.1035
  • Logps/rejected: -123.2918
  • Logps/chosen: -108.4708
  • Logits/rejected: -2.5511
  • Logits/chosen: -2.5579
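
These metrics follow the usual DPO bookkeeping: a completion's reward is β times the log-probability ratio between the policy and the frozen reference model, the margin is the chosen-minus-rejected reward gap, and accuracy is the fraction of pairs with a positive margin. The sketch below is illustrative only; β is an assumption (the card does not state it; 0.1 is TRL's default) and the toy tensors stand in for real per-sequence log-probs.

```python
import torch
import torch.nn.functional as F

# Toy values standing in for summed per-sequence log-probs; the Logps/*
# columns above report the policy-side numbers. beta = 0.1 is an assumption
# (TRL's default); the card does not state the value used.
beta = 0.1
logp_policy_chosen   = torch.tensor([-108.5])
logp_policy_rejected = torch.tensor([-123.3])
logp_ref_chosen      = torch.tensor([-103.5])
logp_ref_rejected    = torch.tensor([-117.3])

rewards_chosen   = beta * (logp_policy_chosen - logp_ref_chosen)      # Rewards/chosen
rewards_rejected = beta * (logp_policy_rejected - logp_ref_rejected)  # Rewards/rejected
margins  = rewards_chosen - rewards_rejected                          # Rewards/margins
accuracy = (margins > 0).float().mean()                               # Rewards/accuracies
loss     = -F.logsigmoid(margins).mean()                              # DPO loss
```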

Model description

More information needed

Intended uses & limitations

More information needed
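
Pending details from the authors, a minimal inference sketch is below. The prompt format is a guess based on the TL;DR convention commonly used with openai/summarize_from_feedback; the card does not document the actual training format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR1e-7_2epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumed TL;DR-style prompt; not confirmed by the card.
prompt = "POST: <text of the post to summarize>\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```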

Training and evaluation data

More information needed
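
For orientation, the human preference pairs live in the dataset's "comparisons" configuration. A small loading sketch (field names follow the dataset's own schema, not anything stated on this card):

```python
from datasets import load_dataset

# "comparisons" holds the human preference pairs used for DPO-style training.
ds = load_dataset("openai/summarize_from_feedback", "comparisons")
ex = ds["train"][0]
post      = ex["info"]["post"]   # the Reddit post to summarize
summaries = ex["summaries"]      # two candidate summaries
preferred = ex["choice"]         # index of the human-preferred summary
```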

Training procedure

Training hyperparameters

The following hyperparameters were used during training (the sketch after the list shows how these settings map onto a trainer configuration):

  • learning_rate: 1e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
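
A minimal sketch of these settings as a TRL DPOConfig/DPOTrainer run. This is an assumption: the card does not name the training framework, the β value, or the preference-pair preprocessing, so the prompt mapping and beta below are illustrative placeholders.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
model = AutoModelForCausalLM.from_pretrained(base)      # policy, initialised from the SFT model
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference
tokenizer = AutoTokenizer.from_pretrained(base)

# Map the raw comparisons into prompt/chosen/rejected columns; the prompt
# format here is an assumed TL;DR template, not the authors' actual one.
raw = load_dataset("openai/summarize_from_feedback", "comparisons", split="train")

def to_pairs(ex):
    prompt = "POST: " + ex["info"]["post"] + "\n\nTL;DR:"
    c = ex["choice"]
    return {"prompt": prompt,
            "chosen": " " + ex["summaries"][c]["text"],
            "rejected": " " + ex["summaries"][1 - c]["text"]}

train_dataset = raw.map(to_pairs, remove_columns=raw.column_names)

config = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR1e-7_2epochs_old",
    learning_rate=1e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # 8 * 2 = 16 total train batch size
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8
    beta=0.1,                        # assumption: TRL's default; not stated on the card
)

trainer = DPOTrainer(model=model, ref_model=ref_model, args=config,
                     train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```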

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.6932 | 0.0172 | 100 | 0.6932 | -0.0000 | 0.0000 | 0.4930 | -0.0001 | -63.1768 | -58.7146 | -3.1573 | -3.1630 |
| 0.6932 | 0.0345 | 200 | 0.6932 | -0.0001 | -0.0000 | 0.4772 | -0.0001 | -63.1802 | -58.7210 | -3.1574 | -3.1630 |
| 0.6931 | 0.0517 | 300 | 0.6932 | -0.0000 | 0.0001 | 0.4840 | -0.0001 | -63.1670 | -58.7127 | -3.1573 | -3.1630 |
| 0.693 | 0.0689 | 400 | 0.6932 | -0.0000 | 0.0001 | 0.4828 | -0.0001 | -63.1728 | -58.7120 | -3.1575 | -3.1632 |
| 0.6931 | 0.0861 | 500 | 0.6932 | 0.0002 | 0.0003 | 0.4775 | -0.0001 | -63.1514 | -58.6883 | -3.1571 | -3.1627 |
| 0.6924 | 0.1034 | 600 | 0.6931 | 0.0004 | 0.0003 | 0.5021 | 0.0001 | -63.1466 | -58.6704 | -3.1564 | -3.1621 |
| 0.6926 | 0.1206 | 700 | 0.6931 | 0.0006 | 0.0004 | 0.5163 | 0.0002 | -63.1388 | -58.6536 | -3.1556 | -3.1613 |
| 0.6922 | 0.1378 | 800 | 0.6930 | 0.0011 | 0.0007 | 0.5328 | 0.0004 | -63.1062 | -58.6016 | -3.1544 | -3.1601 |
| 0.6919 | 0.1551 | 900 | 0.6928 | 0.0015 | 0.0008 | 0.5467 | 0.0008 | -63.1024 | -58.5586 | -3.1525 | -3.1581 |
| 0.6924 | 0.1723 | 1000 | 0.6926 | 0.0018 | 0.0007 | 0.5632 | 0.0011 | -63.1061 | -58.5285 | -3.1495 | -3.1551 |
| 0.6913 | 0.1895 | 1100 | 0.6924 | 0.0021 | 0.0006 | 0.5748 | 0.0015 | -63.1198 | -58.5001 | -3.1456 | -3.1512 |
| 0.6911 | 0.2068 | 1200 | 0.6921 | 0.0023 | 0.0001 | 0.5829 | 0.0022 | -63.1702 | -58.4863 | -3.1409 | -3.1465 |
| 0.6911 | 0.2240 | 1300 | 0.6918 | 0.0018 | -0.0011 | 0.5783 | 0.0029 | -63.2862 | -58.5324 | -3.1359 | -3.1415 |
| 0.6871 | 0.2412 | 1400 | 0.6914 | 0.0005 | -0.0030 | 0.5718 | 0.0036 | -63.4832 | -58.6569 | -3.1301 | -3.1358 |
| 0.6865 | 0.2584 | 1500 | 0.6910 | -0.0015 | -0.0060 | 0.5760 | 0.0045 | -63.7806 | -58.8602 | -3.1249 | -3.1305 |
| 0.6876 | 0.2757 | 1600 | 0.6906 | -0.0038 | -0.0091 | 0.5860 | 0.0053 | -64.0945 | -59.0966 | -3.1178 | -3.1235 |
| 0.6883 | 0.2929 | 1700 | 0.6903 | -0.0066 | -0.0127 | 0.5846 | 0.0061 | -64.4541 | -59.3744 | -3.1115 | -3.1171 |
| 0.684 | 0.3101 | 1800 | 0.6900 | -0.0121 | -0.0190 | 0.5843 | 0.0069 | -65.0824 | -59.9254 | -3.1036 | -3.1093 |
| 0.6834 | 0.3274 | 1900 | 0.6895 | -0.0157 | -0.0236 | 0.5881 | 0.0078 | -65.5351 | -60.2850 | -3.0983 | -3.1039 |
| 0.6852 | 0.3446 | 2000 | 0.6890 | -0.0228 | -0.0319 | 0.5888 | 0.0091 | -66.3715 | -60.9889 | -3.0904 | -3.0961 |
| 0.6827 | 0.3618 | 2100 | 0.6883 | -0.0310 | -0.0417 | 0.5885 | 0.0107 | -67.3509 | -61.8145 | -3.0840 | -3.0897 |
| 0.6745 | 0.3790 | 2200 | 0.6876 | -0.0382 | -0.0505 | 0.5860 | 0.0123 | -68.2293 | -62.5301 | -3.0753 | -3.0810 |
| 0.678 | 0.3963 | 2300 | 0.6872 | -0.0406 | -0.0536 | 0.5890 | 0.0131 | -68.5438 | -62.7670 | -3.0691 | -3.0748 |
| 0.6808 | 0.4135 | 2400 | 0.6867 | -0.0471 | -0.0614 | 0.5881 | 0.0143 | -69.3158 | -63.4223 | -3.0596 | -3.0652 |
| 0.683 | 0.4307 | 2500 | 0.6861 | -0.0556 | -0.0712 | 0.5897 | 0.0157 | -70.3045 | -64.2686 | -3.0500 | -3.0557 |
| 0.6754 | 0.4480 | 2600 | 0.6856 | -0.0611 | -0.0780 | 0.5885 | 0.0169 | -70.9754 | -64.8212 | -3.0432 | -3.0489 |
| 0.6768 | 0.4652 | 2700 | 0.6851 | -0.0674 | -0.0855 | 0.5927 | 0.0181 | -71.7327 | -65.4567 | -3.0371 | -3.0427 |
| 0.6767 | 0.4824 | 2800 | 0.6846 | -0.0729 | -0.0920 | 0.5943 | 0.0192 | -72.3822 | -65.9983 | -3.0311 | -3.0368 |
| 0.677 | 0.4997 | 2900 | 0.6843 | -0.0755 | -0.0955 | 0.5997 | 0.0200 | -72.7311 | -66.2650 | -3.0233 | -3.0290 |
| 0.678 | 0.5169 | 3000 | 0.6838 | -0.0814 | -0.1025 | 0.6008 | 0.0211 | -73.4252 | -66.8486 | -3.0141 | -3.0198 |
| 0.67 | 0.5341 | 3100 | 0.6836 | -0.0822 | -0.1038 | 0.6018 | 0.0216 | -73.5633 | -66.9356 | -3.0096 | -3.0153 |
| 0.6718 | 0.5513 | 3200 | 0.6827 | -0.0939 | -0.1175 | 0.6034 | 0.0236 | -74.9309 | -68.1066 | -2.9982 | -3.0040 |
| 0.6724 | 0.5686 | 3300 | 0.6821 | -0.0998 | -0.1249 | 0.6041 | 0.0251 | -75.6721 | -68.6965 | -2.9850 | -2.9907 |
| 0.6625 | 0.5858 | 3400 | 0.6819 | -0.1010 | -0.1266 | 0.6066 | 0.0256 | -75.8434 | -68.8117 | -2.9759 | -2.9817 |
| 0.6743 | 0.6030 | 3500 | 0.6814 | -0.1069 | -0.1336 | 0.6113 | 0.0267 | -76.5408 | -69.4021 | -2.9688 | -2.9746 |
| 0.6721 | 0.6203 | 3600 | 0.6810 | -0.1127 | -0.1405 | 0.6078 | 0.0278 | -77.2252 | -69.9806 | -2.9599 | -2.9657 |
| 0.664 | 0.6375 | 3700 | 0.6804 | -0.1212 | -0.1504 | 0.6073 | 0.0292 | -78.2202 | -70.8276 | -2.9486 | -2.9544 |
| 0.6644 | 0.6547 | 3800 | 0.6795 | -0.1329 | -0.1643 | 0.6104 | 0.0313 | -79.6058 | -72.0042 | -2.9392 | -2.9450 |
| 0.6665 | 0.6720 | 3900 | 0.6787 | -0.1452 | -0.1785 | 0.6059 | 0.0333 | -81.0310 | -73.2281 | -2.9298 | -2.9357 |
| 0.6653 | 0.6892 | 4000 | 0.6781 | -0.1559 | -0.1908 | 0.6062 | 0.0349 | -82.2593 | -74.3019 | -2.9178 | -2.9236 |
| 0.6534 | 0.7064 | 4100 | 0.6777 | -0.1615 | -0.1973 | 0.6080 | 0.0359 | -82.9142 | -74.8574 | -2.9005 | -2.9063 |
| 0.6736 | 0.7236 | 4200 | 0.6769 | -0.1724 | -0.2103 | 0.6069 | 0.0379 | -84.2087 | -75.9475 | -2.8890 | -2.8949 |
| 0.6617 | 0.7409 | 4300 | 0.6764 | -0.1802 | -0.2194 | 0.6071 | 0.0392 | -85.1160 | -76.7326 | -2.8792 | -2.8851 |
| 0.6625 | 0.7581 | 4400 | 0.6756 | -0.1938 | -0.2351 | 0.6039 | 0.0413 | -86.6852 | -78.0909 | -2.8681 | -2.8740 |
| 0.6604 | 0.7753 | 4500 | 0.6746 | -0.2102 | -0.2541 | 0.6076 | 0.0439 | -88.5854 | -79.7309 | -2.8589 | -2.8650 |
| 0.6436 | 0.7926 | 4600 | 0.6736 | -0.2248 | -0.2712 | 0.6066 | 0.0463 | -90.2984 | -81.1957 | -2.8510 | -2.8569 |
| 0.6527 | 0.8098 | 4700 | 0.6728 | -0.2396 | -0.2882 | 0.6078 | 0.0486 | -92.0000 | -82.6740 | -2.8433 | -2.8492 |
| 0.6604 | 0.8270 | 4800 | 0.6721 | -0.2501 | -0.3005 | 0.6066 | 0.0504 | -93.2272 | -83.7222 | -2.8340 | -2.8399 |
| 0.6665 | 0.8442 | 4900 | 0.6713 | -0.2626 | -0.3152 | 0.6053 | 0.0526 | -94.6995 | -84.9707 | -2.8265 | -2.8324 |
| 0.65 | 0.8615 | 5000 | 0.6706 | -0.2707 | -0.3251 | 0.5936 | 0.0543 | -95.6856 | -85.7848 | -2.8110 | -2.8169 |
| 0.6625 | 0.8787 | 5100 | 0.6697 | -0.2838 | -0.3407 | 0.5941 | 0.0569 | -97.2505 | -87.0959 | -2.8023 | -2.8083 |
| 0.6511 | 0.8959 | 5200 | 0.6695 | -0.2869 | -0.3443 | 0.5983 | 0.0574 | -97.6072 | -87.3982 | -2.7964 | -2.8024 |
| 0.6473 | 0.9132 | 5300 | 0.6691 | -0.2904 | -0.3488 | 0.5992 | 0.0584 | -98.0594 | -87.7473 | -2.7880 | -2.7940 |
| 0.6492 | 0.9304 | 5400 | 0.6687 | -0.2941 | -0.3536 | 0.6004 | 0.0594 | -98.5365 | -88.1234 | -2.7825 | -2.7885 |
| 0.6409 | 0.9476 | 5500 | 0.6682 | -0.3026 | -0.3636 | 0.5978 | 0.0609 | -99.5376 | -88.9754 | -2.7736 | -2.7795 |
| 0.6531 | 0.9649 | 5600 | 0.6679 | -0.2997 | -0.3615 | 0.6006 | 0.0617 | -99.3275 | -88.6850 | -2.7683 | -2.7743 |
| 0.6523 | 0.9821 | 5700 | 0.6671 | -0.3127 | -0.3766 | 0.6018 | 0.0639 | -100.8429 | -89.9807 | -2.7604 | -2.7664 |
| 0.6355 | 0.9993 | 5800 | 0.6663 | -0.3277 | -0.3941 | 0.6078 | 0.0664 | -102.5891 | -91.4845 | -2.7485 | -2.7544 |
| 0.6363 | 1.0165 | 5900 | 0.6654 | -0.3506 | -0.4200 | 0.6013 | 0.0695 | -105.1840 | -93.7690 | -2.7327 | -2.7388 |
| 0.6587 | 1.0338 | 6000 | 0.6654 | -0.3455 | -0.4149 | 0.6090 | 0.0694 | -104.6700 | -93.2587 | -2.7256 | -2.7317 |
| 0.6335 | 1.0510 | 6100 | 0.6650 | -0.3500 | -0.4204 | 0.6085 | 0.0704 | -105.2201 | -93.7083 | -2.7173 | -2.7233 |
| 0.637 | 1.0682 | 6200 | 0.6641 | -0.3684 | -0.4416 | 0.6083 | 0.0731 | -107.3361 | -95.5533 | -2.7081 | -2.7143 |
| 0.6557 | 1.0855 | 6300 | 0.6634 | -0.3813 | -0.4567 | 0.6092 | 0.0754 | -108.8497 | -96.8372 | -2.7011 | -2.7073 |
| 0.6406 | 1.1027 | 6400 | 0.6629 | -0.3842 | -0.4611 | 0.6104 | 0.0769 | -109.2875 | -97.1323 | -2.6938 | -2.7001 |
| 0.6445 | 1.1199 | 6500 | 0.6627 | -0.3897 | -0.4671 | 0.6104 | 0.0774 | -109.8874 | -97.6783 | -2.6856 | -2.6919 |
| 0.6444 | 1.1371 | 6600 | 0.6626 | -0.3914 | -0.4693 | 0.6087 | 0.0779 | -110.1084 | -97.8481 | -2.6817 | -2.6880 |
| 0.6412 | 1.1544 | 6700 | 0.6621 | -0.3997 | -0.4794 | 0.6094 | 0.0796 | -111.1156 | -98.6842 | -2.6724 | -2.6787 |
| 0.6223 | 1.1716 | 6800 | 0.6614 | -0.4163 | -0.4982 | 0.6145 | 0.0819 | -113.0004 | -100.3420 | -2.6623 | -2.6687 |
| 0.6439 | 1.1888 | 6900 | 0.6612 | -0.4231 | -0.5061 | 0.6106 | 0.0830 | -113.7915 | -101.0268 | -2.6555 | -2.6619 |
| 0.6269 | 1.2061 | 7000 | 0.6606 | -0.4424 | -0.5279 | 0.6099 | 0.0855 | -115.9700 | -102.9478 | -2.6489 | -2.6553 |
| 0.6301 | 1.2233 | 7100 | 0.6603 | -0.4383 | -0.5243 | 0.6122 | 0.0860 | -115.6095 | -102.5456 | -2.6439 | -2.6503 |
| 0.625 | 1.2405 | 7200 | 0.6600 | -0.4436 | -0.5309 | 0.6129 | 0.0873 | -116.2657 | -103.0681 | -2.6385 | -2.6450 |
| 0.653 | 1.2578 | 7300 | 0.6599 | -0.4335 | -0.5204 | 0.6134 | 0.0868 | -115.2167 | -102.0655 | -2.6367 | -2.6430 |
| 0.6456 | 1.2750 | 7400 | 0.6600 | -0.4315 | -0.5182 | 0.6113 | 0.0866 | -114.9959 | -101.8630 | -2.6344 | -2.6409 |
| 0.6454 | 1.2922 | 7500 | 0.6597 | -0.4307 | -0.5182 | 0.6162 | 0.0875 | -114.9953 | -101.7817 | -2.6295 | -2.6359 |
| 0.6769 | 1.3094 | 7600 | 0.6593 | -0.4390 | -0.5278 | 0.6162 | 0.0888 | -115.9591 | -102.6077 | -2.6216 | -2.6281 |
| 0.6367 | 1.3267 | 7700 | 0.6591 | -0.4402 | -0.5295 | 0.6166 | 0.0893 | -116.1309 | -102.7307 | -2.6170 | -2.6235 |
| 0.621 | 1.3439 | 7800 | 0.6587 | -0.4486 | -0.5395 | 0.6190 | 0.0909 | -117.1267 | -103.5701 | -2.6111 | -2.6176 |
| 0.6413 | 1.3611 | 7900 | 0.6581 | -0.4553 | -0.5479 | 0.6201 | 0.0926 | -117.9684 | -104.2417 | -2.6072 | -2.6137 |
| 0.6228 | 1.3784 | 8000 | 0.6580 | -0.4586 | -0.5519 | 0.6217 | 0.0932 | -118.3658 | -104.5737 | -2.6039 | -2.6105 |
| 0.6409 | 1.3956 | 8100 | 0.6577 | -0.4652 | -0.5596 | 0.6213 | 0.0944 | -119.1380 | -105.2326 | -2.5999 | -2.6065 |
| 0.6504 | 1.4128 | 8200 | 0.6572 | -0.4709 | -0.5666 | 0.6166 | 0.0958 | -119.8450 | -105.8004 | -2.5972 | -2.6038 |
| 0.6468 | 1.4300 | 8300 | 0.6573 | -0.4657 | -0.5609 | 0.6231 | 0.0953 | -119.2726 | -105.2789 | -2.5953 | -2.6019 |
| 0.6301 | 1.4473 | 8400 | 0.6574 | -0.4609 | -0.5559 | 0.6211 | 0.0950 | -118.7683 | -104.8034 | -2.5927 | -2.5993 |
| 0.6207 | 1.4645 | 8500 | 0.6575 | -0.4578 | -0.5526 | 0.6187 | 0.0948 | -118.4422 | -104.4934 | -2.5884 | -2.5951 |
| 0.6661 | 1.4817 | 8600 | 0.6570 | -0.4650 | -0.5611 | 0.6206 | 0.0961 | -119.2866 | -105.2096 | -2.5845 | -2.5911 |
| 0.6475 | 1.4990 | 8700 | 0.6572 | -0.4575 | -0.5529 | 0.6197 | 0.0954 | -118.4655 | -104.4587 | -2.5841 | -2.5908 |
| 0.6478 | 1.5162 | 8800 | 0.6569 | -0.4607 | -0.5569 | 0.6199 | 0.0962 | -118.8732 | -104.7842 | -2.5812 | -2.5879 |
| 0.6338 | 1.5334 | 8900 | 0.6566 | -0.4694 | -0.5668 | 0.6201 | 0.0974 | -119.8600 | -105.6548 | -2.5766 | -2.5833 |
| 0.6283 | 1.5507 | 9000 | 0.6565 | -0.4721 | -0.5700 | 0.6199 | 0.0979 | -120.1781 | -105.9173 | -2.5752 | -2.5819 |
| 0.6462 | 1.5679 | 9100 | 0.6564 | -0.4728 | -0.5710 | 0.6187 | 0.0982 | -120.2769 | -105.9869 | -2.5728 | -2.5796 |
| 0.6228 | 1.5851 | 9200 | 0.6562 | -0.4767 | -0.5756 | 0.6194 | 0.0989 | -120.7382 | -106.3830 | -2.5720 | -2.5787 |
| 0.6224 | 1.6023 | 9300 | 0.6561 | -0.4771 | -0.5764 | 0.6197 | 0.0993 | -120.8189 | -106.4213 | -2.5689 | -2.5756 |
| 0.6286 | 1.6196 | 9400 | 0.6558 | -0.4825 | -0.5830 | 0.6211 | 0.1004 | -121.4753 | -106.9631 | -2.5668 | -2.5735 |
| 0.6221 | 1.6368 | 9500 | 0.6558 | -0.4833 | -0.5838 | 0.6199 | 0.1005 | -121.5581 | -107.0399 | -2.5650 | -2.5717 |
| 0.6358 | 1.6540 | 9600 | 0.6557 | -0.4891 | -0.5901 | 0.6194 | 0.1010 | -122.1902 | -107.6185 | -2.5614 | -2.5681 |
| 0.6358 | 1.6713 | 9700 | 0.6556 | -0.4886 | -0.5899 | 0.6206 | 0.1013 | -122.1670 | -107.5694 | -2.5605 | -2.5673 |
| 0.6243 | 1.6885 | 9800 | 0.6554 | -0.4898 | -0.5916 | 0.6211 | 0.1019 | -122.3449 | -107.6895 | -2.5598 | -2.5665 |
| 0.5825 | 1.7057 | 9900 | 0.6554 | -0.4917 | -0.5936 | 0.6211 | 0.1019 | -122.5433 | -107.8852 | -2.5589 | -2.5656 |
| 0.6181 | 1.7229 | 10000 | 0.6552 | -0.4927 | -0.5951 | 0.6208 | 0.1024 | -122.6864 | -107.9799 | -2.5578 | -2.5645 |
| 0.6364 | 1.7402 | 10100 | 0.6553 | -0.4917 | -0.5940 | 0.6201 | 0.1023 | -122.5787 | -107.8781 | -2.5562 | -2.5630 |
| 0.6272 | 1.7574 | 10200 | 0.6552 | -0.4947 | -0.5974 | 0.6206 | 0.1027 | -122.9187 | -108.1824 | -2.5552 | -2.5620 |
| 0.6576 | 1.7746 | 10300 | 0.6551 | -0.4968 | -0.5997 | 0.6204 | 0.1029 | -123.1503 | -108.3895 | -2.5543 | -2.5610 |
| 0.6036 | 1.7919 | 10400 | 0.6552 | -0.4950 | -0.5977 | 0.6187 | 0.1027 | -122.9548 | -108.2141 | -2.5535 | -2.5603 |
| 0.6174 | 1.8091 | 10500 | 0.6551 | -0.4961 | -0.5990 | 0.6194 | 0.1029 | -123.0769 | -108.3228 | -2.5536 | -2.5603 |
| 0.6403 | 1.8263 | 10600 | 0.6551 | -0.4962 | -0.5992 | 0.6197 | 0.1030 | -123.0967 | -108.3300 | -2.5527 | -2.5595 |
| 0.6341 | 1.8436 | 10700 | 0.6551 | -0.4973 | -0.6004 | 0.6185 | 0.1031 | -123.2222 | -108.4462 | -2.5520 | -2.5588 |
| 0.627 | 1.8608 | 10800 | 0.6549 | -0.4976 | -0.6011 | 0.6211 | 0.1035 | -123.2887 | -108.4688 | -2.5518 | -2.5586 |
| 0.6336 | 1.8780 | 10900 | 0.6549 | -0.4972 | -0.6009 | 0.6201 | 0.1037 | -123.2694 | -108.4345 | -2.5519 | -2.5587 |
| 0.626 | 1.8952 | 11000 | 0.6550 | -0.4983 | -0.6016 | 0.6206 | 0.1034 | -123.3421 | -108.5379 | -2.5516 | -2.5584 |
| 0.6314 | 1.9125 | 11100 | 0.6551 | -0.4974 | -0.6004 | 0.6194 | 0.1030 | -123.2212 | -108.4520 | -2.5517 | -2.5585 |
| 0.6239 | 1.9297 | 11200 | 0.6549 | -0.4976 | -0.6012 | 0.6192 | 0.1036 | -123.3044 | -108.4749 | -2.5519 | -2.5587 |
| 0.6632 | 1.9469 | 11300 | 0.6550 | -0.4977 | -0.6011 | 0.6194 | 0.1033 | -123.2879 | -108.4866 | -2.5514 | -2.5582 |
| 0.6306 | 1.9642 | 11400 | 0.6550 | -0.4978 | -0.6010 | 0.6183 | 0.1032 | -123.2786 | -108.4874 | -2.5514 | -2.5583 |
| 0.6532 | 1.9814 | 11500 | 0.6549 | -0.4977 | -0.6012 | 0.6206 | 0.1035 | -123.3012 | -108.4803 | -2.5513 | -2.5581 |
| 0.6257 | 1.9986 | 11600 | 0.6549 | -0.4976 | -0.6011 | 0.6194 | 0.1035 | -123.2918 | -108.4708 | -2.5511 | -2.5579 |
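
For completeness, the Logps/* columns above are summed per-token log-probabilities of each completion under the policy. A hypothetical helper (not taken from the training code) that computes such a value:

```python
import torch

def sequence_logp(model, input_ids, labels):
    """Summed log-prob of completion tokens; prompt positions in `labels`
    are masked with -100, following the usual causal-LM convention."""
    with torch.no_grad():
        logits = model(input_ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)  # position t predicts token t+1
    targets = labels[:, 1:]
    mask = targets != -100
    token_logps = torch.gather(
        logprobs, 2, targets.clamp(min=0).unsqueeze(-1)
    ).squeeze(-1)
    return (token_logps * mask).sum(dim=-1)  # one scalar per sequence
```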

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1