tinyllama-1.1b-sum-dpo-full

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full, preference-tuned with DPO on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6549
  • Rewards/chosen: -0.4976
  • Rewards/rejected: -0.6010
  • Rewards/accuracies: 0.6194
  • Rewards/margins: 0.1035
  • Logps/rejected: -123.2810
  • Logps/chosen: -108.4673
  • Logits/rejected: -2.5516
  • Logits/chosen: -2.5584
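
For reference, the Rewards/* metrics follow the usual DPO bookkeeping (an assumption based on the metric names; the card itself does not define them): the implicit reward of a completion is the β-scaled log-probability ratio between the policy and the frozen reference model,

$$ r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right], \qquad \mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E} \left[ \log \sigma \big( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \big) \right] $$

Under this reading, Rewards/margins is the mean chosen-minus-rejected gap (here -0.4976 - (-0.6010) ≈ 0.1035, matching the reported value up to rounding) and Rewards/accuracies is the fraction of evaluation pairs where the chosen summary receives the higher reward.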

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
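
Although the card leaves this section unfilled, the dataset named above is the OpenAI summarize-from-feedback release (Stiennon et al., 2020): Reddit TL;DR posts paired with candidate summaries and human preference labels. A minimal loading sketch, assuming the "comparisons" configuration published on the Hugging Face Hub:

```python
from datasets import load_dataset

# Each example holds a post plus two candidate summaries;
# "choice" indexes the summary the human annotator preferred.
ds = load_dataset("openai/summarize_from_feedback", "comparisons")
example = ds["train"][0]
preferred = example["summaries"][example["choice"]]["text"]
print(preferred)
```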

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
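
As a minimal sketch, the values above map onto transformers TrainingArguments as follows; the output_dir is a placeholder, and DPO-specific settings such as β are not recorded in this card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full",  # placeholder name
    learning_rate=1e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 8 per device x 2 steps -> listed total of 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=2,
    adam_beta1=0.9,    # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```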

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6932 | 0.0172 | 100 | 0.6932 | 0.0000 | 0.0001 | 0.4819 | -0.0001 | -63.1720 | -58.7099 | -3.1572 | -3.1629 |
| 0.6931 | 0.0345 | 200 | 0.6932 | 0.0000 | 0.0001 | 0.4893 | -0.0001 | -63.1716 | -58.7118 | -3.1576 | -3.1632 |
| 0.6932 | 0.0517 | 300 | 0.6932 | 0.0000 | 0.0001 | 0.4696 | -0.0001 | -63.1677 | -58.7096 | -3.1575 | -3.1631 |
| 0.6933 | 0.0689 | 400 | 0.6932 | 0.0002 | 0.0002 | 0.4844 | -0.0000 | -63.1572 | -58.6929 | -3.1574 | -3.1631 |
| 0.6931 | 0.0861 | 500 | 0.6931 | 0.0002 | 0.0002 | 0.5016 | 0.0000 | -63.1582 | -58.6892 | -3.1571 | -3.1628 |
| 0.6925 | 0.1034 | 600 | 0.6931 | 0.0004 | 0.0003 | 0.5158 | 0.0002 | -63.1507 | -58.6671 | -3.1566 | -3.1623 |
| 0.6927 | 0.1206 | 700 | 0.6931 | 0.0006 | 0.0004 | 0.5276 | 0.0002 | -63.1420 | -58.6550 | -3.1556 | -3.1612 |
| 0.6924 | 0.1378 | 800 | 0.6929 | 0.0010 | 0.0006 | 0.5509 | 0.0005 | -63.1244 | -58.6089 | -3.1546 | -3.1601 |
| 0.692 | 0.1551 | 900 | 0.6928 | 0.0014 | 0.0007 | 0.5534 | 0.0007 | -63.1085 | -58.5690 | -3.1524 | -3.1580 |
| 0.6924 | 0.1723 | 1000 | 0.6926 | 0.0018 | 0.0007 | 0.5660 | 0.0011 | -63.1097 | -58.5334 | -3.1494 | -3.1550 |
| 0.6913 | 0.1895 | 1100 | 0.6924 | 0.0021 | 0.0005 | 0.5737 | 0.0016 | -63.1303 | -58.5028 | -3.1458 | -3.1514 |
| 0.6912 | 0.2068 | 1200 | 0.6921 | 0.0022 | 0.0001 | 0.5795 | 0.0021 | -63.1677 | -58.4881 | -3.1407 | -3.1464 |
| 0.6911 | 0.2240 | 1300 | 0.6918 | 0.0017 | -0.0011 | 0.5901 | 0.0028 | -63.2892 | -58.5372 | -3.1358 | -3.1414 |
| 0.6871 | 0.2412 | 1400 | 0.6914 | 0.0006 | -0.0031 | 0.5785 | 0.0037 | -63.4895 | -58.6491 | -3.1300 | -3.1356 |
| 0.6866 | 0.2584 | 1500 | 0.6910 | -0.0015 | -0.0061 | 0.5750 | 0.0045 | -63.7853 | -58.8661 | -3.1246 | -3.1303 |
| 0.6876 | 0.2757 | 1600 | 0.6907 | -0.0038 | -0.0091 | 0.5874 | 0.0053 | -64.0863 | -59.0928 | -3.1185 | -3.1241 |
| 0.6882 | 0.2929 | 1700 | 0.6903 | -0.0067 | -0.0126 | 0.5850 | 0.0060 | -64.4449 | -59.3800 | -3.1117 | -3.1173 |
| 0.6838 | 0.3101 | 1800 | 0.6900 | -0.0121 | -0.0190 | 0.5825 | 0.0069 | -65.0772 | -59.9201 | -3.1038 | -3.1095 |
| 0.6836 | 0.3274 | 1900 | 0.6895 | -0.0157 | -0.0235 | 0.5883 | 0.0078 | -65.5277 | -60.2801 | -3.0980 | -3.1037 |
| 0.685 | 0.3446 | 2000 | 0.6889 | -0.0227 | -0.0319 | 0.5897 | 0.0092 | -66.3702 | -60.9847 | -3.0905 | -3.0962 |
| 0.6828 | 0.3618 | 2100 | 0.6883 | -0.0311 | -0.0418 | 0.5806 | 0.0107 | -67.3595 | -61.8209 | -3.0840 | -3.0897 |
| 0.6745 | 0.3790 | 2200 | 0.6876 | -0.0382 | -0.0504 | 0.5883 | 0.0123 | -68.2227 | -62.5273 | -3.0753 | -3.0811 |
| 0.6781 | 0.3963 | 2300 | 0.6872 | -0.0405 | -0.0537 | 0.5908 | 0.0131 | -68.5468 | -62.7638 | -3.0689 | -3.0745 |
| 0.6809 | 0.4135 | 2400 | 0.6866 | -0.0471 | -0.0615 | 0.5906 | 0.0144 | -69.3305 | -63.4208 | -3.0592 | -3.0649 |
| 0.6828 | 0.4307 | 2500 | 0.6862 | -0.0557 | -0.0713 | 0.5913 | 0.0156 | -70.3087 | -64.2813 | -3.0501 | -3.0558 |
| 0.6754 | 0.4480 | 2600 | 0.6856 | -0.0615 | -0.0783 | 0.5918 | 0.0168 | -71.0083 | -64.8584 | -3.0433 | -3.0490 |
| 0.6768 | 0.4652 | 2700 | 0.6851 | -0.0674 | -0.0853 | 0.5957 | 0.0180 | -71.7136 | -65.4475 | -3.0370 | -3.0427 |
| 0.6766 | 0.4824 | 2800 | 0.6846 | -0.0727 | -0.0919 | 0.5967 | 0.0192 | -72.3669 | -65.9771 | -3.0308 | -3.0365 |
| 0.6769 | 0.4997 | 2900 | 0.6843 | -0.0755 | -0.0954 | 0.6004 | 0.0199 | -72.7197 | -66.2619 | -3.0232 | -3.0289 |
| 0.6781 | 0.5169 | 3000 | 0.6839 | -0.0812 | -0.1022 | 0.6027 | 0.0210 | -73.3995 | -66.8329 | -3.0144 | -3.0201 |
| 0.67 | 0.5341 | 3100 | 0.6835 | -0.0822 | -0.1040 | 0.6004 | 0.0218 | -73.5753 | -66.9287 | -3.0095 | -3.0153 |
| 0.6718 | 0.5513 | 3200 | 0.6828 | -0.0939 | -0.1173 | 0.6015 | 0.0235 | -74.9148 | -68.1005 | -2.9982 | -3.0040 |
| 0.6724 | 0.5686 | 3300 | 0.6822 | -0.0999 | -0.1249 | 0.6050 | 0.0250 | -75.6694 | -68.7027 | -2.9851 | -2.9908 |
| 0.6625 | 0.5858 | 3400 | 0.6818 | -0.1009 | -0.1266 | 0.6090 | 0.0257 | -75.8440 | -68.8060 | -2.9762 | -2.9820 |
| 0.6742 | 0.6030 | 3500 | 0.6814 | -0.1071 | -0.1338 | 0.6083 | 0.0267 | -76.5617 | -69.4202 | -2.9687 | -2.9745 |
| 0.6722 | 0.6203 | 3600 | 0.6810 | -0.1126 | -0.1404 | 0.6099 | 0.0277 | -77.2155 | -69.9734 | -2.9597 | -2.9655 |
| 0.664 | 0.6375 | 3700 | 0.6803 | -0.1209 | -0.1502 | 0.6090 | 0.0293 | -78.2040 | -70.8018 | -2.9485 | -2.9543 |
| 0.6644 | 0.6547 | 3800 | 0.6795 | -0.1327 | -0.1641 | 0.6111 | 0.0314 | -79.5918 | -71.9851 | -2.9386 | -2.9444 |
| 0.6664 | 0.6720 | 3900 | 0.6786 | -0.1449 | -0.1784 | 0.6080 | 0.0335 | -81.0222 | -73.2044 | -2.9300 | -2.9358 |
| 0.6653 | 0.6892 | 4000 | 0.6781 | -0.1559 | -0.1909 | 0.6057 | 0.0350 | -82.2692 | -74.3040 | -2.9178 | -2.9236 |
| 0.6532 | 0.7064 | 4100 | 0.6776 | -0.1612 | -0.1975 | 0.6125 | 0.0363 | -82.9296 | -74.8363 | -2.9005 | -2.9064 |
| 0.6733 | 0.7236 | 4200 | 0.6769 | -0.1720 | -0.2098 | 0.6087 | 0.0378 | -84.1639 | -75.9119 | -2.8890 | -2.8949 |
| 0.6618 | 0.7409 | 4300 | 0.6764 | -0.1798 | -0.2189 | 0.6057 | 0.0391 | -85.0723 | -76.6940 | -2.8794 | -2.8853 |
| 0.6625 | 0.7581 | 4400 | 0.6757 | -0.1936 | -0.2347 | 0.6053 | 0.0411 | -86.6464 | -78.0713 | -2.8686 | -2.8745 |
| 0.6605 | 0.7753 | 4500 | 0.6746 | -0.2097 | -0.2535 | 0.6066 | 0.0439 | -88.5342 | -79.6776 | -2.8590 | -2.8649 |
| 0.6437 | 0.7926 | 4600 | 0.6737 | -0.2242 | -0.2703 | 0.6071 | 0.0461 | -90.2150 | -81.1344 | -2.8513 | -2.8573 |
| 0.6526 | 0.8098 | 4700 | 0.6727 | -0.2385 | -0.2872 | 0.6069 | 0.0487 | -91.9046 | -82.5646 | -2.8429 | -2.8489 |
| 0.6604 | 0.8270 | 4800 | 0.6721 | -0.2495 | -0.2999 | 0.6090 | 0.0504 | -93.1696 | -83.6594 | -2.8351 | -2.8410 |
| 0.6664 | 0.8442 | 4900 | 0.6712 | -0.2621 | -0.3148 | 0.6048 | 0.0526 | -94.6595 | -84.9266 | -2.8264 | -2.8324 |
| 0.6499 | 0.8615 | 5000 | 0.6707 | -0.2706 | -0.3247 | 0.5955 | 0.0541 | -95.6483 | -85.7703 | -2.8111 | -2.8172 |
| 0.6628 | 0.8787 | 5100 | 0.6697 | -0.2843 | -0.3411 | 0.5969 | 0.0568 | -97.2923 | -87.1431 | -2.8035 | -2.8094 |
| 0.6513 | 0.8959 | 5200 | 0.6693 | -0.2867 | -0.3444 | 0.5953 | 0.0577 | -97.6222 | -87.3824 | -2.7972 | -2.8031 |
| 0.6475 | 0.9132 | 5300 | 0.6692 | -0.2901 | -0.3484 | 0.5987 | 0.0583 | -98.0213 | -87.7248 | -2.7882 | -2.7943 |
| 0.6494 | 0.9304 | 5400 | 0.6687 | -0.2940 | -0.3536 | 0.6015 | 0.0596 | -98.5368 | -88.1090 | -2.7827 | -2.7887 |
| 0.6412 | 0.9476 | 5500 | 0.6682 | -0.3024 | -0.3635 | 0.5997 | 0.0610 | -99.5251 | -88.9533 | -2.7734 | -2.7794 |
| 0.6531 | 0.9649 | 5600 | 0.6680 | -0.2995 | -0.3610 | 0.6046 | 0.0615 | -99.2758 | -88.6585 | -2.7683 | -2.7743 |
| 0.652 | 0.9821 | 5700 | 0.6671 | -0.3121 | -0.3760 | 0.6041 | 0.0639 | -100.7801 | -89.9234 | -2.7604 | -2.7664 |
| 0.6355 | 0.9993 | 5800 | 0.6663 | -0.3272 | -0.3936 | 0.6057 | 0.0664 | -102.5409 | -91.4366 | -2.7489 | -2.7549 |
| 0.6362 | 1.0165 | 5900 | 0.6654 | -0.3504 | -0.4199 | 0.6043 | 0.0695 | -105.1658 | -93.7475 | -2.7329 | -2.7390 |
| 0.6587 | 1.0338 | 6000 | 0.6654 | -0.3453 | -0.4145 | 0.6076 | 0.0692 | -104.6326 | -93.2431 | -2.7260 | -2.7321 |
| 0.6337 | 1.0510 | 6100 | 0.6649 | -0.3492 | -0.4197 | 0.6078 | 0.0705 | -105.1470 | -93.6331 | -2.7177 | -2.7237 |
| 0.6372 | 1.0682 | 6200 | 0.6640 | -0.3675 | -0.4408 | 0.6090 | 0.0734 | -107.2651 | -95.4612 | -2.7083 | -2.7144 |
| 0.6555 | 1.0855 | 6300 | 0.6633 | -0.3808 | -0.4563 | 0.6111 | 0.0755 | -108.8140 | -96.7948 | -2.7009 | -2.7071 |
| 0.6406 | 1.1027 | 6400 | 0.6629 | -0.3843 | -0.4611 | 0.6108 | 0.0768 | -109.2905 | -97.1394 | -2.6941 | -2.7003 |
| 0.6445 | 1.1199 | 6500 | 0.6626 | -0.3894 | -0.4670 | 0.6097 | 0.0776 | -109.8768 | -97.6507 | -2.6860 | -2.6923 |
| 0.6438 | 1.1371 | 6600 | 0.6627 | -0.3907 | -0.4683 | 0.6073 | 0.0776 | -110.0129 | -97.7839 | -2.6814 | -2.6877 |
| 0.6411 | 1.1544 | 6700 | 0.6622 | -0.3996 | -0.4791 | 0.6122 | 0.0795 | -111.0866 | -98.6695 | -2.6729 | -2.6791 |
| 0.6224 | 1.1716 | 6800 | 0.6614 | -0.4163 | -0.4982 | 0.6115 | 0.0819 | -112.9988 | -100.3370 | -2.6625 | -2.6688 |
| 0.6437 | 1.1888 | 6900 | 0.6610 | -0.4232 | -0.5064 | 0.6106 | 0.0832 | -113.8220 | -101.0292 | -2.6554 | -2.6618 |
| 0.6268 | 1.2061 | 7000 | 0.6604 | -0.4419 | -0.5278 | 0.6090 | 0.0859 | -115.9616 | -102.9045 | -2.6490 | -2.6553 |
| 0.6303 | 1.2233 | 7100 | 0.6604 | -0.4379 | -0.5238 | 0.6129 | 0.0859 | -115.5604 | -102.5041 | -2.6443 | -2.6506 |
| 0.6251 | 1.2405 | 7200 | 0.6600 | -0.4437 | -0.5309 | 0.6101 | 0.0872 | -116.2726 | -103.0814 | -2.6383 | -2.6448 |
| 0.6531 | 1.2578 | 7300 | 0.6602 | -0.4339 | -0.5202 | 0.6125 | 0.0863 | -115.1998 | -102.0999 | -2.6366 | -2.6430 |
| 0.6456 | 1.2750 | 7400 | 0.6600 | -0.4313 | -0.5180 | 0.6125 | 0.0867 | -114.9813 | -101.8414 | -2.6345 | -2.6409 |
| 0.6455 | 1.2922 | 7500 | 0.6597 | -0.4307 | -0.5180 | 0.6148 | 0.0873 | -114.9807 | -101.7862 | -2.6292 | -2.6357 |
| 0.6762 | 1.3094 | 7600 | 0.6593 | -0.4392 | -0.5278 | 0.6118 | 0.0887 | -115.9649 | -102.6288 | -2.6216 | -2.6281 |
| 0.6365 | 1.3267 | 7700 | 0.6592 | -0.4402 | -0.5295 | 0.6157 | 0.0893 | -116.1288 | -102.7343 | -2.6172 | -2.6237 |
| 0.6211 | 1.3439 | 7800 | 0.6588 | -0.4484 | -0.5389 | 0.6194 | 0.0906 | -117.0741 | -103.5481 | -2.6115 | -2.6180 |
| 0.641 | 1.3611 | 7900 | 0.6581 | -0.4553 | -0.5479 | 0.6217 | 0.0926 | -117.9735 | -104.2409 | -2.6077 | -2.6143 |
| 0.6228 | 1.3784 | 8000 | 0.6578 | -0.4583 | -0.5520 | 0.6215 | 0.0937 | -118.3795 | -104.5455 | -2.6043 | -2.6109 |
| 0.641 | 1.3956 | 8100 | 0.6579 | -0.4658 | -0.5596 | 0.6178 | 0.0939 | -119.1444 | -105.2910 | -2.5997 | -2.6063 |
| 0.6504 | 1.4128 | 8200 | 0.6571 | -0.4707 | -0.5666 | 0.6213 | 0.0959 | -119.8413 | -105.7863 | -2.5974 | -2.6040 |
| 0.6472 | 1.4300 | 8300 | 0.6573 | -0.4661 | -0.5612 | 0.6217 | 0.0951 | -119.3045 | -105.3220 | -2.5953 | -2.6018 |
| 0.6298 | 1.4473 | 8400 | 0.6573 | -0.4609 | -0.5560 | 0.6206 | 0.0950 | -118.7768 | -104.8056 | -2.5928 | -2.5994 |
| 0.6207 | 1.4645 | 8500 | 0.6573 | -0.4579 | -0.5531 | 0.6180 | 0.0952 | -118.4887 | -104.5014 | -2.5885 | -2.5952 |
| 0.6661 | 1.4817 | 8600 | 0.6571 | -0.4639 | -0.5598 | 0.6204 | 0.0959 | -119.1632 | -105.1053 | -2.5846 | -2.5913 |
| 0.6475 | 1.4990 | 8700 | 0.6572 | -0.4570 | -0.5525 | 0.6190 | 0.0954 | -118.4251 | -104.4133 | -2.5846 | -2.5912 |
| 0.6476 | 1.5162 | 8800 | 0.6569 | -0.4604 | -0.5566 | 0.6194 | 0.0962 | -118.8439 | -104.7545 | -2.5816 | -2.5883 |
| 0.6336 | 1.5334 | 8900 | 0.6568 | -0.4692 | -0.5663 | 0.6190 | 0.0971 | -119.8081 | -105.6329 | -2.5772 | -2.5839 |
| 0.6282 | 1.5507 | 9000 | 0.6564 | -0.4708 | -0.5690 | 0.6187 | 0.0981 | -120.0761 | -105.7962 | -2.5754 | -2.5821 |
| 0.646 | 1.5679 | 9100 | 0.6565 | -0.4724 | -0.5704 | 0.6187 | 0.0980 | -120.2213 | -105.9529 | -2.5732 | -2.5799 |
| 0.6225 | 1.5851 | 9200 | 0.6563 | -0.4762 | -0.5749 | 0.6190 | 0.0987 | -120.6733 | -106.3303 | -2.5714 | -2.5781 |
| 0.6223 | 1.6023 | 9300 | 0.6562 | -0.4763 | -0.5753 | 0.6180 | 0.0990 | -120.7107 | -106.3383 | -2.5692 | -2.5759 |
| 0.6288 | 1.6196 | 9400 | 0.6559 | -0.4818 | -0.5819 | 0.6201 | 0.1001 | -121.3710 | -106.8921 | -2.5664 | -2.5731 |
| 0.6223 | 1.6368 | 9500 | 0.6557 | -0.4823 | -0.5828 | 0.6176 | 0.1005 | -121.4601 | -106.9374 | -2.5650 | -2.5717 |
| 0.6363 | 1.6540 | 9600 | 0.6556 | -0.4891 | -0.5902 | 0.6197 | 0.1011 | -122.2042 | -107.6243 | -2.5615 | -2.5683 |
| 0.6355 | 1.6713 | 9700 | 0.6556 | -0.4880 | -0.5892 | 0.6211 | 0.1012 | -122.1034 | -107.5130 | -2.5609 | -2.5677 |
| 0.6247 | 1.6885 | 9800 | 0.6555 | -0.4894 | -0.5910 | 0.6201 | 0.1015 | -122.2755 | -107.6543 | -2.5603 | -2.5670 |
| 0.5826 | 1.7057 | 9900 | 0.6554 | -0.4911 | -0.5929 | 0.6206 | 0.1019 | -122.4715 | -107.8182 | -2.5591 | -2.5659 |
| 0.6181 | 1.7229 | 10000 | 0.6553 | -0.4923 | -0.5945 | 0.6204 | 0.1022 | -122.6296 | -107.9373 | -2.5579 | -2.5647 |
| 0.6365 | 1.7402 | 10100 | 0.6553 | -0.4917 | -0.5938 | 0.6201 | 0.1022 | -122.5635 | -107.8778 | -2.5567 | -2.5635 |
| 0.6269 | 1.7574 | 10200 | 0.6552 | -0.4952 | -0.5977 | 0.6208 | 0.1025 | -122.9497 | -108.2321 | -2.5556 | -2.5624 |
| 0.6573 | 1.7746 | 10300 | 0.6553 | -0.4962 | -0.5988 | 0.6201 | 0.1026 | -123.0645 | -108.3347 | -2.5542 | -2.5610 |
| 0.6036 | 1.7919 | 10400 | 0.6552 | -0.4953 | -0.5980 | 0.6197 | 0.1027 | -122.9784 | -108.2400 | -2.5542 | -2.5610 |
| 0.6178 | 1.8091 | 10500 | 0.6549 | -0.4956 | -0.5990 | 0.6213 | 0.1034 | -123.0831 | -108.2757 | -2.5531 | -2.5598 |
| 0.6403 | 1.8263 | 10600 | 0.6551 | -0.4967 | -0.5996 | 0.6204 | 0.1030 | -123.1450 | -108.3809 | -2.5527 | -2.5594 |
| 0.6341 | 1.8436 | 10700 | 0.6550 | -0.4965 | -0.5997 | 0.6206 | 0.1032 | -123.1496 | -108.3595 | -2.5523 | -2.5590 |
| 0.627 | 1.8608 | 10800 | 0.6549 | -0.4971 | -0.6006 | 0.6211 | 0.1035 | -123.2409 | -108.4216 | -2.5521 | -2.5589 |
| 0.6335 | 1.8780 | 10900 | 0.6550 | -0.4974 | -0.6009 | 0.6201 | 0.1035 | -123.2728 | -108.4564 | -2.5523 | -2.5590 |
| 0.6262 | 1.8952 | 11000 | 0.6550 | -0.4971 | -0.6003 | 0.6201 | 0.1033 | -123.2126 | -108.4185 | -2.5520 | -2.5588 |
| 0.6311 | 1.9125 | 11100 | 0.6548 | -0.4971 | -0.6009 | 0.6211 | 0.1038 | -123.2688 | -108.4253 | -2.5521 | -2.5589 |
| 0.6239 | 1.9297 | 11200 | 0.6551 | -0.4971 | -0.6003 | 0.6201 | 0.1031 | -123.2061 | -108.4263 | -2.5516 | -2.5583 |
| 0.6629 | 1.9469 | 11300 | 0.6550 | -0.4970 | -0.6003 | 0.6206 | 0.1033 | -123.2066 | -108.4107 | -2.5518 | -2.5587 |
| 0.6308 | 1.9642 | 11400 | 0.6550 | -0.4972 | -0.6005 | 0.6197 | 0.1033 | -123.2305 | -108.4360 | -2.5518 | -2.5586 |
| 0.6532 | 1.9814 | 11500 | 0.6550 | -0.4972 | -0.6005 | 0.6197 | 0.1033 | -123.2317 | -108.4313 | -2.5517 | -2.5585 |
| 0.6257 | 1.9986 | 11600 | 0.6549 | -0.4976 | -0.6010 | 0.6194 | 0.1035 | -123.2810 | -108.4673 | -2.5516 | -2.5584 |
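
The run behaves as expected for DPO: validation loss starts at 0.6932 (≈ ln 2, the value before policy and reference diverge) and falls to 0.6549, reward accuracy rises from near-chance 0.4819 to 0.6194, and the margin grows steadily to 0.1035, with most metrics plateauing over roughly the last 0.2 epochs. For clarity, a sketch of how the Rewards/accuracies and Rewards/margins columns are derived from per-pair rewards, following TRL's convention (tensor names are illustrative):

```python
import torch

def reward_metrics(chosen: torch.Tensor, rejected: torch.Tensor) -> dict:
    # Aggregate per-pair implicit rewards into the logged metrics:
    # accuracy = fraction of pairs where the chosen reward wins,
    # margin   = mean chosen-minus-rejected reward gap.
    return {
        "rewards/accuracies": (chosen > rejected).float().mean().item(),
        "rewards/margins": (chosen - rejected).mean().item(),
    }
```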

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1
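
With these (or newer) versions, the checkpoint loads through the standard transformers API. A minimal generation sketch; the repo id is taken from this card's page, and the TL;DR-style prompt is an assumption, since the expected input format is not documented here:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR1e-7_2epochs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "POST: ...\n\nTL;DR:"  # assumed prompt format
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```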