tinyllama-1.1b-sum-dpo-full_LR5e-8_2epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6808
  • Rewards/chosen: -0.1214
  • Rewards/rejected: -0.1497
  • Rewards/accuracies: 0.6090
  • Rewards/margins: 0.0284
  • Logps/rejected: -78.1532
  • Logps/chosen: -70.8499
  • Logits/rejected: -2.9566
  • Logits/chosen: -2.9624
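
The reward metrics above are tied together by simple arithmetic, which the sketch below checks. It assumes the run used the standard sigmoid DPO loss, which the model name suggests but this card does not state. Under that assumption the per-example loss is -log(sigmoid(reward_chosen - reward_rejected)). The helper name `dpo_sigmoid_loss` is illustrative and not taken from any training script.

```python
import math

# Final evaluation metrics copied from this card.
rewards_chosen = -0.1214
rewards_rejected = -0.1497
reported_margin = 0.0284
reported_loss = 0.6808


def dpo_sigmoid_loss(margin: float) -> float:
    """-log(sigmoid(margin)), written as log(1 + exp(-margin))."""
    return math.log1p(math.exp(-margin))


# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

# With a zero margin (policy identical to the reference) the loss is ln 2.
loss_at_zero_margin = dpo_sigmoid_loss(0.0)

# Loss evaluated at the mean margin. The loss is convex in the margin, so the
# mean per-example loss should not fall below this value (up to rounding).
loss_at_mean_margin = dpo_sigmoid_loss(reported_margin)

print(f"margin from rewards: {margin:.4f} (card reports {reported_margin})")
print(f"loss at zero margin: {loss_at_zero_margin:.4f}")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (card reports {reported_loss})")
```

The recomputed margin differs from the reported one only in the fourth decimal place, which is consistent with each metric being averaged and rounded separately. The zero-margin loss of about 0.6931 is ln 2, the value a sigmoid DPO loss takes before the policy has moved away from the reference model.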

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-08
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
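
As a rough illustration of how these settings combine, the sketch below computes the warmup length and the learning rate at a given step for a linear-warmup, cosine-decay schedule. It is not the training code. The total step count is an assumption estimated from the logged epoch/step pairs (step 11600 at epoch 1.9986), since the card does not state it, and `lr_at` is a hypothetical helper that follows the usual half-cosine form.

```python
import math

# Hyperparameters copied from this card.
learning_rate = 5e-08
train_batch_size = 8
gradient_accumulation_steps = 2
total_train_batch_size = 16
warmup_ratio = 0.1

# ASSUMPTION: total optimizer steps, estimated from the logged epoch/step
# pairs (step 11600 at epoch 1.9986). The card does not state this number.
total_steps = 11608

# Device count implied by the reported batch sizes alone.
implied_devices = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)

# Warmup covers the first 10% of the optimizer steps.
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup to the peak rate, then half-cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>5}: lr = {lr_at(step):.3e}")
```

Note that train_batch_size times gradient_accumulation_steps already equals total_train_batch_size, so the reported numbers imply a device multiplier of one even though distributed_type is listed as multi-GPU. The card gives no device count, so this is an observation about the arithmetic rather than a statement about the hardware used.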

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.0172 | 100 | 0.6932 | 0.0001 | 0.0001 | 0.4830 | -0.0000 | -63.1707 | -58.7060 | -3.1577 | -3.1634 |
| 0.6931 | 0.0345 | 200 | 0.6932 | 0.0000 | 0.0001 | 0.4763 | -0.0001 | -63.1661 | -58.7098 | -3.1576 | -3.1633 |
| 0.6931 | 0.0517 | 300 | 0.6932 | -0.0000 | 0.0000 | 0.4893 | -0.0001 | -63.1759 | -58.7129 | -3.1578 | -3.1635 |
| 0.6932 | 0.0689 | 400 | 0.6932 | 0.0001 | 0.0003 | 0.4631 | -0.0001 | -63.1539 | -58.6981 | -3.1577 | -3.1634 |
| 0.6931 | 0.0861 | 500 | 0.6932 | 0.0001 | 0.0002 | 0.4842 | -0.0001 | -63.1628 | -58.7064 | -3.1577 | -3.1633 |
| 0.6929 | 0.1034 | 600 | 0.6932 | 0.0001 | 0.0002 | 0.4870 | -0.0000 | -63.1628 | -58.6974 | -3.1574 | -3.1630 |
| 0.693 | 0.1206 | 700 | 0.6932 | 0.0002 | 0.0002 | 0.4865 | -0.0000 | -63.1602 | -58.6945 | -3.1573 | -3.1629 |
| 0.6928 | 0.1378 | 800 | 0.6931 | 0.0003 | 0.0003 | 0.5005 | 0.0000 | -63.1503 | -58.6786 | -3.1570 | -3.1626 |
| 0.6929 | 0.1551 | 900 | 0.6931 | 0.0006 | 0.0004 | 0.5114 | 0.0002 | -63.1377 | -58.6515 | -3.1564 | -3.1620 |
| 0.6929 | 0.1723 | 1000 | 0.6930 | 0.0007 | 0.0004 | 0.5163 | 0.0002 | -63.1368 | -58.6461 | -3.1554 | -3.1611 |
| 0.6927 | 0.1895 | 1100 | 0.6930 | 0.0008 | 0.0005 | 0.5353 | 0.0003 | -63.1281 | -58.6300 | -3.1546 | -3.1602 |
| 0.6926 | 0.2068 | 1200 | 0.6929 | 0.0011 | 0.0007 | 0.5332 | 0.0004 | -63.1063 | -58.5972 | -3.1533 | -3.1590 |
| 0.6925 | 0.2240 | 1300 | 0.6928 | 0.0014 | 0.0008 | 0.5551 | 0.0006 | -63.0993 | -58.5706 | -3.1521 | -3.1577 |
| 0.6911 | 0.2412 | 1400 | 0.6927 | 0.0016 | 0.0006 | 0.5537 | 0.0010 | -63.1157 | -58.5519 | -3.1503 | -3.1559 |
| 0.6906 | 0.2584 | 1500 | 0.6925 | 0.0018 | 0.0006 | 0.5644 | 0.0013 | -63.1246 | -58.5291 | -3.1489 | -3.1545 |
| 0.6915 | 0.2757 | 1600 | 0.6924 | 0.0019 | 0.0005 | 0.5660 | 0.0015 | -63.1345 | -58.5184 | -3.1472 | -3.1529 |
| 0.6912 | 0.2929 | 1700 | 0.6922 | 0.0021 | 0.0002 | 0.5634 | 0.0019 | -63.1578 | -58.5044 | -3.1446 | -3.1502 |
| 0.6889 | 0.3101 | 1800 | 0.6922 | 0.0019 | -0.0001 | 0.5653 | 0.0020 | -63.1906 | -58.5175 | -3.1424 | -3.1481 |
| 0.69 | 0.3274 | 1900 | 0.6919 | 0.0019 | -0.0006 | 0.5771 | 0.0025 | -63.2406 | -58.5210 | -3.1407 | -3.1464 |
| 0.6899 | 0.3446 | 2000 | 0.6919 | 0.0016 | -0.0011 | 0.5771 | 0.0027 | -63.2913 | -58.5564 | -3.1376 | -3.1433 |
| 0.6892 | 0.3618 | 2100 | 0.6917 | 0.0012 | -0.0017 | 0.5741 | 0.0030 | -63.3523 | -58.5873 | -3.1355 | -3.1412 |
| 0.6866 | 0.3790 | 2200 | 0.6916 | 0.0008 | -0.0025 | 0.5743 | 0.0033 | -63.4306 | -58.6304 | -3.1324 | -3.1381 |
| 0.6859 | 0.3963 | 2300 | 0.6914 | 0.0003 | -0.0035 | 0.5683 | 0.0037 | -63.5263 | -58.6859 | -3.1305 | -3.1361 |
| 0.6889 | 0.4135 | 2400 | 0.6912 | -0.0006 | -0.0047 | 0.5781 | 0.0041 | -63.6550 | -58.7736 | -3.1267 | -3.1324 |
| 0.6902 | 0.4307 | 2500 | 0.6910 | -0.0014 | -0.0060 | 0.5781 | 0.0045 | -63.7757 | -58.8557 | -3.1236 | -3.1293 |
| 0.685 | 0.4480 | 2600 | 0.6908 | -0.0029 | -0.0078 | 0.5825 | 0.0049 | -63.9588 | -58.9977 | -3.1216 | -3.1272 |
| 0.6852 | 0.4652 | 2700 | 0.6906 | -0.0048 | -0.0102 | 0.5834 | 0.0054 | -64.2020 | -59.1921 | -3.1189 | -3.1246 |
| 0.6857 | 0.4824 | 2800 | 0.6904 | -0.0062 | -0.0120 | 0.5860 | 0.0058 | -64.3761 | -59.3318 | -3.1154 | -3.1211 |
| 0.688 | 0.4997 | 2900 | 0.6902 | -0.0087 | -0.0149 | 0.5862 | 0.0062 | -64.6728 | -59.5807 | -3.1119 | -3.1176 |
| 0.6877 | 0.5169 | 3000 | 0.6901 | -0.0114 | -0.0180 | 0.5795 | 0.0066 | -64.9774 | -59.8506 | -3.1089 | -3.1146 |
| 0.6846 | 0.5341 | 3100 | 0.6899 | -0.0123 | -0.0192 | 0.5822 | 0.0070 | -65.1015 | -59.9371 | -3.1072 | -3.1128 |
| 0.6856 | 0.5513 | 3200 | 0.6897 | -0.0154 | -0.0230 | 0.5822 | 0.0075 | -65.4752 | -60.2526 | -3.1035 | -3.1092 |
| 0.6825 | 0.5686 | 3300 | 0.6894 | -0.0185 | -0.0266 | 0.5860 | 0.0081 | -65.8370 | -60.5571 | -3.0987 | -3.1044 |
| 0.6782 | 0.5858 | 3400 | 0.6891 | -0.0209 | -0.0296 | 0.5892 | 0.0087 | -66.1367 | -60.7975 | -3.0949 | -3.1006 |
| 0.6844 | 0.6030 | 3500 | 0.6890 | -0.0230 | -0.0321 | 0.5904 | 0.0091 | -66.3928 | -61.0109 | -3.0922 | -3.0980 |
| 0.6825 | 0.6203 | 3600 | 0.6887 | -0.0251 | -0.0347 | 0.5934 | 0.0097 | -66.6546 | -61.2199 | -3.0886 | -3.0944 |
| 0.6782 | 0.6375 | 3700 | 0.6885 | -0.0273 | -0.0374 | 0.5920 | 0.0101 | -66.9203 | -61.4445 | -3.0848 | -3.0906 |
| 0.6814 | 0.6547 | 3800 | 0.6882 | -0.0304 | -0.0412 | 0.5915 | 0.0107 | -67.2956 | -61.7525 | -3.0816 | -3.0874 |
| 0.6784 | 0.6720 | 3900 | 0.6880 | -0.0335 | -0.0449 | 0.5936 | 0.0114 | -67.6722 | -62.0628 | -3.0784 | -3.0841 |
| 0.6811 | 0.6892 | 4000 | 0.6877 | -0.0370 | -0.0491 | 0.5950 | 0.0121 | -68.0929 | -62.4165 | -3.0748 | -3.0805 |
| 0.6741 | 0.7064 | 4100 | 0.6875 | -0.0379 | -0.0503 | 0.5922 | 0.0124 | -68.2125 | -62.4995 | -3.0698 | -3.0755 |
| 0.6837 | 0.7236 | 4200 | 0.6874 | -0.0399 | -0.0526 | 0.5953 | 0.0127 | -68.4362 | -62.6979 | -3.0663 | -3.0720 |
| 0.6825 | 0.7409 | 4300 | 0.6871 | -0.0407 | -0.0540 | 0.5960 | 0.0133 | -68.5772 | -62.7839 | -3.0631 | -3.0689 |
| 0.681 | 0.7581 | 4400 | 0.6871 | -0.0428 | -0.0562 | 0.5939 | 0.0134 | -68.7993 | -62.9920 | -3.0603 | -3.0660 |
| 0.6826 | 0.7753 | 4500 | 0.6868 | -0.0463 | -0.0604 | 0.5932 | 0.0141 | -69.2207 | -63.3446 | -3.0565 | -3.0623 |
| 0.6744 | 0.7926 | 4600 | 0.6865 | -0.0489 | -0.0635 | 0.5943 | 0.0146 | -69.5328 | -63.5999 | -3.0541 | -3.0598 |
| 0.6826 | 0.8098 | 4700 | 0.6863 | -0.0524 | -0.0677 | 0.5990 | 0.0153 | -69.9523 | -63.9563 | -3.0511 | -3.0569 |
| 0.6821 | 0.8270 | 4800 | 0.6861 | -0.0559 | -0.0716 | 0.5934 | 0.0157 | -70.3441 | -64.3050 | -3.0487 | -3.0544 |
| 0.677 | 0.8442 | 4900 | 0.6858 | -0.0593 | -0.0757 | 0.5922 | 0.0164 | -70.7547 | -64.6435 | -3.0456 | -3.0514 |
| 0.6765 | 0.8615 | 5000 | 0.6857 | -0.0607 | -0.0774 | 0.5934 | 0.0167 | -70.9189 | -64.7823 | -3.0424 | -3.0482 |
| 0.6792 | 0.8787 | 5100 | 0.6854 | -0.0643 | -0.0817 | 0.5908 | 0.0174 | -71.3476 | -65.1395 | -3.0393 | -3.0451 |
| 0.6752 | 0.8959 | 5200 | 0.6852 | -0.0667 | -0.0845 | 0.5957 | 0.0177 | -71.6288 | -65.3858 | -3.0369 | -3.0428 |
| 0.6752 | 0.9132 | 5300 | 0.6851 | -0.0695 | -0.0876 | 0.5911 | 0.0181 | -71.9352 | -65.6583 | -3.0333 | -3.0390 |
| 0.6766 | 0.9304 | 5400 | 0.6848 | -0.0707 | -0.0893 | 0.5974 | 0.0186 | -72.1090 | -65.7783 | -3.0313 | -3.0370 |
| 0.6761 | 0.9476 | 5500 | 0.6848 | -0.0718 | -0.0904 | 0.5969 | 0.0187 | -72.2232 | -65.8871 | -3.0286 | -3.0344 |
| 0.68 | 0.9649 | 5600 | 0.6847 | -0.0716 | -0.0904 | 0.5992 | 0.0189 | -72.2249 | -65.8690 | -3.0267 | -3.0324 |
| 0.6744 | 0.9821 | 5700 | 0.6846 | -0.0735 | -0.0928 | 0.5983 | 0.0193 | -72.4612 | -66.0631 | -3.0237 | -3.0295 |
| 0.6709 | 0.9993 | 5800 | 0.6843 | -0.0764 | -0.0963 | 0.5999 | 0.0199 | -72.8088 | -66.3480 | -3.0203 | -3.0261 |
| 0.6738 | 1.0165 | 5900 | 0.6842 | -0.0770 | -0.0972 | 0.6018 | 0.0202 | -72.8978 | -66.4100 | -3.0168 | -3.0226 |
| 0.6755 | 1.0338 | 6000 | 0.6841 | -0.0774 | -0.0977 | 0.6050 | 0.0202 | -72.9485 | -66.4556 | -3.0150 | -3.0207 |
| 0.6727 | 1.0510 | 6100 | 0.6840 | -0.0790 | -0.0997 | 0.6043 | 0.0207 | -73.1473 | -66.6101 | -3.0124 | -3.0182 |
| 0.677 | 1.0682 | 6200 | 0.6838 | -0.0804 | -0.1014 | 0.6053 | 0.0210 | -73.3202 | -66.7547 | -3.0100 | -3.0157 |
| 0.6778 | 1.0855 | 6300 | 0.6838 | -0.0826 | -0.1037 | 0.6018 | 0.0211 | -73.5472 | -66.9698 | -3.0081 | -3.0139 |
| 0.6772 | 1.1027 | 6400 | 0.6835 | -0.0842 | -0.1060 | 0.6043 | 0.0218 | -73.7832 | -67.1349 | -3.0059 | -3.0117 |
| 0.6789 | 1.1199 | 6500 | 0.6834 | -0.0856 | -0.1077 | 0.6055 | 0.0221 | -73.9500 | -67.2763 | -3.0033 | -3.0090 |
| 0.6776 | 1.1371 | 6600 | 0.6833 | -0.0879 | -0.1102 | 0.6036 | 0.0223 | -74.2005 | -67.5068 | -3.0010 | -3.0068 |
| 0.6755 | 1.1544 | 6700 | 0.6831 | -0.0900 | -0.1127 | 0.6057 | 0.0227 | -74.4476 | -67.7115 | -2.9988 | -3.0045 |
| 0.6688 | 1.1716 | 6800 | 0.6829 | -0.0926 | -0.1159 | 0.6090 | 0.0233 | -74.7660 | -67.9706 | -2.9960 | -3.0017 |
| 0.6807 | 1.1888 | 6900 | 0.6828 | -0.0942 | -0.1176 | 0.6062 | 0.0234 | -74.9441 | -68.1345 | -2.9941 | -2.9999 |
| 0.6691 | 1.2061 | 7000 | 0.6827 | -0.0965 | -0.1202 | 0.6071 | 0.0238 | -75.2016 | -68.3571 | -2.9919 | -2.9977 |
| 0.6704 | 1.2233 | 7100 | 0.6827 | -0.0970 | -0.1208 | 0.6029 | 0.0238 | -75.2590 | -68.4095 | -2.9898 | -2.9956 |
| 0.6693 | 1.2405 | 7200 | 0.6825 | -0.0985 | -0.1226 | 0.6073 | 0.0242 | -75.4421 | -68.5575 | -2.9875 | -2.9932 |
| 0.6811 | 1.2578 | 7300 | 0.6825 | -0.0996 | -0.1238 | 0.6046 | 0.0243 | -75.5637 | -68.6693 | -2.9856 | -2.9914 |
| 0.6731 | 1.2750 | 7400 | 0.6823 | -0.1008 | -0.1253 | 0.6059 | 0.0245 | -75.7101 | -68.7873 | -2.9843 | -2.9901 |
| 0.6746 | 1.2922 | 7500 | 0.6823 | -0.1009 | -0.1257 | 0.6036 | 0.0247 | -75.7457 | -68.8045 | -2.9825 | -2.9883 |
| 0.6788 | 1.3094 | 7600 | 0.6823 | -0.1020 | -0.1267 | 0.6073 | 0.0247 | -75.8491 | -68.9100 | -2.9802 | -2.9860 |
| 0.6704 | 1.3267 | 7700 | 0.6820 | -0.1033 | -0.1286 | 0.6066 | 0.0253 | -76.0417 | -69.0466 | -2.9779 | -2.9837 |
| 0.6694 | 1.3439 | 7800 | 0.6820 | -0.1054 | -0.1309 | 0.6022 | 0.0255 | -76.2745 | -69.2565 | -2.9769 | -2.9827 |
| 0.6779 | 1.3611 | 7900 | 0.6819 | -0.1067 | -0.1323 | 0.6069 | 0.0256 | -76.4101 | -69.3778 | -2.9754 | -2.9812 |
| 0.6712 | 1.3784 | 8000 | 0.6817 | -0.1082 | -0.1342 | 0.6062 | 0.0260 | -76.5969 | -69.5304 | -2.9740 | -2.9798 |
| 0.6768 | 1.3956 | 8100 | 0.6817 | -0.1096 | -0.1359 | 0.6006 | 0.0262 | -76.7652 | -69.6763 | -2.9726 | -2.9784 |
| 0.6714 | 1.4128 | 8200 | 0.6815 | -0.1112 | -0.1378 | 0.6046 | 0.0266 | -76.9560 | -69.8316 | -2.9714 | -2.9772 |
| 0.6705 | 1.4300 | 8300 | 0.6815 | -0.1122 | -0.1387 | 0.6001 | 0.0265 | -77.0526 | -69.9333 | -2.9699 | -2.9758 |
| 0.6706 | 1.4473 | 8400 | 0.6814 | -0.1131 | -0.1399 | 0.6025 | 0.0268 | -77.1713 | -70.0219 | -2.9690 | -2.9748 |
| 0.6651 | 1.4645 | 8500 | 0.6814 | -0.1138 | -0.1407 | 0.6064 | 0.0269 | -77.2468 | -70.0874 | -2.9675 | -2.9733 |
| 0.676 | 1.4817 | 8600 | 0.6813 | -0.1143 | -0.1413 | 0.6032 | 0.0270 | -77.3085 | -70.1414 | -2.9664 | -2.9722 |
| 0.6682 | 1.4990 | 8700 | 0.6814 | -0.1141 | -0.1411 | 0.6050 | 0.0269 | -77.2885 | -70.1259 | -2.9660 | -2.9718 |
| 0.6732 | 1.5162 | 8800 | 0.6813 | -0.1147 | -0.1417 | 0.5997 | 0.0270 | -77.3463 | -70.1773 | -2.9650 | -2.9708 |
| 0.6706 | 1.5334 | 8900 | 0.6811 | -0.1160 | -0.1434 | 0.6108 | 0.0274 | -77.5247 | -70.3133 | -2.9641 | -2.9700 |
| 0.6589 | 1.5507 | 9000 | 0.6812 | -0.1169 | -0.1443 | 0.6053 | 0.0274 | -77.6094 | -70.3996 | -2.9631 | -2.9689 |
| 0.6694 | 1.5679 | 9100 | 0.6811 | -0.1172 | -0.1447 | 0.6043 | 0.0275 | -77.6490 | -70.4324 | -2.9621 | -2.9680 |
| 0.6691 | 1.5851 | 9200 | 0.6810 | -0.1179 | -0.1456 | 0.6011 | 0.0277 | -77.7365 | -70.4981 | -2.9617 | -2.9675 |
| 0.6701 | 1.6023 | 9300 | 0.6811 | -0.1179 | -0.1455 | 0.6027 | 0.0276 | -77.7288 | -70.5024 | -2.9611 | -2.9669 |
| 0.6705 | 1.6196 | 9400 | 0.6810 | -0.1182 | -0.1461 | 0.6078 | 0.0279 | -77.7879 | -70.5325 | -2.9603 | -2.9661 |
| 0.6699 | 1.6368 | 9500 | 0.6810 | -0.1186 | -0.1464 | 0.6073 | 0.0278 | -77.8179 | -70.5707 | -2.9596 | -2.9654 |
| 0.6699 | 1.6540 | 9600 | 0.6809 | -0.1191 | -0.1471 | 0.6092 | 0.0279 | -77.8869 | -70.6254 | -2.9591 | -2.9649 |
| 0.6675 | 1.6713 | 9700 | 0.6809 | -0.1196 | -0.1477 | 0.6015 | 0.0281 | -77.9472 | -70.6696 | -2.9584 | -2.9643 |
| 0.6639 | 1.6885 | 9800 | 0.6809 | -0.1198 | -0.1479 | 0.6083 | 0.0281 | -77.9676 | -70.6902 | -2.9585 | -2.9643 |
| 0.6578 | 1.7057 | 9900 | 0.6808 | -0.1200 | -0.1482 | 0.6043 | 0.0282 | -77.9982 | -70.7108 | -2.9583 | -2.9641 |
| 0.6647 | 1.7229 | 10000 | 0.6809 | -0.1204 | -0.1485 | 0.6048 | 0.0281 | -78.0275 | -70.7473 | -2.9578 | -2.9637 |
| 0.6655 | 1.7402 | 10100 | 0.6808 | -0.1204 | -0.1486 | 0.6071 | 0.0282 | -78.0394 | -70.7507 | -2.9579 | -2.9637 |
| 0.6671 | 1.7574 | 10200 | 0.6808 | -0.1206 | -0.1488 | 0.6059 | 0.0282 | -78.0608 | -70.7737 | -2.9574 | -2.9632 |
| 0.6774 | 1.7746 | 10300 | 0.6808 | -0.1207 | -0.1490 | 0.6055 | 0.0283 | -78.0839 | -70.7829 | -2.9569 | -2.9628 |
| 0.6629 | 1.7919 | 10400 | 0.6807 | -0.1208 | -0.1493 | 0.6076 | 0.0285 | -78.1098 | -70.7925 | -2.9568 | -2.9626 |
| 0.6648 | 1.8091 | 10500 | 0.6808 | -0.1211 | -0.1494 | 0.6092 | 0.0283 | -78.1209 | -70.8208 | -2.9567 | -2.9625 |
| 0.6745 | 1.8263 | 10600 | 0.6808 | -0.1212 | -0.1495 | 0.6083 | 0.0284 | -78.1333 | -70.8279 | -2.9568 | -2.9627 |
| 0.6665 | 1.8436 | 10700 | 0.6808 | -0.1211 | -0.1495 | 0.6053 | 0.0283 | -78.1275 | -70.8257 | -2.9566 | -2.9624 |
| 0.6663 | 1.8608 | 10800 | 0.6808 | -0.1212 | -0.1496 | 0.6078 | 0.0284 | -78.1382 | -70.8324 | -2.9566 | -2.9624 |
| 0.6674 | 1.8780 | 10900 | 0.6807 | -0.1213 | -0.1497 | 0.6083 | 0.0284 | -78.1542 | -70.8423 | -2.9568 | -2.9626 |
| 0.6767 | 1.8952 | 11000 | 0.6808 | -0.1212 | -0.1495 | 0.6078 | 0.0283 | -78.1295 | -70.8295 | -2.9567 | -2.9626 |
| 0.6683 | 1.9125 | 11100 | 0.6808 | -0.1212 | -0.1496 | 0.6087 | 0.0284 | -78.1378 | -70.8316 | -2.9569 | -2.9628 |
| 0.6673 | 1.9297 | 11200 | 0.6807 | -0.1212 | -0.1496 | 0.6090 | 0.0284 | -78.1370 | -70.8290 | -2.9566 | -2.9624 |
| 0.6781 | 1.9469 | 11300 | 0.6807 | -0.1211 | -0.1496 | 0.6097 | 0.0285 | -78.1363 | -70.8190 | -2.9568 | -2.9626 |
| 0.6682 | 1.9642 | 11400 | 0.6807 | -0.1213 | -0.1498 | 0.6085 | 0.0285 | -78.1613 | -70.8446 | -2.9567 | -2.9626 |
| 0.6775 | 1.9814 | 11500 | 0.6808 | -0.1212 | -0.1495 | 0.6083 | 0.0282 | -78.1266 | -70.8364 | -2.9566 | -2.9624 |
| 0.6688 | 1.9986 | 11600 | 0.6808 | -0.1214 | -0.1497 | 0.6090 | 0.0284 | -78.1532 | -70.8499 | -2.9566 | -2.9624 |
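
The epoch and step columns make it possible to back out the approximate length of one epoch. The sketch below does that arithmetic with values copied from the results; the derived counts are estimates, not figures reported by this card, and the example count is an upper bound because the last batch of an epoch may be partial.

```python
# (epoch, step) pairs copied from the first, middle, and last logged rows.
logged = [(0.0172, 100), (0.9993, 5800), (1.9986, 11600)]

total_train_batch_size = 16  # from the hyperparameter list
num_epochs = 2               # from the hyperparameter list

# Use the last row: its epoch value has the smallest relative rounding error.
epoch, step = logged[-1]
steps_per_epoch = round(step / epoch)

# Estimated totals for the whole run.
estimated_total_steps = steps_per_epoch * num_epochs
estimated_train_examples = steps_per_epoch * total_train_batch_size

print(f"steps per epoch (estimated):      {steps_per_epoch}")
print(f"total optimizer steps (estimated): {estimated_total_steps}")
print(f"training examples (upper bound):   {estimated_train_examples}")
```

Under these assumptions the final logged row, at step 11600, falls a few steps short of the end of the second epoch, which matches its epoch value of 1.9986.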

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Safetensors

  • Model size: 1.1B params
  • Tensor type: F32