
tinyllama-1.1b-sum-dpo-full_LR2e-8_2epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old, trained with DPO (direct preference optimization) on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6901
  • Rewards/chosen: -0.0088
  • Rewards/rejected: -0.0152
  • Rewards/accuracies: 0.5892
  • Rewards/margins: 0.0064
  • Logps/rejected: -64.7009
  • Logps/chosen: -59.5896
  • Logits/rejected: -3.1105
  • Logits/chosen: -3.1162
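
The checkpoint is a standard causal LM and loads with the usual transformers API. Below is a minimal inference sketch; the "TL;DR:" prompt format is an assumption based on the summarize_from_feedback setup and should be checked against the SFT training template:

```python
# Minimal inference sketch for this checkpoint. The "TL;DR:" prompt suffix is
# an assumption (typical for summarize_from_feedback models), not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR2e-8_2epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

post = "..."  # the post or article to summarize
prompt = f"{post}\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, i.e. the summary.
summary = tokenizer.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary)
```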

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-08
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
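
As a sketch, these settings map onto trl's DPOTrainer roughly as follows. Only the values in the list above come from this card; beta=0.1 and the dataset-to-pairs mapping are assumptions, and newer trl releases pass beta and the tokenizer through DPOConfig instead:

```python
# Hypothetical reconstruction of the training run from the listed hyperparameters.
# Assumptions: beta=0.1, the "TL;DR:" prompt format, and the comparisons-to-pairs mapping.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

sft_id = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
model = AutoModelForCausalLM.from_pretrained(sft_id)
ref_model = AutoModelForCausalLM.from_pretrained(sft_id)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(sft_id)

ds = load_dataset("openai/summarize_from_feedback", "comparisons")

def to_pairs(ex):
    # Map each human comparison to the (prompt, chosen, rejected) triple DPOTrainer expects.
    chosen = ex["summaries"][ex["choice"]]["text"]
    rejected = ex["summaries"][1 - ex["choice"]]["text"]
    return {"prompt": ex["info"]["post"] + "\nTL;DR:", "chosen": chosen, "rejected": rejected}

train_dataset = ds["train"].map(to_pairs)
eval_dataset = ds["validation"].map(to_pairs)

args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full",
    learning_rate=2e-8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 x 2 = total train batch size of 16
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the defaults, matching the card.
)

trainer = DPOTrainer(
    model,
    ref_model=ref_model,
    args=args,
    beta=0.1,  # assumption: the card does not state beta
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```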

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6932 | 0.0172 | 100 | 0.6932 | 0.0001 | 0.0002 | 0.4947 | -0.0000 | -63.1650 | -58.7014 | -3.1574 | -3.1631 |
| 0.6932 | 0.0345 | 200 | 0.6932 | 0.0000 | 0.0002 | 0.4840 | -0.0001 | -63.1618 | -58.7076 | -3.1573 | -3.1630 |
| 0.6932 | 0.0517 | 300 | 0.6932 | -0.0000 | 0.0001 | 0.4842 | -0.0001 | -63.1720 | -58.7121 | -3.1574 | -3.1631 |
| 0.6933 | 0.0689 | 400 | 0.6932 | -0.0000 | 0.0000 | 0.4856 | -0.0000 | -63.1788 | -58.7153 | -3.1577 | -3.1633 |
| 0.693 | 0.0861 | 500 | 0.6932 | 0.0001 | 0.0002 | 0.4847 | -0.0001 | -63.1648 | -58.7040 | -3.1576 | -3.1633 |
| 0.6931 | 0.1034 | 600 | 0.6931 | 0.0000 | 0.0000 | 0.4961 | 0.0000 | -63.1795 | -58.7070 | -3.1572 | -3.1629 |
| 0.6932 | 0.1206 | 700 | 0.6932 | 0.0000 | 0.0001 | 0.4912 | -0.0001 | -63.1701 | -58.7077 | -3.1574 | -3.1631 |
| 0.693 | 0.1378 | 800 | 0.6932 | 0.0002 | 0.0002 | 0.4909 | -0.0000 | -63.1604 | -58.6950 | -3.1576 | -3.1633 |
| 0.6934 | 0.1551 | 900 | 0.6932 | 0.0001 | 0.0001 | 0.5060 | -0.0000 | -63.1695 | -58.7025 | -3.1576 | -3.1633 |
| 0.6932 | 0.1723 | 1000 | 0.6931 | 0.0001 | 0.0001 | 0.4949 | 0.0000 | -63.1684 | -58.6973 | -3.1574 | -3.1631 |
| 0.6931 | 0.1895 | 1100 | 0.6931 | 0.0003 | 0.0002 | 0.5156 | 0.0001 | -63.1597 | -58.6832 | -3.1571 | -3.1627 |
| 0.693 | 0.2068 | 1200 | 0.6931 | 0.0003 | 0.0003 | 0.5153 | 0.0001 | -63.1531 | -58.6773 | -3.1570 | -3.1627 |
| 0.693 | 0.2240 | 1300 | 0.6931 | 0.0004 | 0.0003 | 0.5174 | 0.0001 | -63.1508 | -58.6695 | -3.1567 | -3.1623 |
| 0.6928 | 0.2412 | 1400 | 0.6931 | 0.0005 | 0.0003 | 0.5130 | 0.0001 | -63.1467 | -58.6638 | -3.1566 | -3.1622 |
| 0.6927 | 0.2584 | 1500 | 0.6931 | 0.0005 | 0.0004 | 0.5207 | 0.0002 | -63.1443 | -58.6600 | -3.1564 | -3.1621 |
| 0.6928 | 0.2757 | 1600 | 0.6931 | 0.0006 | 0.0004 | 0.5156 | 0.0002 | -63.1352 | -58.6497 | -3.1559 | -3.1616 |
| 0.6928 | 0.2929 | 1700 | 0.6930 | 0.0008 | 0.0005 | 0.5290 | 0.0002 | -63.1288 | -58.6357 | -3.1553 | -3.1610 |
| 0.6923 | 0.3101 | 1800 | 0.6930 | 0.0008 | 0.0005 | 0.5395 | 0.0003 | -63.1303 | -58.6354 | -3.1550 | -3.1607 |
| 0.6924 | 0.3274 | 1900 | 0.6930 | 0.0008 | 0.0006 | 0.5223 | 0.0003 | -63.1249 | -58.6291 | -3.1546 | -3.1602 |
| 0.6925 | 0.3446 | 2000 | 0.6929 | 0.0009 | 0.0005 | 0.5423 | 0.0004 | -63.1319 | -58.6215 | -3.1544 | -3.1601 |
| 0.6922 | 0.3618 | 2100 | 0.6929 | 0.0011 | 0.0006 | 0.5511 | 0.0004 | -63.1153 | -58.6039 | -3.1539 | -3.1595 |
| 0.6917 | 0.3790 | 2200 | 0.6929 | 0.0012 | 0.0006 | 0.5379 | 0.0005 | -63.1153 | -58.5967 | -3.1533 | -3.1590 |
| 0.6914 | 0.3963 | 2300 | 0.6928 | 0.0013 | 0.0007 | 0.5481 | 0.0006 | -63.1094 | -58.5806 | -3.1531 | -3.1587 |
| 0.6921 | 0.4135 | 2400 | 0.6928 | 0.0013 | 0.0007 | 0.5500 | 0.0007 | -63.1136 | -58.5781 | -3.1524 | -3.1579 |
| 0.6922 | 0.4307 | 2500 | 0.6928 | 0.0015 | 0.0007 | 0.5602 | 0.0008 | -63.1131 | -58.5648 | -3.1518 | -3.1575 |
| 0.6909 | 0.4480 | 2600 | 0.6927 | 0.0016 | 0.0007 | 0.5581 | 0.0009 | -63.1079 | -58.5517 | -3.1512 | -3.1568 |
| 0.6911 | 0.4652 | 2700 | 0.6927 | 0.0016 | 0.0007 | 0.5627 | 0.0009 | -63.1136 | -58.5521 | -3.1505 | -3.1562 |
| 0.6917 | 0.4824 | 2800 | 0.6927 | 0.0017 | 0.0008 | 0.5507 | 0.0010 | -63.1044 | -58.5383 | -3.1503 | -3.1559 |
| 0.6919 | 0.4997 | 2900 | 0.6926 | 0.0017 | 0.0006 | 0.5609 | 0.0011 | -63.1181 | -58.5391 | -3.1496 | -3.1552 |
| 0.6918 | 0.5169 | 3000 | 0.6925 | 0.0019 | 0.0006 | 0.5606 | 0.0013 | -63.1217 | -58.5262 | -3.1488 | -3.1544 |
| 0.691 | 0.5341 | 3100 | 0.6925 | 0.0019 | 0.0005 | 0.5669 | 0.0014 | -63.1269 | -58.5219 | -3.1485 | -3.1542 |
| 0.692 | 0.5513 | 3200 | 0.6925 | 0.0019 | 0.0005 | 0.5606 | 0.0014 | -63.1309 | -58.5268 | -3.1477 | -3.1533 |
| 0.6902 | 0.5686 | 3300 | 0.6924 | 0.0018 | 0.0003 | 0.5604 | 0.0016 | -63.1528 | -58.5277 | -3.1470 | -3.1526 |
| 0.6898 | 0.5858 | 3400 | 0.6923 | 0.0020 | 0.0003 | 0.5602 | 0.0017 | -63.1520 | -58.5135 | -3.1462 | -3.1518 |
| 0.6902 | 0.6030 | 3500 | 0.6923 | 0.0019 | 0.0001 | 0.5532 | 0.0018 | -63.1674 | -58.5221 | -3.1455 | -3.1511 |
| 0.6905 | 0.6203 | 3600 | 0.6923 | 0.0018 | -0.0000 | 0.5697 | 0.0018 | -63.1817 | -58.5294 | -3.1446 | -3.1502 |
| 0.6877 | 0.6375 | 3700 | 0.6922 | 0.0019 | -0.0000 | 0.5741 | 0.0020 | -63.1849 | -58.5181 | -3.1438 | -3.1494 |
| 0.691 | 0.6547 | 3800 | 0.6921 | 0.0019 | -0.0001 | 0.5676 | 0.0021 | -63.1942 | -58.5193 | -3.1430 | -3.1486 |
| 0.6881 | 0.6720 | 3900 | 0.6921 | 0.0018 | -0.0004 | 0.5639 | 0.0022 | -63.2183 | -58.5326 | -3.1420 | -3.1476 |
| 0.6891 | 0.6892 | 4000 | 0.6920 | 0.0018 | -0.0006 | 0.5727 | 0.0023 | -63.2358 | -58.5348 | -3.1408 | -3.1465 |
| 0.688 | 0.7064 | 4100 | 0.6920 | 0.0018 | -0.0007 | 0.5695 | 0.0025 | -63.2489 | -58.5334 | -3.1397 | -3.1453 |
| 0.6893 | 0.7236 | 4200 | 0.6920 | 0.0015 | -0.0009 | 0.5685 | 0.0025 | -63.2735 | -58.5574 | -3.1390 | -3.1446 |
| 0.6897 | 0.7409 | 4300 | 0.6919 | 0.0015 | -0.0012 | 0.5748 | 0.0027 | -63.2966 | -58.5608 | -3.1383 | -3.1439 |
| 0.6904 | 0.7581 | 4400 | 0.6918 | 0.0012 | -0.0016 | 0.5711 | 0.0028 | -63.3356 | -58.5872 | -3.1374 | -3.1430 |
| 0.6905 | 0.7753 | 4500 | 0.6918 | 0.0013 | -0.0016 | 0.5850 | 0.0029 | -63.3426 | -58.5858 | -3.1369 | -3.1425 |
| 0.6883 | 0.7926 | 4600 | 0.6917 | 0.0011 | -0.0019 | 0.5788 | 0.0029 | -63.3659 | -58.6051 | -3.1357 | -3.1413 |
| 0.6897 | 0.8098 | 4700 | 0.6916 | 0.0010 | -0.0021 | 0.5741 | 0.0031 | -63.3948 | -58.6130 | -3.1353 | -3.1409 |
| 0.6905 | 0.8270 | 4800 | 0.6916 | 0.0008 | -0.0024 | 0.5748 | 0.0032 | -63.4159 | -58.6317 | -3.1348 | -3.1404 |
| 0.6875 | 0.8442 | 4900 | 0.6916 | 0.0005 | -0.0028 | 0.5774 | 0.0033 | -63.4563 | -58.6580 | -3.1340 | -3.1396 |
| 0.6899 | 0.8615 | 5000 | 0.6915 | 0.0005 | -0.0029 | 0.5769 | 0.0033 | -63.4652 | -58.6640 | -3.1327 | -3.1384 |
| 0.6864 | 0.8787 | 5100 | 0.6915 | 0.0003 | -0.0031 | 0.5683 | 0.0034 | -63.4888 | -58.6839 | -3.1319 | -3.1375 |
| 0.6865 | 0.8959 | 5200 | 0.6914 | 0.0001 | -0.0035 | 0.5734 | 0.0036 | -63.5340 | -58.7065 | -3.1314 | -3.1371 |
| 0.6877 | 0.9132 | 5300 | 0.6913 | -0.0001 | -0.0039 | 0.5737 | 0.0038 | -63.5667 | -58.7197 | -3.1309 | -3.1365 |
| 0.6889 | 0.9304 | 5400 | 0.6913 | -0.0003 | -0.0042 | 0.5760 | 0.0039 | -63.5960 | -58.7374 | -3.1301 | -3.1358 |
| 0.688 | 0.9476 | 5500 | 0.6913 | -0.0004 | -0.0043 | 0.5660 | 0.0039 | -63.6131 | -58.7516 | -3.1294 | -3.1351 |
| 0.6899 | 0.9649 | 5600 | 0.6913 | -0.0006 | -0.0045 | 0.5746 | 0.0039 | -63.6304 | -58.7708 | -3.1287 | -3.1343 |
| 0.687 | 0.9821 | 5700 | 0.6911 | -0.0006 | -0.0048 | 0.5788 | 0.0042 | -63.6628 | -58.7723 | -3.1281 | -3.1337 |
| 0.6857 | 0.9993 | 5800 | 0.6911 | -0.0009 | -0.0051 | 0.5713 | 0.0042 | -63.6879 | -58.7999 | -3.1278 | -3.1334 |
| 0.6864 | 1.0165 | 5900 | 0.6911 | -0.0012 | -0.0055 | 0.5788 | 0.0044 | -63.7349 | -58.8299 | -3.1271 | -3.1327 |
| 0.6888 | 1.0338 | 6000 | 0.6910 | -0.0014 | -0.0059 | 0.5790 | 0.0044 | -63.7658 | -58.8540 | -3.1259 | -3.1316 |
| 0.6857 | 1.0510 | 6100 | 0.6909 | -0.0016 | -0.0062 | 0.5795 | 0.0046 | -63.8031 | -58.8730 | -3.1255 | -3.1312 |
| 0.6889 | 1.0682 | 6200 | 0.6909 | -0.0019 | -0.0066 | 0.5764 | 0.0047 | -63.8376 | -58.9032 | -3.1248 | -3.1305 |
| 0.6865 | 1.0855 | 6300 | 0.6908 | -0.0022 | -0.0070 | 0.5788 | 0.0048 | -63.8796 | -58.9275 | -3.1245 | -3.1303 |
| 0.6884 | 1.1027 | 6400 | 0.6909 | -0.0024 | -0.0071 | 0.5748 | 0.0047 | -63.8941 | -58.9523 | -3.1230 | -3.1287 |
| 0.6893 | 1.1199 | 6500 | 0.6908 | -0.0026 | -0.0075 | 0.5813 | 0.0049 | -63.9268 | -58.9676 | -3.1230 | -3.1287 |
| 0.6886 | 1.1371 | 6600 | 0.6908 | -0.0030 | -0.0079 | 0.5748 | 0.0050 | -63.9723 | -59.0090 | -3.1216 | -3.1273 |
| 0.6865 | 1.1544 | 6700 | 0.6908 | -0.0032 | -0.0082 | 0.5804 | 0.0050 | -64.0010 | -59.0346 | -3.1218 | -3.1275 |
| 0.6868 | 1.1716 | 6800 | 0.6907 | -0.0033 | -0.0084 | 0.5836 | 0.0051 | -64.0239 | -59.0461 | -3.1204 | -3.1261 |
| 0.6882 | 1.1888 | 6900 | 0.6907 | -0.0037 | -0.0089 | 0.5811 | 0.0051 | -64.0668 | -59.0845 | -3.1198 | -3.1255 |
| 0.6859 | 1.2061 | 7000 | 0.6907 | -0.0041 | -0.0093 | 0.5797 | 0.0052 | -64.1093 | -59.1233 | -3.1204 | -3.1261 |
| 0.685 | 1.2233 | 7100 | 0.6906 | -0.0045 | -0.0098 | 0.5797 | 0.0053 | -64.1565 | -59.1598 | -3.1180 | -3.1237 |
| 0.6858 | 1.2405 | 7200 | 0.6905 | -0.0046 | -0.0101 | 0.5820 | 0.0055 | -64.1910 | -59.1702 | -3.1184 | -3.1241 |
| 0.6905 | 1.2578 | 7300 | 0.6905 | -0.0049 | -0.0104 | 0.5804 | 0.0055 | -64.2204 | -59.2016 | -3.1182 | -3.1239 |
| 0.6852 | 1.2750 | 7400 | 0.6906 | -0.0051 | -0.0106 | 0.5790 | 0.0055 | -64.2432 | -59.2260 | -3.1180 | -3.1237 |
| 0.6873 | 1.2922 | 7500 | 0.6906 | -0.0055 | -0.0109 | 0.5860 | 0.0055 | -64.2745 | -59.2600 | -3.1174 | -3.1231 |
| 0.6871 | 1.3094 | 7600 | 0.6905 | -0.0055 | -0.0112 | 0.5829 | 0.0057 | -64.3001 | -59.2643 | -3.1166 | -3.1223 |
| 0.6865 | 1.3267 | 7700 | 0.6904 | -0.0057 | -0.0115 | 0.5846 | 0.0058 | -64.3291 | -59.2850 | -3.1161 | -3.1218 |
| 0.6888 | 1.3439 | 7800 | 0.6905 | -0.0061 | -0.0118 | 0.5820 | 0.0057 | -64.3590 | -59.3192 | -3.1161 | -3.1218 |
| 0.6868 | 1.3611 | 7900 | 0.6904 | -0.0062 | -0.0121 | 0.5846 | 0.0058 | -64.3857 | -59.3334 | -3.1164 | -3.1220 |
| 0.6876 | 1.3784 | 8000 | 0.6903 | -0.0063 | -0.0123 | 0.5839 | 0.0060 | -64.4065 | -59.3406 | -3.1148 | -3.1204 |
| 0.688 | 1.3956 | 8100 | 0.6904 | -0.0066 | -0.0125 | 0.5832 | 0.0059 | -64.4252 | -59.3670 | -3.1144 | -3.1201 |
| 0.6858 | 1.4128 | 8200 | 0.6903 | -0.0068 | -0.0127 | 0.5781 | 0.0059 | -64.4505 | -59.3885 | -3.1140 | -3.1197 |
| 0.6836 | 1.4300 | 8300 | 0.6904 | -0.0069 | -0.0129 | 0.5822 | 0.0059 | -64.4660 | -59.4050 | -3.1139 | -3.1195 |
| 0.6863 | 1.4473 | 8400 | 0.6903 | -0.0071 | -0.0132 | 0.5829 | 0.0061 | -64.4968 | -59.4218 | -3.1146 | -3.1203 |
| 0.6847 | 1.4645 | 8500 | 0.6903 | -0.0073 | -0.0133 | 0.5871 | 0.0060 | -64.5110 | -59.4395 | -3.1132 | -3.1189 |
| 0.6861 | 1.4817 | 8600 | 0.6903 | -0.0075 | -0.0136 | 0.5864 | 0.0061 | -64.5362 | -59.4577 | -3.1135 | -3.1192 |
| 0.6847 | 1.4990 | 8700 | 0.6903 | -0.0077 | -0.0138 | 0.5843 | 0.0061 | -64.5599 | -59.4786 | -3.1127 | -3.1184 |
| 0.6866 | 1.5162 | 8800 | 0.6902 | -0.0077 | -0.0139 | 0.5878 | 0.0062 | -64.5684 | -59.4835 | -3.1125 | -3.1182 |
| 0.6841 | 1.5334 | 8900 | 0.6902 | -0.0079 | -0.0141 | 0.5874 | 0.0062 | -64.5873 | -59.4978 | -3.1129 | -3.1186 |
| 0.6799 | 1.5507 | 9000 | 0.6902 | -0.0080 | -0.0142 | 0.5857 | 0.0062 | -64.6045 | -59.5160 | -3.1124 | -3.1181 |
| 0.6832 | 1.5679 | 9100 | 0.6902 | -0.0080 | -0.0143 | 0.5862 | 0.0062 | -64.6061 | -59.5157 | -3.1117 | -3.1174 |
| 0.6846 | 1.5851 | 9200 | 0.6903 | -0.0083 | -0.0144 | 0.5811 | 0.0062 | -64.6246 | -59.5410 | -3.1116 | -3.1173 |
| 0.6853 | 1.6023 | 9300 | 0.6902 | -0.0083 | -0.0146 | 0.5827 | 0.0062 | -64.6375 | -59.5467 | -3.1120 | -3.1177 |
| 0.6882 | 1.6196 | 9400 | 0.6902 | -0.0084 | -0.0147 | 0.5885 | 0.0063 | -64.6528 | -59.5515 | -3.1111 | -3.1169 |
| 0.6867 | 1.6368 | 9500 | 0.6902 | -0.0084 | -0.0147 | 0.5816 | 0.0063 | -64.6481 | -59.5528 | -3.1110 | -3.1166 |
| 0.6845 | 1.6540 | 9600 | 0.6902 | -0.0085 | -0.0148 | 0.5862 | 0.0064 | -64.6648 | -59.5611 | -3.1109 | -3.1166 |
| 0.6855 | 1.6713 | 9700 | 0.6902 | -0.0085 | -0.0149 | 0.5876 | 0.0063 | -64.6676 | -59.5646 | -3.1111 | -3.1167 |
| 0.682 | 1.6885 | 9800 | 0.6902 | -0.0087 | -0.0150 | 0.5867 | 0.0063 | -64.6765 | -59.5814 | -3.1108 | -3.1164 |
| 0.6814 | 1.7057 | 9900 | 0.6902 | -0.0087 | -0.0150 | 0.5913 | 0.0063 | -64.6813 | -59.5806 | -3.1108 | -3.1165 |
| 0.6837 | 1.7229 | 10000 | 0.6901 | -0.0087 | -0.0151 | 0.5927 | 0.0064 | -64.6926 | -59.5854 | -3.1107 | -3.1163 |
| 0.6821 | 1.7402 | 10100 | 0.6901 | -0.0087 | -0.0151 | 0.5841 | 0.0064 | -64.6931 | -59.5801 | -3.1105 | -3.1162 |
| 0.6867 | 1.7574 | 10200 | 0.6902 | -0.0089 | -0.0152 | 0.5816 | 0.0064 | -64.7032 | -59.5971 | -3.1105 | -3.1162 |
| 0.6867 | 1.7746 | 10300 | 0.6901 | -0.0088 | -0.0152 | 0.5871 | 0.0064 | -64.6972 | -59.5881 | -3.1104 | -3.1161 |
| 0.6847 | 1.7919 | 10400 | 0.6902 | -0.0089 | -0.0151 | 0.5869 | 0.0062 | -64.6896 | -59.5992 | -3.1102 | -3.1159 |
| 0.6861 | 1.8091 | 10500 | 0.6901 | -0.0088 | -0.0152 | 0.5862 | 0.0064 | -64.7046 | -59.5936 | -3.1104 | -3.1161 |
| 0.6877 | 1.8263 | 10600 | 0.6901 | -0.0088 | -0.0153 | 0.5920 | 0.0064 | -64.7073 | -59.5967 | -3.1104 | -3.1161 |
| 0.6824 | 1.8436 | 10700 | 0.6901 | -0.0089 | -0.0153 | 0.5867 | 0.0064 | -64.7092 | -59.5998 | -3.1103 | -3.1160 |
| 0.6839 | 1.8608 | 10800 | 0.6901 | -0.0089 | -0.0153 | 0.5878 | 0.0064 | -64.7113 | -59.5983 | -3.1102 | -3.1158 |
| 0.6831 | 1.8780 | 10900 | 0.6901 | -0.0089 | -0.0153 | 0.5846 | 0.0064 | -64.7147 | -59.6028 | -3.1104 | -3.1160 |
| 0.6886 | 1.8952 | 11000 | 0.6901 | -0.0089 | -0.0154 | 0.5908 | 0.0064 | -64.7155 | -59.6032 | -3.1103 | -3.1160 |
| 0.6859 | 1.9125 | 11100 | 0.6901 | -0.0088 | -0.0152 | 0.5846 | 0.0064 | -64.7015 | -59.5919 | -3.1102 | -3.1159 |
| 0.685 | 1.9297 | 11200 | 0.6902 | -0.0088 | -0.0152 | 0.5846 | 0.0064 | -64.6997 | -59.5930 | -3.1103 | -3.1160 |
| 0.6869 | 1.9469 | 11300 | 0.6901 | -0.0089 | -0.0153 | 0.5876 | 0.0064 | -64.7081 | -59.5984 | -3.1104 | -3.1161 |
| 0.6864 | 1.9642 | 11400 | 0.6901 | -0.0088 | -0.0152 | 0.5908 | 0.0064 | -64.6952 | -59.5885 | -3.1105 | -3.1161 |
| 0.689 | 1.9814 | 11500 | 0.6902 | -0.0089 | -0.0153 | 0.5820 | 0.0064 | -64.7117 | -59.6064 | -3.1105 | -3.1161 |
| 0.6865 | 1.9986 | 11600 | 0.6901 | -0.0088 | -0.0152 | 0.5892 | 0.0064 | -64.7009 | -59.5896 | -3.1105 | -3.1162 |
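
For context when reading this table: under DPO the logged rewards are implicit, derived from policy and reference log-probabilities. A sketch of the standard definitions (Rafailov et al., 2023; not stated in this card), where Rewards/chosen and Rewards/rejected correspond to the reward of the preferred and dispreferred summaries:

```latex
r_\theta(x, y) = \beta \,\bigl[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr]
\qquad
\mathcal{L}_{\mathrm{DPO}}(x, y_w, y_l) = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)
```

At initialization the policy equals the reference, so both rewards are zero and the loss is ln 2 ≈ 0.6931, matching the first rows of the table; the small final margin (0.0064) is consistent with the very low learning rate (2e-08).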

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1