
tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs

This model is a direct preference optimization (DPO) fine-tune of martimfasantos/tinyllama-1.1b-sum-sft-full on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6411
  • Rewards/chosen: -1.5955
  • Rewards/rejected: -1.9066
  • Rewards/accuracies: 0.6273
  • Rewards/margins: 0.3112
  • Logps/rejected: -253.4108
  • Logps/chosen: -218.5612
  • Logits/rejected: -2.1502
  • Logits/chosen: -2.1697
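
A note on reading these numbers, assuming the standard trl DPOTrainer metric conventions (the card itself does not define them): the reward columns are the implicit DPO rewards,

\[
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{margins} = r(x, y_{\text{chosen}}) - r(x, y_{\text{rejected}}),
\]

so the reported margin is consistent: -1.5955 - (-1.9066) reproduces the 0.3112 above up to rounding. Rewards/accuracies is the fraction of evaluation pairs where the chosen summary gets the higher implicit reward, i.e. about 63% of pairs are ranked correctly here.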

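The card documents no usage format, so the following is a minimal, hypothetical inference sketch with transformers; the "TL;DR:" prompt template and the greedy decoding settings are assumptions, not something this card specifies.

```python
# Hypothetical usage sketch: load the model and summarize a Reddit-style post.
# The "TL;DR:" template is an assumed prompt format, not documented by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

post = (
    "My roommate keeps borrowing my charger without asking and returning it "
    "at 1% battery. I finally bought a second one and hid it in my desk."
)
prompt = post + "\n\nTL;DR:"  # assumed template

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
# Decode only the newly generated tokens, skipping the prompt.
summary = tokenizer.decode(
    out[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(summary.strip())
```
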
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3

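Below is a hypothetical sketch of how these settings map onto trl's DPOTrainer. It is not the authors' training script: the DPO beta, the prompt template, and the dataset preprocessing are assumptions, and trl argument names differ across versions.

```python
# Hypothetical reproduction sketch (assumptions flagged inline), mapping the
# hyperparameters listed above onto trl's DPOTrainer.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# openai/summarize_from_feedback "comparisons" pairs: turn each comparison
# into the prompt/chosen/rejected columns that DPOTrainer expects.
raw = load_dataset("openai/summarize_from_feedback", "comparisons", split="train")

def to_preference(ex):
    pick = ex["choice"]  # index of the human-preferred summary
    return {
        "prompt": ex["info"]["post"] + "\n\nTL;DR:",  # assumed template
        "chosen": " " + ex["summaries"][pick]["text"],
        "rejected": " " + ex["summaries"][1 - pick]["text"],
    }

train_dataset = raw.map(to_preference, remove_columns=raw.column_names)

args = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs",
    learning_rate=2e-7,             # LR2e-7 from the card
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 x 2 = 16 total train batch size
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                       # assumption: beta is not reported in the card
)

# ref_model=None lets trl snapshot the starting policy as the frozen reference.
trainer = DPOTrainer(
    model,
    ref_model=None,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer trl versions call this processing_class
)
trainer.train()
```
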
Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6924 | 0.0689 | 400 | 0.6930 | 0.0011 | 0.0007 | 0.5390 | 0.0003 | -62.6755 | -58.9094 | -2.9687 | -2.9723 |
| 0.6891 | 0.1378 | 800 | 0.6909 | -0.0061 | -0.0108 | 0.5748 | 0.0047 | -63.8305 | -59.6239 | -2.9588 | -2.9622 |
| 0.6874 | 0.2068 | 1200 | 0.6876 | -0.0302 | -0.0427 | 0.5871 | 0.0124 | -67.0173 | -62.0385 | -2.9361 | -2.9395 |
| 0.676 | 0.2757 | 1600 | 0.6820 | -0.1057 | -0.1316 | 0.5850 | 0.0259 | -75.9065 | -69.5813 | -2.8942 | -2.8976 |
| 0.6751 | 0.3446 | 2000 | 0.6770 | -0.1715 | -0.2098 | 0.5890 | 0.0384 | -83.7308 | -76.1611 | -2.8434 | -2.8468 |
| 0.6518 | 0.4135 | 2400 | 0.6676 | -0.3727 | -0.4381 | 0.6069 | 0.0654 | -106.5637 | -96.2904 | -2.7893 | -2.7926 |
| 0.6695 | 0.4824 | 2800 | 0.6631 | -0.4734 | -0.5560 | 0.6141 | 0.0826 | -118.3500 | -106.3523 | -2.7415 | -2.7450 |
| 0.6467 | 0.5513 | 3200 | 0.6583 | -0.6700 | -0.7814 | 0.6250 | 0.1113 | -140.8851 | -126.0199 | -2.6864 | -2.6902 |
| 0.6264 | 0.6203 | 3600 | 0.6586 | -0.6359 | -0.7384 | 0.6106 | 0.1024 | -136.5857 | -122.6100 | -2.6176 | -2.6225 |
| 0.6203 | 0.6892 | 4000 | 0.6523 | -0.7851 | -0.9183 | 0.6166 | 0.1332 | -154.5775 | -137.5248 | -2.5583 | -2.5642 |
| 0.6341 | 0.7581 | 4400 | 0.6487 | -0.8786 | -1.0259 | 0.6129 | 0.1473 | -165.3377 | -146.8752 | -2.4643 | -2.4723 |
| 0.6184 | 0.8270 | 4800 | 0.6454 | -1.0766 | -1.2481 | 0.6129 | 0.1716 | -187.5630 | -166.6730 | -2.4141 | -2.4242 |
| 0.609 | 0.8959 | 5200 | 0.6414 | -0.9919 | -1.1678 | 0.6164 | 0.1759 | -179.5278 | -158.2066 | -2.3970 | -2.4080 |
| 0.5977 | 0.9649 | 5600 | 0.6432 | -0.9166 | -1.0804 | 0.6273 | 0.1638 | -170.7888 | -150.6710 | -2.3933 | -2.4042 |
| 0.5845 | 1.0338 | 6000 | 0.6438 | -1.3686 | -1.6032 | 0.6245 | 0.2346 | -223.0724 | -195.8758 | -2.2640 | -2.2816 |
| 0.5789 | 1.1027 | 6400 | 0.6455 | -1.3882 | -1.6212 | 0.6164 | 0.2331 | -224.8725 | -197.8306 | -2.2428 | -2.2595 |
| 0.5681 | 1.1716 | 6800 | 0.6434 | -1.3348 | -1.5500 | 0.6129 | 0.2153 | -217.7540 | -192.4917 | -2.2435 | -2.2593 |
| 0.5602 | 1.2405 | 7200 | 0.6448 | -1.3673 | -1.5959 | 0.6234 | 0.2286 | -222.3391 | -195.7428 | -2.2210 | -2.2378 |
| 0.6357 | 1.3094 | 7600 | 0.6413 | -1.3975 | -1.6344 | 0.6125 | 0.2368 | -226.1876 | -198.7702 | -2.2034 | -2.2208 |
| 0.5491 | 1.3784 | 8000 | 0.6438 | -1.4655 | -1.7121 | 0.6055 | 0.2466 | -233.9599 | -205.5657 | -2.1906 | -2.2085 |
| 0.5537 | 1.4473 | 8400 | 0.6445 | -1.4375 | -1.6793 | 0.6259 | 0.2418 | -230.6812 | -202.7634 | -2.1797 | -2.1984 |
| 0.61 | 1.5162 | 8800 | 0.6405 | -1.0941 | -1.2946 | 0.6164 | 0.2005 | -192.2120 | -168.4266 | -2.2428 | -2.2579 |
| 0.523 | 1.5851 | 9200 | 0.6431 | -1.4596 | -1.7029 | 0.6289 | 0.2433 | -233.0398 | -204.9723 | -2.1570 | -2.1756 |
| 0.5412 | 1.6540 | 9600 | 0.6393 | -1.4228 | -1.6896 | 0.6315 | 0.2668 | -231.7097 | -201.2986 | -2.1513 | -2.1708 |
| 0.5368 | 1.7229 | 10000 | 0.6408 | -1.3358 | -1.5858 | 0.6236 | 0.2500 | -221.3330 | -192.5947 | -2.1730 | -2.1915 |
| 0.5064 | 1.7919 | 10400 | 0.6423 | -1.0625 | -1.2620 | 0.6215 | 0.1995 | -188.9488 | -165.2631 | -2.2150 | -2.2307 |
| 0.5268 | 1.8608 | 10800 | 0.6406 | -1.4254 | -1.6829 | 0.6341 | 0.2575 | -231.0404 | -201.5558 | -2.1644 | -2.1831 |
| 0.5384 | 1.9297 | 11200 | 0.6418 | -1.6486 | -1.9439 | 0.6364 | 0.2954 | -257.1440 | -223.8720 | -2.1299 | -2.1503 |
| 0.5734 | 1.9986 | 11600 | 0.6378 | -1.4356 | -1.7101 | 0.6362 | 0.2744 | -233.7563 | -202.5782 | -2.1624 | -2.1813 |
| 0.5302 | 2.0675 | 12000 | 0.6413 | -1.7064 | -2.0285 | 0.6292 | 0.3221 | -265.5970 | -229.6515 | -2.1257 | -2.1466 |
| 0.4961 | 2.1365 | 12400 | 0.6474 | -2.0075 | -2.3712 | 0.6387 | 0.3637 | -299.8690 | -259.7696 | -2.0958 | -2.1178 |
| 0.55 | 2.2054 | 12800 | 0.6415 | -1.5035 | -1.7868 | 0.6315 | 0.2833 | -241.4328 | -209.3660 | -2.1574 | -2.1761 |
| 0.5546 | 2.2743 | 13200 | 0.6425 | -1.6715 | -1.9874 | 0.6303 | 0.3159 | -261.4859 | -226.1615 | -2.1413 | -2.1612 |
| 0.5639 | 2.3432 | 13600 | 0.6409 | -1.5908 | -1.8980 | 0.6289 | 0.3072 | -252.5519 | -218.1001 | -2.1481 | -2.1675 |
| 0.5055 | 2.4121 | 14000 | 0.6384 | -1.4618 | -1.7629 | 0.6257 | 0.3010 | -239.0347 | -205.1979 | -2.1665 | -2.1857 |
| 0.5404 | 2.4810 | 14400 | 0.6405 | -1.6514 | -1.9790 | 0.6285 | 0.3276 | -260.6489 | -224.1589 | -2.1411 | -2.1613 |
| 0.5348 | 2.5500 | 14800 | 0.6418 | -1.6812 | -2.0090 | 0.6276 | 0.3278 | -263.6481 | -227.1385 | -2.1375 | -2.1578 |
| 0.5114 | 2.6189 | 15200 | 0.6408 | -1.5587 | -1.8632 | 0.6310 | 0.3046 | -249.0734 | -214.8810 | -2.1538 | -2.1732 |
| 0.5356 | 2.6878 | 15600 | 0.6405 | -1.5493 | -1.8534 | 0.6266 | 0.3041 | -248.0918 | -213.9473 | -2.1550 | -2.1743 |
| 0.4885 | 2.7567 | 16000 | 0.6406 | -1.5822 | -1.8916 | 0.6269 | 0.3094 | -251.9056 | -217.2328 | -2.1512 | -2.1707 |
| 0.5057 | 2.8256 | 16400 | 0.6410 | -1.5799 | -1.8883 | 0.6306 | 0.3084 | -251.5751 | -217.0051 | -2.1527 | -2.1720 |
| 0.5731 | 2.8946 | 16800 | 0.6412 | -1.5917 | -1.9021 | 0.6271 | 0.3104 | -252.9564 | -218.1854 | -2.1507 | -2.1702 |
| 0.4958 | 2.9635 | 17200 | 0.6412 | -1.5933 | -1.9040 | 0.6296 | 0.3107 | -253.1478 | -218.3473 | -2.1506 | -2.1702 |

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1