
tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6501
  • Rewards/chosen: -1.0591
  • Rewards/rejected: -1.2329
  • Rewards/accuracies: 0.6032
  • Rewards/margins: 0.1739
  • Logps/rejected: -186.0431
  • Logps/chosen: -164.9210
  • Logits/rejected: -2.3430
  • Logits/chosen: -2.3551
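
The card gives no usage details, so here is a minimal inference sketch. It assumes the standard transformers causal-LM API; the "TL;DR:" prompt format is an assumption based on the openai/summarize_from_feedback dataset, not something this card specifies.

```python
# Minimal sketch, not an official usage example for this model.
# Assumptions: the model follows its base's causal-LM interface, and the
# "TL;DR:" prompt style mirrors the summarize_from_feedback data format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

post = "Long Reddit-style post to summarize goes here."
inputs = tokenizer(f"{post}\nTL;DR:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens (the summary).
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```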

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
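
As referenced above, a sketch of how these settings could be expressed in code. The card does not say which trainer produced the model; the use of trl's DPOConfig (which extends transformers.TrainingArguments) and the beta value are assumptions.

```python
# Hedged sketch: maps the hyperparameters listed above onto trl's DPOConfig.
# Assumptions: trl's DPOTrainer was used, and beta was left at 0.1 (trl's
# default); neither is stated in this card.
from trl import DPOConfig

config = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs",
    learning_rate=1e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # with the multi-GPU setup, total train batch = 16
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumption: not listed above
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 match the default
    # adam_beta1 / adam_beta2 / adam_epsilon values, so they are not set here.
)
```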

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.693 | 0.0689 | 400 | 0.6931 | 0.0003 | 0.0002 | 0.5112 | 0.0001 | -62.7270 | -58.9858 | -2.9691 | -2.9727 |
| 0.6923 | 0.1378 | 800 | 0.6926 | 0.0024 | 0.0012 | 0.5493 | 0.0011 | -62.6258 | -58.7797 | -2.9667 | -2.9701 |
| 0.6901 | 0.2068 | 1200 | 0.6907 | -0.0080 | -0.0133 | 0.5697 | 0.0053 | -64.0827 | -59.8146 | -2.9579 | -2.9613 |
| 0.6835 | 0.2757 | 1600 | 0.6880 | -0.0321 | -0.0436 | 0.5764 | 0.0114 | -67.1050 | -62.2266 | -2.9410 | -2.9442 |
| 0.6865 | 0.3446 | 2000 | 0.6852 | -0.0690 | -0.0874 | 0.5713 | 0.0184 | -71.4878 | -65.9158 | -2.9158 | -2.9192 |
| 0.6767 | 0.4135 | 2400 | 0.6817 | -0.1086 | -0.1352 | 0.5816 | 0.0265 | -76.2651 | -69.8803 | -2.8906 | -2.8938 |
| 0.6726 | 0.4824 | 2800 | 0.6792 | -0.1614 | -0.1943 | 0.5767 | 0.0328 | -82.1753 | -75.1597 | -2.8617 | -2.8651 |
| 0.6643 | 0.5513 | 3200 | 0.6729 | -0.2581 | -0.3074 | 0.5948 | 0.0493 | -93.4915 | -84.8225 | -2.8387 | -2.8420 |
| 0.6614 | 0.6203 | 3600 | 0.6740 | -0.2589 | -0.3059 | 0.5904 | 0.0470 | -93.3416 | -84.9094 | -2.8113 | -2.8144 |
| 0.6609 | 0.6892 | 4000 | 0.6696 | -0.3009 | -0.3603 | 0.6053 | 0.0594 | -98.7785 | -89.1073 | -2.7879 | -2.7912 |
| 0.6562 | 0.7581 | 4400 | 0.6667 | -0.4072 | -0.4790 | 0.5983 | 0.0718 | -110.6499 | -99.7330 | -2.7515 | -2.7548 |
| 0.6569 | 0.8270 | 4800 | 0.6637 | -0.4951 | -0.5782 | 0.6059 | 0.0831 | -120.5742 | -108.5273 | -2.7283 | -2.7316 |
| 0.6383 | 0.8959 | 5200 | 0.6621 | -0.5180 | -0.6112 | 0.6055 | 0.0932 | -123.8654 | -110.8119 | -2.7112 | -2.7149 |
| 0.6411 | 0.9649 | 5600 | 0.6623 | -0.5228 | -0.6134 | 0.6055 | 0.0906 | -124.0929 | -111.2965 | -2.6869 | -2.6910 |
| 0.6293 | 1.0338 | 6000 | 0.6618 | -0.6210 | -0.7260 | 0.6064 | 0.1049 | -135.3463 | -121.1192 | -2.6526 | -2.6573 |
| 0.6247 | 1.1027 | 6400 | 0.6587 | -0.7088 | -0.8268 | 0.5990 | 0.1180 | -145.4310 | -129.8984 | -2.6201 | -2.6254 |
| 0.6194 | 1.1716 | 6800 | 0.6580 | -0.7955 | -0.9191 | 0.5980 | 0.1236 | -154.6599 | -138.5692 | -2.5858 | -2.5912 |
| 0.6127 | 1.2405 | 7200 | 0.6558 | -0.6612 | -0.7815 | 0.6039 | 0.1203 | -140.8955 | -125.1357 | -2.5822 | -2.5877 |
| 0.6531 | 1.3094 | 7600 | 0.6534 | -0.7460 | -0.8804 | 0.6041 | 0.1344 | -150.7862 | -133.6133 | -2.5502 | -2.5564 |
| 0.5995 | 1.3784 | 8000 | 0.6528 | -0.8128 | -0.9555 | 0.6006 | 0.1427 | -158.2948 | -140.2942 | -2.5195 | -2.5267 |
| 0.61 | 1.4473 | 8400 | 0.6540 | -0.7310 | -0.8603 | 0.5980 | 0.1293 | -148.7821 | -132.1185 | -2.5198 | -2.5268 |
| 0.6575 | 1.5162 | 8800 | 0.6527 | -0.8369 | -0.9764 | 0.5997 | 0.1395 | -160.3900 | -142.7025 | -2.4947 | -2.5022 |
| 0.5969 | 1.5851 | 9200 | 0.6516 | -0.8922 | -1.0366 | 0.6101 | 0.1444 | -166.4089 | -148.2315 | -2.4661 | -2.4746 |
| 0.6211 | 1.6540 | 9600 | 0.6526 | -0.7875 | -0.9248 | 0.6094 | 0.1373 | -155.2340 | -137.7698 | -2.4725 | -2.4804 |
| 0.6011 | 1.7229 | 10000 | 0.6517 | -0.8912 | -1.0379 | 0.6099 | 0.1467 | -166.5410 | -148.1359 | -2.4396 | -2.4489 |
| 0.571 | 1.7919 | 10400 | 0.6514 | -0.8234 | -0.9653 | 0.6122 | 0.1419 | -159.2782 | -141.3557 | -2.4401 | -2.4489 |
| 0.5889 | 1.8608 | 10800 | 0.6506 | -1.0172 | -1.1751 | 0.6055 | 0.1579 | -180.2568 | -160.7332 | -2.3932 | -2.4039 |
| 0.5685 | 1.9297 | 11200 | 0.6486 | -1.0256 | -1.1907 | 0.5992 | 0.1651 | -181.8200 | -161.5783 | -2.3887 | -2.3992 |
| 0.63 | 1.9986 | 11600 | 0.6502 | -0.8869 | -1.0380 | 0.6004 | 0.1511 | -166.5461 | -147.7054 | -2.4012 | -2.4108 |
| 0.5891 | 2.0675 | 12000 | 0.6490 | -1.0453 | -1.2122 | 0.6046 | 0.1670 | -183.9714 | -163.5418 | -2.3713 | -2.3825 |
| 0.5808 | 2.1365 | 12400 | 0.6490 | -1.1906 | -1.3718 | 0.6039 | 0.1811 | -199.9255 | -178.0778 | -2.3382 | -2.3508 |
| 0.6051 | 2.2054 | 12800 | 0.6496 | -1.0959 | -1.2648 | 0.6053 | 0.1689 | -189.2301 | -168.6040 | -2.3542 | -2.3658 |
| 0.6223 | 2.2743 | 13200 | 0.6502 | -1.0865 | -1.2588 | 0.6069 | 0.1723 | -188.6267 | -167.6660 | -2.3460 | -2.3579 |
| 0.6245 | 2.3432 | 13600 | 0.6506 | -1.0806 | -1.2530 | 0.5983 | 0.1724 | -188.0497 | -167.0715 | -2.3462 | -2.3583 |
| 0.5716 | 2.4121 | 14000 | 0.6511 | -1.0306 | -1.1979 | 0.5941 | 0.1672 | -182.5368 | -162.0786 | -2.3533 | -2.3651 |
| 0.6078 | 2.4810 | 14400 | 0.6506 | -1.0889 | -1.2642 | 0.6004 | 0.1753 | -189.1684 | -167.9059 | -2.3417 | -2.3540 |
| 0.6112 | 2.5500 | 14800 | 0.6500 | -1.1067 | -1.2865 | 0.5971 | 0.1798 | -191.4036 | -169.6898 | -2.3390 | -2.3514 |
| 0.5773 | 2.6189 | 15200 | 0.6508 | -1.0435 | -1.2146 | 0.6025 | 0.1712 | -184.2123 | -163.3605 | -2.3468 | -2.3588 |
| 0.5983 | 2.6878 | 15600 | 0.6505 | -1.0660 | -1.2397 | 0.6018 | 0.1737 | -186.7185 | -165.6157 | -2.3419 | -2.3540 |
| 0.5983 | 2.7567 | 16000 | 0.6501 | -1.0707 | -1.2465 | 0.6029 | 0.1758 | -187.3989 | -166.0839 | -2.3408 | -2.3530 |
| 0.5956 | 2.8256 | 16400 | 0.6500 | -1.0594 | -1.2333 | 0.6008 | 0.1739 | -186.0803 | -164.9520 | -2.3429 | -2.3550 |
| 0.6221 | 2.8946 | 16800 | 0.6499 | -1.0592 | -1.2333 | 0.6041 | 0.1742 | -186.0846 | -164.9336 | -2.3430 | -2.3551 |
| 0.6096 | 2.9635 | 17200 | 0.6500 | -1.0595 | -1.2334 | 0.6046 | 0.1739 | -186.0905 | -164.9614 | -2.3429 | -2.3549 |
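
To unpack the column names: in the standard DPO formulation (Rafailov et al., 2023), each response earns an implicit reward equal to the β-scaled log-probability ratio between the policy and the reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected reward gap:

```latex
% Implicit DPO reward and loss (Rafailov et al., 2023); y_w = chosen, y_l = rejected.
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)
```

Accordingly, Rewards/margins is Rewards/chosen minus Rewards/rejected (e.g. −1.0591 − (−1.2329) ≈ 0.1739 for the final evaluation, up to rounding), and Rewards/accuracies is the fraction of evaluation pairs in which the chosen response receives the higher reward.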

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1