tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_2epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6856
  • Rewards/chosen: -0.0618
  • Rewards/rejected: -0.0788
  • Rewards/accuracies: 0.5955
  • Rewards/margins: 0.0169
  • Logps/rejected: -71.0584
  • Logps/chosen: -64.8961
  • Logits/rejected: -3.0381
  • Logits/chosen: -3.0439
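
The reward metrics above are DPO's implicit rewards. Their names match the metrics logged by TRL's `DPOTrainer`. The card does not say that trainer produced them, so the following assumes it did. Under that assumption:

- Each reward is beta times the difference between the policy's and the frozen reference (SFT) model's summed log-probability of a summary.
- The margin is the chosen reward minus the rejected reward.
- The accuracy is the fraction of pairs with a positive margin.

The card does not report beta, so the sketch takes it as a parameter and the example value is hypothetical. The helper names `dpo_metrics` and `log_sigmoid` are illustrative. Logits/* metrics are omitted.

```python
import math

def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """Batch-averaged DPO metrics from per-example summed log-probabilities.

    Every argument except beta is a list of floats, one entry per preference pair.
    """
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    # Margin = beta * (policy log-ratio - reference log-ratio).
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    n = len(margins)
    return {
        # Sigmoid DPO loss: mean of -log sigmoid(margin).
        "loss": sum(-log_sigmoid(m) for m in margins) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        # Fraction of pairs where the chosen summary gets the strictly larger reward.
        "rewards/accuracies": sum(1.0 for m in margins if m > 0) / n,
        "rewards/margins": sum(margins) / n,
        # Policy log-probabilities of the chosen / rejected summaries.
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }

# Hypothetical per-example log-probs for a batch of two preference pairs.
metrics = dpo_metrics(
    policy_chosen=[-64.0, -70.0],
    policy_rejected=[-71.0, -69.0],
    ref_chosen=[-63.5, -70.2],
    ref_rejected=[-70.0, -68.5],
    beta=0.1,  # hypothetical value; not reported in this card
)
print(metrics)
```

While the policy still equals the reference model, every reward is 0 and the loss is ln 2 ≈ 0.6931. This matches the first rows of the training log. The final evaluation loss of 0.6856 corresponds to a small average margin of about 0.017 (-0.0618 minus -0.0788, up to rounding).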

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-08
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
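
With lr_scheduler_type: cosine and lr_scheduler_warmup_ratio: 0.1, the learning rate rises linearly from 0 to 5e-08 over the first 10% of optimizer steps. It then follows a half cosine back down to 0. The sketch below reproduces that shape, assuming the standard Transformers Trainer scheduler (`get_cosine_schedule_with_warmup` with its default half cycle). That scheduler rounds the warmup length up to ceil(total_steps × warmup_ratio). The total step count is also an assumption. The card does not state it, but the log reaches step 5800 at epoch 1.9986, which suggests roughly 5,804 steps over two epochs.

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 5e-8,
              warmup_ratio: float = 0.1) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    # Warmup length rounded up, as the Trainer does when warmup_ratio is set.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: the factor goes from 1 at progress=0 to 0 at progress=1.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

TOTAL_STEPS = 5804  # assumption inferred from the training log, not stated in the card
for step in (0, 290, 581, 2900, 5800):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.3e}")
```

Under these assumptions, the peak learning rate is reached around step 581. Roughly half of it remains at the end of the first epoch.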

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6932 | 0.0345 | 100 | 0.6932 | 0.0000 | 0.0001 | 0.4805 | -0.0001 | -63.1716 | -58.7091 | -3.1575 | -3.1632 |
| 0.6931 | 0.0689 | 200 | 0.6932 | -0.0000 | 0.0000 | 0.4863 | -0.0000 | -63.1768 | -58.7119 | -3.1575 | -3.1632 |
| 0.6931 | 0.1034 | 300 | 0.6932 | 0.0001 | 0.0002 | 0.4756 | -0.0001 | -63.1627 | -58.7008 | -3.1575 | -3.1632 |
| 0.693 | 0.1378 | 400 | 0.6931 | 0.0002 | 0.0002 | 0.5007 | 0.0000 | -63.1637 | -58.6940 | -3.1572 | -3.1629 |
| 0.6931 | 0.1723 | 500 | 0.6931 | 0.0003 | 0.0002 | 0.4942 | 0.0001 | -63.1590 | -58.6825 | -3.1569 | -3.1625 |
| 0.6928 | 0.2068 | 600 | 0.6931 | 0.0006 | 0.0005 | 0.5023 | 0.0002 | -63.1320 | -58.6476 | -3.1556 | -3.1613 |
| 0.692 | 0.2412 | 700 | 0.6930 | 0.0010 | 0.0006 | 0.5414 | 0.0004 | -63.1153 | -58.6091 | -3.1543 | -3.1599 |
| 0.6923 | 0.2757 | 800 | 0.6928 | 0.0013 | 0.0006 | 0.5588 | 0.0007 | -63.1219 | -58.5861 | -3.1529 | -3.1585 |
| 0.6912 | 0.3101 | 900 | 0.6927 | 0.0017 | 0.0007 | 0.5660 | 0.0010 | -63.1103 | -58.5464 | -3.1501 | -3.1558 |
| 0.6909 | 0.3446 | 1000 | 0.6925 | 0.0018 | 0.0005 | 0.5646 | 0.0013 | -63.1285 | -58.5271 | -3.1481 | -3.1538 |
| 0.6907 | 0.3790 | 1100 | 0.6924 | 0.0020 | 0.0003 | 0.5604 | 0.0016 | -63.1469 | -58.5154 | -3.1457 | -3.1513 |
| 0.6898 | 0.4135 | 1200 | 0.6921 | 0.0018 | -0.0003 | 0.5743 | 0.0022 | -63.2143 | -58.5306 | -3.1424 | -3.1480 |
| 0.688 | 0.4480 | 1300 | 0.6919 | 0.0018 | -0.0008 | 0.5741 | 0.0026 | -63.2606 | -58.5351 | -3.1392 | -3.1448 |
| 0.6888 | 0.4824 | 1400 | 0.6917 | 0.0011 | -0.0019 | 0.5723 | 0.0030 | -63.3749 | -58.6054 | -3.1364 | -3.1420 |
| 0.6886 | 0.5169 | 1500 | 0.6915 | 0.0002 | -0.0033 | 0.5737 | 0.0035 | -63.5057 | -58.6878 | -3.1325 | -3.1382 |
| 0.6885 | 0.5513 | 1600 | 0.6912 | -0.0003 | -0.0043 | 0.5769 | 0.0040 | -63.6057 | -58.7407 | -3.1295 | -3.1351 |
| 0.6861 | 0.5858 | 1700 | 0.6910 | -0.0016 | -0.0062 | 0.5746 | 0.0046 | -63.8004 | -58.8729 | -3.1253 | -3.1310 |
| 0.6872 | 0.6203 | 1800 | 0.6908 | -0.0035 | -0.0085 | 0.5839 | 0.0050 | -64.0325 | -59.0604 | -3.1214 | -3.1270 |
| 0.6862 | 0.6547 | 1900 | 0.6905 | -0.0054 | -0.0110 | 0.5802 | 0.0057 | -64.2826 | -59.2489 | -3.1157 | -3.1214 |
| 0.6859 | 0.6892 | 2000 | 0.6903 | -0.0080 | -0.0142 | 0.5869 | 0.0062 | -64.5982 | -59.5137 | -3.1119 | -3.1176 |
| 0.6846 | 0.7236 | 2100 | 0.6899 | -0.0107 | -0.0176 | 0.5829 | 0.0069 | -64.9428 | -59.7842 | -3.1059 | -3.1116 |
| 0.6861 | 0.7581 | 2200 | 0.6897 | -0.0133 | -0.0207 | 0.5869 | 0.0074 | -65.2491 | -60.0455 | -3.1025 | -3.1081 |
| 0.6836 | 0.7926 | 2300 | 0.6895 | -0.0168 | -0.0247 | 0.5922 | 0.0079 | -65.6530 | -60.3904 | -3.0987 | -3.1044 |
| 0.6847 | 0.8270 | 2400 | 0.6892 | -0.0209 | -0.0296 | 0.5869 | 0.0087 | -66.1402 | -60.8069 | -3.0949 | -3.1007 |
| 0.6838 | 0.8615 | 2500 | 0.6889 | -0.0250 | -0.0343 | 0.5904 | 0.0093 | -66.6113 | -61.2157 | -3.0910 | -3.0968 |
| 0.6841 | 0.8959 | 2600 | 0.6886 | -0.0284 | -0.0384 | 0.5955 | 0.0100 | -67.0226 | -61.5496 | -3.0877 | -3.0933 |
| 0.6824 | 0.9304 | 2700 | 0.6883 | -0.0321 | -0.0428 | 0.5855 | 0.0107 | -67.4593 | -61.9186 | -3.0839 | -3.0897 |
| 0.6824 | 0.9649 | 2800 | 0.6880 | -0.0334 | -0.0447 | 0.5929 | 0.0113 | -67.6515 | -62.0566 | -3.0811 | -3.0868 |
| 0.6812 | 0.9993 | 2900 | 0.6878 | -0.0363 | -0.0481 | 0.5906 | 0.0118 | -67.9890 | -62.3425 | -3.0775 | -3.0832 |
| 0.6819 | 1.0338 | 3000 | 0.6877 | -0.0373 | -0.0494 | 0.5932 | 0.0120 | -68.1166 | -62.4440 | -3.0740 | -3.0797 |
| 0.6796 | 1.0682 | 3100 | 0.6874 | -0.0392 | -0.0518 | 0.5987 | 0.0126 | -68.3560 | -62.6296 | -3.0701 | -3.0759 |
| 0.6776 | 1.1027 | 3200 | 0.6872 | -0.0409 | -0.0540 | 0.5906 | 0.0131 | -68.5819 | -62.8043 | -3.0674 | -3.0732 |
| 0.6824 | 1.1371 | 3300 | 0.6870 | -0.0436 | -0.0571 | 0.5946 | 0.0135 | -68.8899 | -63.0750 | -3.0643 | -3.0701 |
| 0.6787 | 1.1716 | 3400 | 0.6869 | -0.0458 | -0.0596 | 0.5941 | 0.0138 | -69.1415 | -63.2913 | -3.0611 | -3.0668 |
| 0.6801 | 1.2061 | 3500 | 0.6867 | -0.0482 | -0.0624 | 0.5929 | 0.0142 | -69.4185 | -63.5317 | -3.0588 | -3.0646 |
| 0.6797 | 1.2405 | 3600 | 0.6866 | -0.0499 | -0.0644 | 0.5915 | 0.0145 | -69.6206 | -63.6998 | -3.0559 | -3.0616 |
| 0.6783 | 1.2750 | 3700 | 0.6864 | -0.0511 | -0.0659 | 0.5904 | 0.0149 | -69.7728 | -63.8172 | -3.0542 | -3.0599 |
| 0.6771 | 1.3094 | 3800 | 0.6864 | -0.0521 | -0.0672 | 0.5920 | 0.0151 | -69.8981 | -63.9235 | -3.0522 | -3.0580 |
| 0.6785 | 1.3439 | 3900 | 0.6862 | -0.0536 | -0.0690 | 0.5922 | 0.0154 | -70.0814 | -64.0693 | -3.0499 | -3.0556 |
| 0.6807 | 1.3784 | 4000 | 0.6861 | -0.0551 | -0.0708 | 0.5908 | 0.0157 | -70.2593 | -64.2214 | -3.0484 | -3.0541 |
| 0.6769 | 1.4128 | 4100 | 0.6860 | -0.0563 | -0.0722 | 0.5929 | 0.0159 | -70.3988 | -64.3376 | -3.0467 | -3.0525 |
| 0.6722 | 1.4473 | 4200 | 0.6859 | -0.0577 | -0.0738 | 0.5946 | 0.0161 | -70.5629 | -64.4845 | -3.0456 | -3.0513 |
| 0.6769 | 1.4817 | 4300 | 0.6858 | -0.0582 | -0.0745 | 0.5939 | 0.0163 | -70.6349 | -64.5350 | -3.0442 | -3.0499 |
| 0.6785 | 1.5162 | 4400 | 0.6858 | -0.0586 | -0.0750 | 0.5955 | 0.0164 | -70.6776 | -64.5703 | -3.0432 | -3.0490 |
| 0.6735 | 1.5507 | 4500 | 0.6858 | -0.0597 | -0.0762 | 0.5920 | 0.0164 | -70.7972 | -64.6853 | -3.0421 | -3.0479 |
| 0.6786 | 1.5851 | 4600 | 0.6857 | -0.0603 | -0.0769 | 0.5967 | 0.0166 | -70.8698 | -64.7462 | -3.0414 | -3.0471 |
| 0.6803 | 1.6196 | 4700 | 0.6857 | -0.0603 | -0.0770 | 0.5978 | 0.0167 | -70.8781 | -64.7435 | -3.0408 | -3.0466 |
| 0.6789 | 1.6540 | 4800 | 0.6856 | -0.0607 | -0.0775 | 0.5929 | 0.0168 | -70.9263 | -64.7804 | -3.0399 | -3.0457 |
| 0.6723 | 1.6885 | 4900 | 0.6856 | -0.0611 | -0.0779 | 0.5985 | 0.0168 | -70.9741 | -64.8213 | -3.0390 | -3.0448 |
| 0.6767 | 1.7229 | 5000 | 0.6856 | -0.0613 | -0.0781 | 0.5960 | 0.0169 | -70.9925 | -64.8377 | -3.0388 | -3.0446 |
| 0.6774 | 1.7574 | 5100 | 0.6856 | -0.0615 | -0.0784 | 0.5939 | 0.0168 | -71.0176 | -64.8661 | -3.0387 | -3.0445 |
| 0.6748 | 1.7919 | 5200 | 0.6855 | -0.0616 | -0.0786 | 0.5939 | 0.0170 | -71.0377 | -64.8736 | -3.0383 | -3.0441 |
| 0.6761 | 1.8263 | 5300 | 0.6855 | -0.0617 | -0.0787 | 0.5950 | 0.0170 | -71.0469 | -64.8778 | -3.0380 | -3.0439 |
| 0.6738 | 1.8608 | 5400 | 0.6855 | -0.0618 | -0.0788 | 0.5985 | 0.0171 | -71.0633 | -64.8885 | -3.0380 | -3.0438 |
| 0.6821 | 1.8952 | 5500 | 0.6855 | -0.0618 | -0.0788 | 0.5934 | 0.0170 | -71.0638 | -64.8919 | -3.0379 | -3.0437 |
| 0.6724 | 1.9297 | 5600 | 0.6855 | -0.0619 | -0.0788 | 0.5955 | 0.0170 | -71.0635 | -64.8979 | -3.0379 | -3.0437 |
| 0.6745 | 1.9642 | 5700 | 0.6855 | -0.0619 | -0.0790 | 0.5957 | 0.0171 | -71.0788 | -64.9037 | -3.0380 | -3.0438 |
| 0.6767 | 1.9986 | 5800 | 0.6856 | -0.0618 | -0.0788 | 0.5955 | 0.0169 | -71.0584 | -64.8961 | -3.0381 | -3.0439 |
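
The Epoch and Step columns can be used to estimate how many optimizer steps make up one epoch. Multiplying that by total_train_batch_size then gives roughly how many preference pairs were seen per epoch. The sketch below does this from three rows copied from the table and also picks the row with the lowest validation loss. The row selection and the helper names are illustrative. The result is only an estimate, because the logged epoch is a fractional progress value rather than an exact count.

```python
TOTAL_TRAIN_BATCH_SIZE = 32  # from the hyperparameter list

# (step, epoch, validation_loss) for a few rows copied from the training log
LOG = [
    (2900, 0.9993, 0.6878),
    (5200, 1.7919, 0.6855),
    (5800, 1.9986, 0.6856),
]

def steps_per_epoch(step: int, epoch: float) -> int:
    # The logged epoch is approximately step / steps_per_epoch, so invert and round.
    return round(step / epoch)

def best_row(log):
    # Lowest validation loss; ties go to the earliest step.
    return min(log, key=lambda row: (row[2], row[0]))

spe = steps_per_epoch(*LOG[-1][:2])
approx_pairs_per_epoch = spe * TOTAL_TRAIN_BATCH_SIZE
print(spe, approx_pairs_per_epoch, best_row(LOG))
```

With these rows, the estimate is about 2,902 steps, or roughly 92,864 pairs, per epoch. In the full table, validation loss first reaches its minimum of 0.6855 at step 5200. It stays between 0.6855 and 0.6856 from step 4800 to the end of training.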

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Safetensors

  • Model size: 1.1B params
  • Tensor type: F32