
tinyllama-1.1b-sum-dpo-full_LR5e-7_3epochs

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set (the reward metrics are explained after the list):

  • Loss: 0.7099
  • Rewards/chosen: -2.8601
  • Rewards/rejected: -3.4154
  • Rewards/accuracies: 0.6320
  • Rewards/margins: 0.5553
  • Logps/rejected: -404.2897
  • Logps/chosen: -345.0273
  • Logits/rejected: -1.9822
  • Logits/chosen: -2.0068
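
These metric names follow the DPO convention (as in TRL's DPOTrainer, which the hyperparameter names below suggest was used; this is an inference, not stated in the card): the implicit reward of a completion y for prompt x is the β-scaled log-probability ratio between the policy and the frozen SFT reference, and the loss is the negative log-sigmoid of the chosen-vs-rejected margin. β itself is not recorded in this card.

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)
$$

Accordingly, Rewards/margins is the mean of Rewards/chosen minus Rewards/rejected (here 0.5553 = -2.8601 - (-3.4154)), and Rewards/accuracies is the fraction of evaluation pairs in which the chosen summary receives the higher implicit reward.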

Model description

More information needed

Intended uses & limitations

More information needed
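
Pending official guidance, the model can be exercised as a causal LM for summarization. The sketch below is a minimal, hedged example: the TL;DR-style prompt template and the generation settings are assumptions chosen to match the summarize_from_feedback data, not a documented interface.

```python
# Minimal inference sketch. Assumptions: TL;DR-style prompting, greedy decoding.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-7_3epochs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

post = "Your Reddit-style post to summarize goes here."
prompt = f"{post}\n\nTL;DR:"  # hypothetical template; check the SFT data format

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```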

Training and evaluation data

More information needed
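
The card does not describe the preprocessing or splits; for orientation, the preference data named above can be loaded from the Hub as below. Field names follow the dataset's "comparisons" configuration, which holds the human-labeled summary pairs used for DPO-style training.

```python
# Load the OpenAI TL;DR human-feedback comparisons (pairs of summaries with a human choice).
from datasets import load_dataset

ds = load_dataset("openai/summarize_from_feedback", "comparisons")
example = ds["train"][0]
print(example["info"]["post"][:200])               # the source post
print([s["text"] for s in example["summaries"]])   # the two candidate summaries
print(example["choice"])                           # index of the human-preferred summary
```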

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction in code follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
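
A sketch of how these settings map onto a TRL DPOTrainer run is given below. It is a reconstruction under stated assumptions, not the author's script: TRL itself, beta=0.1 (TRL's default), and the toy preference pairs standing in for the real comparisons data are all assumptions.

```python
# Hedged reconstruction of the training setup with TRL's DPOTrainer.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_id = "martimfasantos/tinyllama-1.1b-sum-sft-full"  # SFT checkpoint named above
model = AutoModelForCausalLM.from_pretrained(sft_id)
tokenizer = AutoTokenizer.from_pretrained(sft_id)

# Placeholder preference pairs; the real run used openai/summarize_from_feedback.
pairs = Dataset.from_dict({
    "prompt": ["POST: ...\n\nTL;DR:"],
    "chosen": [" A concise, faithful summary."],
    "rejected": [" A rambling or inaccurate summary."],
})

config = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR5e-7_3epochs",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 x 2 matches the reported total batch of 16
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumption: TRL's default; the card does not record beta
)

trainer = DPOTrainer(
    model=model,          # ref_model defaults to a frozen copy of the SFT model
    args=config,
    train_dataset=pairs,
    tokenizer=tokenizer,  # renamed to processing_class in newer TRL releases
)
trainer.train()
```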

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.689 | 0.0689 | 400 | 0.6921 | 0.0010 | -0.0011 | 0.5616 | 0.0021 | -62.8638 | -58.9160 | -2.9633 | -2.9669 |
| 0.6822 | 0.1378 | 800 | 0.6861 | -0.0503 | -0.0663 | 0.5746 | 0.0160 | -69.3792 | -64.0464 | -2.9255 | -2.9291 |
| 0.6737 | 0.2068 | 1200 | 0.6780 | -0.2790 | -0.3169 | 0.5762 | 0.0379 | -94.4367 | -86.9165 | -2.8527 | -2.8562 |
| 0.6648 | 0.2757 | 1600 | 0.6677 | -0.4500 | -0.5183 | 0.6029 | 0.0683 | -114.5829 | -104.0142 | -2.7578 | -2.7612 |
| 0.6678 | 0.3446 | 2000 | 0.6576 | -0.7094 | -0.8175 | 0.6217 | 0.1081 | -144.4979 | -129.9582 | -2.6611 | -2.6651 |
| 0.6253 | 0.4135 | 2400 | 0.6468 | -1.0987 | -1.2558 | 0.6236 | 0.1571 | -188.3249 | -168.8844 | -2.4966 | -2.5038 |
| 0.6616 | 0.4824 | 2800 | 0.6473 | -0.7839 | -0.9244 | 0.6303 | 0.1405 | -155.1877 | -137.4051 | -2.4668 | -2.4737 |
| 0.6282 | 0.5513 | 3200 | 0.6395 | -1.3763 | -1.5943 | 0.6331 | 0.2181 | -222.1840 | -196.6437 | -2.2441 | -2.2573 |
| 0.5886 | 0.6203 | 3600 | 0.6382 | -1.2763 | -1.4872 | 0.6355 | 0.2109 | -211.4734 | -186.6474 | -2.1487 | -2.1634 |
| 0.5903 | 0.6892 | 4000 | 0.6398 | -1.0104 | -1.2131 | 0.6366 | 0.2027 | -184.0546 | -160.0534 | -2.1888 | -2.2035 |
| 0.5886 | 0.7581 | 4400 | 0.6349 | -1.2844 | -1.5732 | 0.6341 | 0.2888 | -220.0676 | -187.4508 | -2.0898 | -2.1111 |
| 0.5907 | 0.8270 | 4800 | 0.6306 | -1.3443 | -1.6135 | 0.6478 | 0.2692 | -224.0959 | -193.4449 | -2.0942 | -2.1137 |
| 0.5456 | 0.8959 | 5200 | 0.6327 | -1.1753 | -1.4199 | 0.6408 | 0.2446 | -204.7423 | -176.5441 | -2.1214 | -2.1394 |
| 0.5465 | 0.9649 | 5600 | 0.6325 | -1.2769 | -1.5500 | 0.6371 | 0.2731 | -217.7467 | -186.7071 | -2.0669 | -2.0872 |
| 0.4632 | 1.0338 | 6000 | 0.6484 | -2.1822 | -2.6404 | 0.6496 | 0.4582 | -326.7876 | -277.2339 | -1.8836 | -1.9125 |
| 0.4736 | 1.1027 | 6400 | 0.6454 | -2.1568 | -2.5961 | 0.6547 | 0.4393 | -322.3579 | -274.6943 | -1.8531 | -1.8794 |
| 0.4665 | 1.1716 | 6800 | 0.6386 | -1.8958 | -2.2728 | 0.6443 | 0.3770 | -290.0295 | -248.5992 | -1.8821 | -1.9042 |
| 0.4789 | 1.2405 | 7200 | 0.6483 | -1.9198 | -2.2931 | 0.6403 | 0.3733 | -292.0611 | -250.9941 | -1.9443 | -1.9659 |
| 0.5477 | 1.3094 | 7600 | 0.6413 | -1.7843 | -2.1677 | 0.6499 | 0.3834 | -279.5165 | -237.4425 | -1.9622 | -1.9845 |
| 0.4423 | 1.3784 | 8000 | 0.6528 | -2.0003 | -2.3620 | 0.6415 | 0.3617 | -298.9479 | -259.0417 | -1.9266 | -1.9469 |
| 0.4668 | 1.4473 | 8400 | 0.6515 | -1.8405 | -2.1818 | 0.6403 | 0.3413 | -280.9325 | -243.0684 | -1.9825 | -2.0027 |
| 0.509 | 1.5162 | 8800 | 0.6471 | -1.9547 | -2.3166 | 0.6424 | 0.3619 | -294.4091 | -254.4828 | -2.0224 | -2.0422 |
| 0.4177 | 1.5851 | 9200 | 0.6542 | -1.9336 | -2.3034 | 0.6392 | 0.3699 | -293.0923 | -252.3707 | -1.9854 | -2.0064 |
| 0.4181 | 1.6540 | 9600 | 0.6626 | -2.3352 | -2.8057 | 0.6438 | 0.4706 | -343.3230 | -292.5314 | -1.9265 | -1.9501 |
| 0.4469 | 1.7229 | 10000 | 0.6436 | -1.8037 | -2.1726 | 0.6431 | 0.3689 | -280.0089 | -239.3807 | -2.0388 | -2.0591 |
| 0.4365 | 1.7919 | 10400 | 0.6446 | -1.7691 | -2.1263 | 0.6466 | 0.3572 | -275.3837 | -235.9303 | -2.0443 | -2.0637 |
| 0.4488 | 1.8608 | 10800 | 0.6558 | -2.1203 | -2.5393 | 0.6450 | 0.4190 | -316.6843 | -271.0489 | -2.0317 | -2.0535 |
| 0.4611 | 1.9297 | 11200 | 0.6646 | -2.4708 | -2.9416 | 0.6468 | 0.4708 | -356.9083 | -306.0948 | -1.9987 | -2.0224 |
| 0.4546 | 1.9986 | 11600 | 0.6541 | -2.2751 | -2.7321 | 0.6436 | 0.4570 | -335.9583 | -286.5284 | -1.9967 | -2.0195 |
| 0.3836 | 2.0675 | 12000 | 0.6827 | -2.7558 | -3.3214 | 0.6464 | 0.5655 | -394.8881 | -334.6001 | -1.9585 | -1.9844 |
| 0.337 | 2.1365 | 12400 | 0.7083 | -3.2136 | -3.8269 | 0.6424 | 0.6132 | -445.4347 | -380.3789 | -1.9217 | -1.9480 |
| 0.3756 | 2.2054 | 12800 | 0.6892 | -2.5637 | -3.0760 | 0.6378 | 0.5123 | -370.3519 | -315.3893 | -1.9938 | -2.0171 |
| 0.4071 | 2.2743 | 13200 | 0.6989 | -2.7240 | -3.2763 | 0.6345 | 0.5523 | -390.3795 | -331.4143 | -1.9810 | -2.0059 |
| 0.4236 | 2.3432 | 13600 | 0.7127 | -2.9174 | -3.4982 | 0.6329 | 0.5808 | -412.5668 | -350.7576 | -1.9542 | -1.9798 |
| 0.3527 | 2.4121 | 14000 | 0.7006 | -2.6980 | -3.2475 | 0.6252 | 0.5496 | -387.5038 | -328.8109 | -1.9852 | -2.0098 |
| 0.3258 | 2.4810 | 14400 | 0.7095 | -2.9212 | -3.5009 | 0.6292 | 0.5798 | -412.8438 | -351.1316 | -1.9581 | -1.9835 |
| 0.3646 | 2.5500 | 14800 | 0.7041 | -2.7281 | -3.2711 | 0.6350 | 0.5430 | -389.8630 | -331.8257 | -1.9884 | -2.0127 |
| 0.3596 | 2.6189 | 15200 | 0.7046 | -2.7894 | -3.3372 | 0.6359 | 0.5478 | -396.4674 | -337.9509 | -1.9862 | -2.0104 |
| 0.3549 | 2.6878 | 15600 | 0.7067 | -2.8436 | -3.3930 | 0.6310 | 0.5494 | -402.0518 | -343.3737 | -1.9841 | -2.0084 |
| 0.2868 | 2.7567 | 16000 | 0.7117 | -2.9064 | -3.4673 | 0.6289 | 0.5609 | -409.4747 | -349.6523 | -1.9770 | -2.0016 |
| 0.3243 | 2.8256 | 16400 | 0.7086 | -2.8350 | -3.3883 | 0.6320 | 0.5533 | -401.5786 | -342.5143 | -1.9841 | -2.0085 |
| 0.3963 | 2.8946 | 16800 | 0.7104 | -2.8648 | -3.4205 | 0.6301 | 0.5558 | -404.8014 | -345.4919 | -1.9835 | -2.0081 |
| 0.3399 | 2.9635 | 17200 | 0.7095 | -2.8594 | -3.4153 | 0.6336 | 0.5559 | -404.2798 | -344.9560 | -1.9830 | -2.0075 |

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1