tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set (the reward metrics are defined in the sketch after this list):

  • Loss: 0.6307
  • Rewards/chosen: -1.4504
  • Rewards/rejected: -1.8097
  • Rewards/accuracies: 0.6434
  • Rewards/margins: 0.3593
  • Logps/rejected: -244.1550
  • Logps/chosen: -203.7530
  • Logits/rejected: -1.7026
  • Logits/chosen: -1.7263
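
For reference, the reward columns follow the implicit-reward convention used by DPO trainers such as TRL's DPOTrainer (an assumption; the card does not name the trainer or its beta value): each reward is beta times the gap between the policy and reference log-probabilities of a completion. A minimal sketch of how the logged quantities relate:

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # beta=0.1 is an assumption; the card does not state the DPO beta.
    # Implicit DPO rewards: beta * (policy log-prob - reference log-prob).
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    # "Rewards/margins": chosen reward minus rejected reward.
    margins = rewards_chosen - rewards_rejected
    # "Rewards/accuracies": fraction of pairs where the chosen summary wins.
    accuracies = (rewards_chosen > rewards_rejected).float().mean()
    # Standard DPO loss: -log(sigmoid(margin)), averaged over the batch.
    loss = -F.logsigmoid(margins).mean()
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracies
```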

Model description

More information needed

Intended uses & limitations

More information needed
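
Until the authors add details here, the checkpoint should load like any causal LM on the Hub. A minimal generation sketch (the TL;DR-style prompt format is an assumption based on the summarization dataset, not something the card documents):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical prompt format; adjust to whatever the SFT stage actually used.
prompt = "Summarize the following post:\n\n<post text>\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```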

Training and evaluation data

More information needed
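
The card names the dataset but not the configuration or splits used. A minimal loading sketch, assuming the "comparisons" configuration (the one containing human preference pairs):

```python
from datasets import load_dataset

# "comparisons" holds pairs of candidate summaries with a human preference label;
# which config/split was actually used for training is an assumption.
ds = load_dataset("openai/summarize_from_feedback", "comparisons")
example = ds["train"][0]
print(example["summaries"])  # candidate summaries for one post
print(example["choice"])     # index of the human-preferred summary
```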

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 2e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
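
These settings map directly onto transformers' TrainingArguments; a minimal sketch of the equivalent configuration (output_dir is illustrative, and precision/world-size settings are omitted since the card only says "multi-GPU"):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old",  # illustrative
    learning_rate=2e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 per device x 2 steps = 16 effective
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # The default optimizer matches the settings listed above:
    # Adam-style updates with betas=(0.9, 0.999) and eps=1e-8.
)
```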

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.0689 | 400 | 0.6932 | 0.0002 | 0.0003 | 0.4654 | -0.0001 | -63.1542 | -58.6924 | -3.1574 | -3.1630 |
| 0.692 | 0.1378 | 800 | 0.6928 | 0.0015 | 0.0008 | 0.5525 | 0.0007 | -63.0955 | -58.5586 | -3.1518 | -3.1574 |
| 0.6902 | 0.2068 | 1200 | 0.6914 | 0.0009 | -0.0027 | 0.5876 | 0.0037 | -63.4527 | -58.6187 | -3.1281 | -3.1338 |
| 0.6835 | 0.2757 | 1600 | 0.6888 | -0.0225 | -0.0320 | 0.5864 | 0.0096 | -66.3833 | -60.9598 | -3.0838 | -3.0895 |
| 0.6778 | 0.3446 | 2000 | 0.6845 | -0.0724 | -0.0918 | 0.5976 | 0.0194 | -72.3574 | -65.9486 | -3.0213 | -3.0270 |
| 0.6688 | 0.4135 | 2400 | 0.6792 | -0.1403 | -0.1725 | 0.6032 | 0.0323 | -80.4345 | -72.7375 | -2.9370 | -2.9428 |
| 0.6675 | 0.4824 | 2800 | 0.6732 | -0.2283 | -0.2756 | 0.6057 | 0.0472 | -90.7353 | -81.5436 | -2.8576 | -2.8635 |
| 0.6437 | 0.5513 | 3200 | 0.6646 | -0.3557 | -0.4265 | 0.6120 | 0.0708 | -105.8322 | -94.2796 | -2.7546 | -2.7607 |
| 0.6516 | 0.6203 | 3600 | 0.6602 | -0.4125 | -0.4982 | 0.6178 | 0.0856 | -112.9954 | -99.9643 | -2.6547 | -2.6612 |
| 0.6264 | 0.6892 | 4000 | 0.6514 | -0.5858 | -0.7050 | 0.6315 | 0.1192 | -133.6785 | -117.2944 | -2.5252 | -2.5324 |
| 0.6109 | 0.7581 | 4400 | 0.6474 | -0.6217 | -0.7587 | 0.6313 | 0.1370 | -139.0484 | -120.8850 | -2.4041 | -2.4124 |
| 0.6153 | 0.8270 | 4800 | 0.6432 | -0.7112 | -0.8720 | 0.6266 | 0.1608 | -150.3814 | -129.8305 | -2.3206 | -2.3302 |
| 0.6107 | 0.8959 | 5200 | 0.6407 | -0.7470 | -0.9249 | 0.6350 | 0.1779 | -155.6741 | -133.4166 | -2.2363 | -2.2476 |
| 0.6061 | 0.9649 | 5600 | 0.6392 | -0.7851 | -0.9723 | 0.6315 | 0.1871 | -160.4070 | -137.2255 | -2.1733 | -2.1859 |
| 0.5701 | 1.0338 | 6000 | 0.6356 | -1.0035 | -1.2450 | 0.6292 | 0.2415 | -187.6758 | -159.0581 | -2.0122 | -2.0292 |
| 0.5557 | 1.1027 | 6400 | 0.6358 | -1.0296 | -1.2785 | 0.6322 | 0.2489 | -191.0262 | -161.6682 | -1.9777 | -1.9953 |
| 0.5292 | 1.1716 | 6800 | 0.6333 | -1.0878 | -1.3492 | 0.6313 | 0.2614 | -198.1001 | -167.4900 | -1.8969 | -1.9159 |
| 0.5473 | 1.2405 | 7200 | 0.6354 | -1.0479 | -1.2958 | 0.6262 | 0.2479 | -192.7597 | -163.5001 | -1.9044 | -1.9226 |
| 0.6231 | 1.3094 | 7600 | 0.6346 | -1.2184 | -1.4979 | 0.6289 | 0.2795 | -212.9705 | -180.5535 | -1.8355 | -1.8558 |
| 0.5403 | 1.3784 | 8000 | 0.6339 | -1.1437 | -1.4111 | 0.6264 | 0.2673 | -204.2867 | -173.0842 | -1.8647 | -1.8848 |
| 0.5444 | 1.4473 | 8400 | 0.6339 | -1.0726 | -1.3310 | 0.6287 | 0.2584 | -196.2827 | -165.9765 | -1.8568 | -1.8768 |
| 0.5766 | 1.5162 | 8800 | 0.6329 | -1.0364 | -1.2879 | 0.6336 | 0.2516 | -191.9749 | -162.3483 | -1.8819 | -1.9009 |
| 0.525 | 1.5851 | 9200 | 0.6320 | -1.1870 | -1.4611 | 0.6366 | 0.2740 | -209.2869 | -177.4161 | -1.8122 | -1.8325 |
| 0.5174 | 1.6540 | 9600 | 0.6310 | -1.2662 | -1.5606 | 0.6375 | 0.2944 | -219.2438 | -185.3348 | -1.7597 | -1.7810 |
| 0.5312 | 1.7229 | 10000 | 0.6313 | -1.2979 | -1.6013 | 0.6359 | 0.3033 | -223.3081 | -188.5056 | -1.7629 | -1.7848 |
| 0.4923 | 1.7919 | 10400 | 0.6312 | -1.1596 | -1.4412 | 0.6334 | 0.2815 | -207.2955 | -174.6746 | -1.7754 | -1.7966 |
| 0.5386 | 1.8608 | 10800 | 0.6304 | -1.2706 | -1.5735 | 0.6373 | 0.3029 | -220.5279 | -185.7685 | -1.7500 | -1.7722 |
| 0.5178 | 1.9297 | 11200 | 0.6295 | -1.2859 | -1.6008 | 0.6443 | 0.3149 | -223.2599 | -187.3036 | -1.7272 | -1.7501 |
| 0.5556 | 1.9986 | 11600 | 0.6295 | -1.2652 | -1.5714 | 0.6362 | 0.3062 | -220.3214 | -185.2294 | -1.7356 | -1.7580 |
| 0.4901 | 2.0675 | 12000 | 0.6303 | -1.4749 | -1.8246 | 0.6447 | 0.3497 | -245.6420 | -206.2009 | -1.6688 | -1.6928 |
| 0.4713 | 2.1365 | 12400 | 0.6303 | -1.6230 | -2.0017 | 0.6471 | 0.3786 | -263.3478 | -221.0147 | -1.6397 | -1.6644 |
| 0.5188 | 2.2054 | 12800 | 0.6305 | -1.4593 | -1.8052 | 0.6408 | 0.3458 | -243.6979 | -204.6454 | -1.6776 | -1.7011 |
| 0.5395 | 2.2743 | 13200 | 0.6315 | -1.5373 | -1.9051 | 0.6429 | 0.3678 | -253.6892 | -212.4377 | -1.6591 | -1.6834 |
| 0.5059 | 2.3432 | 13600 | 0.6318 | -1.4799 | -1.8381 | 0.6431 | 0.3582 | -246.9884 | -206.6992 | -1.6812 | -1.7051 |
| 0.4543 | 2.4121 | 14000 | 0.6318 | -1.3717 | -1.7109 | 0.6459 | 0.3392 | -234.2693 | -195.8793 | -1.7134 | -1.7366 |
| 0.5121 | 2.4810 | 14400 | 0.6308 | -1.4206 | -1.7736 | 0.6447 | 0.3530 | -240.5389 | -200.7700 | -1.7016 | -1.7252 |
| 0.4847 | 2.5500 | 14800 | 0.6304 | -1.4817 | -1.8498 | 0.6443 | 0.3681 | -248.1589 | -206.8796 | -1.6912 | -1.7153 |
| 0.4701 | 2.6189 | 15200 | 0.6306 | -1.4145 | -1.7659 | 0.6445 | 0.3514 | -239.7732 | -200.1665 | -1.7090 | -1.7324 |
| 0.5011 | 2.6878 | 15600 | 0.6304 | -1.4080 | -1.7575 | 0.6434 | 0.3495 | -238.9349 | -199.5119 | -1.7135 | -1.7369 |
| 0.4936 | 2.7567 | 16000 | 0.6304 | -1.4490 | -1.8088 | 0.6436 | 0.3598 | -244.0595 | -203.6143 | -1.7010 | -1.7248 |
| 0.4952 | 2.8256 | 16400 | 0.6312 | -1.4483 | -1.8060 | 0.6438 | 0.3577 | -243.7794 | -203.5389 | -1.7043 | -1.7279 |
| 0.5024 | 2.8946 | 16800 | 0.6304 | -1.4492 | -1.8094 | 0.6429 | 0.3602 | -244.1201 | -203.6308 | -1.7037 | -1.7274 |
| 0.5054 | 2.9635 | 17200 | 0.6303 | -1.4484 | -1.8080 | 0.6436 | 0.3596 | -243.9776 | -203.5508 | -1.7024 | -1.7262 |

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1