---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- openai/summarize_from_feedback
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old
  results: []
---

# tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old) on the openai/summarize_from_feedback dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6307
- Rewards/chosen: -1.4504
- Rewards/rejected: -1.8097
- Rewards/accuracies: 0.6434
- Rewards/margins: 0.3593
- Logps/rejected: -244.1550
- Logps/chosen: -203.7530
- Logits/rejected: -1.7026
- Logits/chosen: -1.7263

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931 | 0.0689 | 400 | 0.6932 | 0.0002 | 0.0003 | 0.4654 | -0.0001 | -63.1542 | -58.6924 | -3.1574 | -3.1630 |
| 0.692 | 0.1378 | 800 | 0.6928 | 0.0015 | 0.0008 | 0.5525 | 0.0007 | -63.0955 | -58.5586 | -3.1518 | -3.1574 |
| 0.6902 | 0.2068 | 1200 | 0.6914 | 0.0009 | -0.0027 | 0.5876 | 0.0037 | -63.4527 | -58.6187 | -3.1281 | -3.1338 |
| 0.6835 | 0.2757 | 1600 | 0.6888 | -0.0225 | -0.0320 | 0.5864 | 0.0096 | -66.3833 | -60.9598 | -3.0838 | -3.0895 |
| 0.6778 | 0.3446 | 2000 | 0.6845 | -0.0724 | -0.0918 | 0.5976 | 0.0194 | -72.3574 | -65.9486 | -3.0213 | -3.0270 |
| 0.6688 | 0.4135 | 2400 | 0.6792 | -0.1403 | -0.1725 | 0.6032 | 0.0323 | -80.4345 | -72.7375 | -2.9370 | -2.9428 |
| 0.6675 | 0.4824 | 2800 | 0.6732 | -0.2283 | -0.2756 | 0.6057 | 0.0472 | -90.7353 | -81.5436 | -2.8576 | -2.8635 |
| 0.6437 | 0.5513 | 3200 | 0.6646 | -0.3557 | -0.4265 | 0.6120 | 0.0708 | -105.8322 | -94.2796 | -2.7546 | -2.7607 |
| 0.6516 | 0.6203 | 3600 | 0.6602 | -0.4125 | -0.4982 | 0.6178 | 0.0856 | -112.9954 | -99.9643 | -2.6547 | -2.6612 |
| 0.6264 | 0.6892 | 4000 | 0.6514 | -0.5858 | -0.7050 | 0.6315 | 0.1192 | -133.6785 | -117.2944 | -2.5252 | -2.5324 |
| 0.6109 | 0.7581 | 4400 | 0.6474 | -0.6217 | -0.7587 | 0.6313 | 0.1370 | -139.0484 | -120.8850 | -2.4041 | -2.4124 |
| 0.6153 | 0.8270 | 4800 | 0.6432 | -0.7112 | -0.8720 | 0.6266 | 0.1608 | -150.3814 | -129.8305 | -2.3206 | -2.3302 |
| 0.6107 | 0.8959 | 5200 | 0.6407 | -0.7470 | -0.9249 | 0.6350 | 0.1779 | -155.6741 | -133.4166 | -2.2363 | -2.2476 |
| 0.6061 | 0.9649 | 5600 | 0.6392 | -0.7851 | -0.9723 | 0.6315 | 0.1871 | -160.4070 | -137.2255 | -2.1733 | -2.1859 |
| 0.5701 | 1.0338 | 6000 | 0.6356 | -1.0035 | -1.2450 | 0.6292 | 0.2415 | -187.6758 | -159.0581 | -2.0122 | -2.0292 |
| 0.5557 | 1.1027 | 6400 | 0.6358 | -1.0296 | -1.2785 | 0.6322 | 0.2489 | -191.0262 | -161.6682 | -1.9777 | -1.9953 |
| 0.5292 | 1.1716 | 6800 | 0.6333 | -1.0878 | -1.3492 | 0.6313 | 0.2614 | -198.1001 | -167.4900 | -1.8969 | -1.9159 |
| 0.5473 | 1.2405 | 7200 | 0.6354 | -1.0479 | -1.2958 | 0.6262 | 0.2479 | -192.7597 | -163.5001 | -1.9044 | -1.9226 |
| 0.6231 | 1.3094 | 7600 | 0.6346 | -1.2184 | -1.4979 | 0.6289 | 0.2795 | -212.9705 | -180.5535 | -1.8355 | -1.8558 |
| 0.5403 | 1.3784 | 8000 | 0.6339 | -1.1437 | -1.4111 | 0.6264 | 0.2673 | -204.2867 | -173.0842 | -1.8647 | -1.8848 |
| 0.5444 | 1.4473 | 8400 | 0.6339 | -1.0726 | -1.3310 | 0.6287 | 0.2584 | -196.2827 | -165.9765 | -1.8568 | -1.8768 |
| 0.5766 | 1.5162 | 8800 | 0.6329 | -1.0364 | -1.2879 | 0.6336 | 0.2516 | -191.9749 | -162.3483 | -1.8819 | -1.9009 |
| 0.525 | 1.5851 | 9200 | 0.6320 | -1.1870 | -1.4611 | 0.6366 | 0.2740 | -209.2869 | -177.4161 | -1.8122 | -1.8325 |
| 0.5174 | 1.6540 | 9600 | 0.6310 | -1.2662 | -1.5606 | 0.6375 | 0.2944 | -219.2438 | -185.3348 | -1.7597 | -1.7810 |
| 0.5312 | 1.7229 | 10000 | 0.6313 | -1.2979 | -1.6013 | 0.6359 | 0.3033 | -223.3081 | -188.5056 | -1.7629 | -1.7848 |
| 0.4923 | 1.7919 | 10400 | 0.6312 | -1.1596 | -1.4412 | 0.6334 | 0.2815 | -207.2955 | -174.6746 | -1.7754 | -1.7966 |
| 0.5386 | 1.8608 | 10800 | 0.6304 | -1.2706 | -1.5735 | 0.6373 | 0.3029 | -220.5279 | -185.7685 | -1.7500 | -1.7722 |
| 0.5178 | 1.9297 | 11200 | 0.6295 | -1.2859 | -1.6008 | 0.6443 | 0.3149 | -223.2599 | -187.3036 | -1.7272 | -1.7501 |
| 0.5556 | 1.9986 | 11600 | 0.6295 | -1.2652 | -1.5714 | 0.6362 | 0.3062 | -220.3214 | -185.2294 | -1.7356 | -1.7580 |
| 0.4901 | 2.0675 | 12000 | 0.6303 | -1.4749 | -1.8246 | 0.6447 | 0.3497 | -245.6420 | -206.2009 | -1.6688 | -1.6928 |
| 0.4713 | 2.1365 | 12400 | 0.6303 | -1.6230 | -2.0017 | 0.6471 | 0.3786 | -263.3478 | -221.0147 | -1.6397 | -1.6644 |
| 0.5188 | 2.2054 | 12800 | 0.6305 | -1.4593 | -1.8052 | 0.6408 | 0.3458 | -243.6979 | -204.6454 | -1.6776 | -1.7011 |
| 0.5395 | 2.2743 | 13200 | 0.6315 | -1.5373 | -1.9051 | 0.6429 | 0.3678 | -253.6892 | -212.4377 | -1.6591 | -1.6834 |
| 0.5059 | 2.3432 | 13600 | 0.6318 | -1.4799 | -1.8381 | 0.6431 | 0.3582 | -246.9884 | -206.6992 | -1.6812 | -1.7051 |
| 0.4543 | 2.4121 | 14000 | 0.6318 | -1.3717 | -1.7109 | 0.6459 | 0.3392 | -234.2693 | -195.8793 | -1.7134 | -1.7366 |
| 0.5121 | 2.4810 | 14400 | 0.6308 | -1.4206 | -1.7736 | 0.6447 | 0.3530 | -240.5389 | -200.7700 | -1.7016 | -1.7252 |
| 0.4847 | 2.5500 | 14800 | 0.6304 | -1.4817 | -1.8498 | 0.6443 | 0.3681 | -248.1589 | -206.8796 | -1.6912 | -1.7153 |
| 0.4701 | 2.6189 | 15200 | 0.6306 | -1.4145 | -1.7659 | 0.6445 | 0.3514 | -239.7732 | -200.1665 | -1.7090 | -1.7324 |
| 0.5011 | 2.6878 | 15600 | 0.6304 | -1.4080 | -1.7575 | 0.6434 | 0.3495 | -238.9349 | -199.5119 | -1.7135 | -1.7369 |
| 0.4936 | 2.7567 | 16000 | 0.6304 | -1.4490 | -1.8088 | 0.6436 | 0.3598 | -244.0595 | -203.6143 | -1.7010 | -1.7248 |
| 0.4952 | 2.8256 | 16400 | 0.6312 | -1.4483 | -1.8060 | 0.6438 | 0.3577 | -243.7794 | -203.5389 | -1.7043 | -1.7279 |
| 0.5024 | 2.8946 | 16800 | 0.6304 | -1.4492 | -1.8094 | 0.6429 | 0.3602 | -244.1201 | -203.6308 | -1.7037 | -1.7274 |
| 0.5054 | 2.9635 | 17200 | 0.6303 | -1.4484 | -1.8080 | 0.6436 | 0.3596 | -243.9776 | -203.5508 | -1.7024 | -1.7262 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1
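As a sanity check on the bookkeeping in the metrics above: the Rewards/margins column is simply Rewards/chosen minus Rewards/rejected, and DPO's per-pair loss is the negative log-sigmoid of that (already beta-scaled) margin. A minimal check against the final evaluation row, using only the standard library:

```python
import math

# Final evaluation metrics reported in this card.
rewards_chosen = -1.4504
rewards_rejected = -1.8097
rewards_margin = 0.3593

# The margin column is chosen minus rejected rewards.
assert math.isclose(rewards_chosen - rewards_rejected, rewards_margin, abs_tol=1e-4)

def dpo_loss(margin: float) -> float:
    """Per-pair DPO loss: -log(sigmoid(margin)), with beta folded into the rewards."""
    return math.log1p(math.exp(-margin))

# Loss implied by the *average* margin. Note this is smaller than the reported
# eval loss (0.6307) because the eval loss averages -log(sigmoid(.)) over all
# pairs, and that convex function penalizes the spread of per-pair margins.
print(round(dpo_loss(rewards_margin), 4))
```

This also explains why the loss keeps improving only slowly late in training even as margins grow: the log-sigmoid saturates for large margins, so widening already-positive margins buys little.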
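The hyperparameters listed above map onto a `trl` DPO training setup. The following is a hypothetical reproduction sketch, not the training script used for this model: the exact `DPOTrainer` API varies across `trl` versions, the dataset preprocessing is elided, and `beta=0.1` is an assumption (a common alignment-handbook default) that this card does not report.

```python
# Hypothetical sketch of the training configuration described in this card.
# Assumptions: trl's DPOConfig/DPOTrainer API, and beta=0.1 (not stated above).
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

config = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old",
    beta=0.1,                        # assumed; not reported in this card
    learning_rate=2e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size: 8 * 2 = 16
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# train_dataset would be a preference dataset with "prompt"/"chosen"/"rejected"
# columns derived from openai/summarize_from_feedback (preparation elided here).
# trainer = DPOTrainer(model=model, ref_model=ref_model, args=config,
#                      train_dataset=train_dataset, tokenizer=tokenizer)
# trainer.train()
```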