---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: tinyllama-1.1b-chat-dpo-qlora
  results: []
---

# tinyllama-1.1b-chat-dpo-qlora

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-chat-sft-qlora](https://huggingface.co/martimfasantos/tinyllama-1.1b-chat-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6084
- Rewards/chosen: -1.0875
- Rewards/rejected: -1.3916
- Rewards/accuracies: 0.6580
- Rewards/margins: 0.3041
- Logps/rejected: -490.8393
- Logps/chosen: -504.9714
- Logits/rejected: -2.6096
- Logits/chosen: -2.6425

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16 (train_batch_size 4 × gradient_accumulation_steps 4)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
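These values map directly onto a TRL `DPOTrainer` run. The sketch below is a rough reconstruction, not the actual training script: the `TrainingArguments` values come from the list above, while `beta`, the LoRA configuration, the sequence lengths, the dtype, and the dataset preprocessing are assumptions modeled on typical alignment-handbook QLoRA recipes.

```python
# Hypothetical reconstruction of the DPO run described by the hyperparameters
# above, written against a TRL version contemporary with the pinned
# Transformers 4.39.3 / PEFT 0.7.1. Values marked "assumed" are not stated
# in this card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

sft_model_id = "martimfasantos/tinyllama-1.1b-chat-sft-qlora"
tokenizer = AutoTokenizer.from_pretrained(sft_model_id)
model = AutoModelForCausalLM.from_pretrained(sft_model_id, torch_dtype=torch.bfloat16)

# ultrafeedback_binarized stores chosen/rejected as message lists, while
# DPOTrainer expects plain strings. Flattening to the final assistant turn is
# a simplification; the real recipe formats turns with the chat template.
def to_dpo_format(example):
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train_ds = raw["train_prefs"].map(to_dpo_format)
eval_ds = raw["test_prefs"].map(to_dpo_format)

args = TrainingArguments(
    output_dir="tinyllama-1.1b-chat-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 4 x 4 = effective batch size of 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                      # assumed
    logging_steps=100,
)

peft_config = LoraConfig(           # assumed: typical QLoRA adapter settings
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT adapter, the frozen base acts as reference
    args=args,
    beta=0.1,              # assumed: the common DPO default
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,       # assumed
    max_prompt_length=512, # assumed
)
trainer.train()
```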
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6921 | 0.03 | 100 | 0.6923 | 0.0160 | 0.0142 | 0.5645 | 0.0018 | -350.2683 | -394.6286 | -2.7841 | -2.8363 |
| 0.6894 | 0.05 | 200 | 0.6894 | 0.0433 | 0.0353 | 0.5920 | 0.0080 | -348.1495 | -391.8949 | -2.7811 | -2.8333 |
| 0.6815 | 0.08 | 300 | 0.6844 | 0.0806 | 0.0609 | 0.6025 | 0.0196 | -345.5898 | -388.1692 | -2.7838 | -2.8349 |
| 0.6869 | 0.1 | 400 | 0.6788 | 0.0607 | 0.0269 | 0.6125 | 0.0339 | -348.9979 | -390.1522 | -2.7931 | -2.8423 |
| 0.6744 | 0.13 | 500 | 0.6724 | 0.0243 | -0.0249 | 0.6210 | 0.0492 | -354.1764 | -393.7983 | -2.7889 | -2.8371 |
| 0.6679 | 0.16 | 600 | 0.6625 | -0.0566 | -0.1346 | 0.6265 | 0.0780 | -365.1402 | -401.8826 | -2.7709 | -2.8179 |
| 0.637 | 0.18 | 700 | 0.6555 | -0.2568 | -0.3654 | 0.6290 | 0.1086 | -388.2211 | -421.9038 | -2.7596 | -2.8051 |
| 0.6166 | 0.21 | 800 | 0.6488 | -0.3935 | -0.5223 | 0.6320 | 0.1288 | -403.9116 | -435.5756 | -2.7523 | -2.7961 |
| 0.6335 | 0.24 | 900 | 0.6458 | -0.4516 | -0.6042 | 0.6380 | 0.1527 | -412.1083 | -441.3798 | -2.7325 | -2.7764 |
| 0.6286 | 0.26 | 1000 | 0.6406 | -0.8692 | -1.0442 | 0.6250 | 0.1750 | -456.1026 | -483.1429 | -2.7123 | -2.7531 |
| 0.669 | 0.29 | 1100 | 0.6406 | -0.3445 | -0.4984 | 0.6365 | 0.1538 | -401.5222 | -430.6789 | -2.6946 | -2.7354 |
| 0.6723 | 0.31 | 1200 | 0.6358 | -0.4619 | -0.6430 | 0.6425 | 0.1811 | -415.9841 | -442.4163 | -2.6701 | -2.7077 |
| 0.605 | 0.34 | 1300 | 0.6297 | -0.6894 | -0.8903 | 0.6435 | 0.2009 | -440.7144 | -465.1627 | -2.6764 | -2.7122 |
| 0.6361 | 0.37 | 1400 | 0.6267 | -0.7144 | -0.9307 | 0.6505 | 0.2163 | -444.7496 | -467.6648 | -2.6711 | -2.7091 |
| 0.6085 | 0.39 | 1500 | 0.6213 | -1.0532 | -1.3084 | 0.6490 | 0.2552 | -482.5256 | -501.5469 | -2.6435 | -2.6797 |
| 0.6317 | 0.42 | 1600 | 0.6197 | -1.1246 | -1.3825 | 0.6490 | 0.2579 | -489.9323 | -508.6858 | -2.6172 | -2.6506 |
| 0.6702 | 0.44 | 1700 | 0.6182 | -1.0036 | -1.2644 | 0.6530 | 0.2609 | -478.1268 | -496.5815 | -2.6407 | -2.6762 |
| 0.5658 | 0.47 | 1800 | 0.6219 | -1.3479 | -1.6348 | 0.6445 | 0.2869 | -515.1606 | -531.0145 | -2.5866 | -2.6182 |
| 0.6039 | 0.5 | 1900 | 0.6154 | -0.9014 | -1.1716 | 0.6630 | 0.2702 | -468.8458 | -486.3656 | -2.6376 | -2.6742 |
| 0.6173 | 0.52 | 2000 | 0.6121 | -1.1535 | -1.4470 | 0.6575 | 0.2934 | -496.3810 | -511.5793 | -2.6232 | -2.6580 |
| 0.62 | 0.55 | 2100 | 0.6116 | -1.1600 | -1.4523 | 0.6650 | 0.2923 | -496.9117 | -512.2247 | -2.6278 | -2.6629 |
| 0.5957 | 0.58 | 2200 | 0.6132 | -0.9592 | -1.2431 | 0.6655 | 0.2839 | -475.9958 | -492.1489 | -2.6317 | -2.6674 |
| 0.6093 | 0.6 | 2300 | 0.6138 | -1.0935 | -1.3811 | 0.6625 | 0.2876 | -489.7906 | -505.5738 | -2.6283 | -2.6619 |
| 0.6009 | 0.63 | 2400 | 0.6108 | -1.0519 | -1.3479 | 0.6610 | 0.2959 | -486.4695 | -501.4175 | -2.6088 | -2.6432 |
| 0.5988 | 0.65 | 2500 | 0.6108 | -1.0427 | -1.3419 | 0.6590 | 0.2992 | -485.8730 | -500.4982 | -2.6143 | -2.6477 |
| 0.606 | 0.68 | 2600 | 0.6112 | -1.0188 | -1.3192 | 0.6545 | 0.3003 | -483.6013 | -498.1078 | -2.5974 | -2.6304 |
| 0.6118 | 0.71 | 2700 | 0.6106 | -1.0808 | -1.3857 | 0.6595 | 0.3049 | -490.2562 | -504.3045 | -2.5945 | -2.6274 |
| 0.6134 | 0.73 | 2800 | 0.6096 | -1.1549 | -1.4635 | 0.6585 | 0.3086 | -498.0366 | -511.7179 | -2.5978 | -2.6303 |
| 0.6159 | 0.76 | 2900 | 0.6097 | -1.0550 | -1.3509 | 0.6585 | 0.2959 | -486.7739 | -501.7256 | -2.6175 | -2.6500 |
| 0.5815 | 0.79 | 3000 | 0.6091 | -1.1025 | -1.4048 | 0.6570 | 0.3023 | -492.1650 | -506.4727 | -2.6089 | -2.6420 |
| 0.5885 | 0.81 | 3100 | 0.6089 | -1.0977 | -1.4006 | 0.6595 | 0.3029 | -491.7444 | -505.9960 | -2.6001 | -2.6337 |
| 0.6074 | 0.84 | 3200 | 0.6086 | -1.0982 | -1.4029 | 0.6605 | 0.3047 | -491.9724 | -506.0455 | -2.6056 | -2.6388 |
| 0.5981 | 0.86 | 3300 | 0.6087 | -1.0853 | -1.3881 | 0.6610 | 0.3028 | -490.4915 | -504.7571 | -2.6117 | -2.6442 |
| 0.5944 | 0.89 | 3400 | 0.6087 | -1.0897 | -1.3931 | 0.6580 | 0.3034 | -490.9887 | -505.1947 | -2.6026 | -2.6360 |
| 0.5979 | 0.92 | 3500 | 0.6085 | -1.0922 | -1.3962 | 0.6595 | 0.3040 | -491.3070 | -505.4438 | -2.6136 | -2.6460 |
| 0.6154 | 0.94 | 3600 | 0.6086 | -1.0905 | -1.3946 | 0.6595 | 0.3040 | -491.1413 | -505.2781 | -2.6066 | -2.6397 |
| 0.6053 | 0.97 | 3700 | 0.6086 | -1.0907 | -1.3946 | 0.6550 | 0.3039 | -491.1405 | -505.2943 | -2.6094 | -2.6423 |
| 0.602 | 0.99 | 3800 | 0.6085 | -1.0876 | -1.3914 | 0.6580 | 0.3038 | -490.8211 | -504.9807 | -2.6096 | -2.6425 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.39.3
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.15.2
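## How to use

Because this repository holds a QLoRA (PEFT) adapter rather than merged weights, inference loads the adapter on top of its base model via `peft`. A minimal sketch, assuming this repo's id and that it ships a tokenizer with a chat template (otherwise load the tokenizer from the SFT checkpoint linked above):

```python
# Minimal inference sketch: load the DPO adapter on top of its base weights.
# The repo id, dtype, and generation settings here are illustrative assumptions.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "martimfasantos/tinyllama-1.1b-chat-dpo-qlora"
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```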