---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
  results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1205
- Rewards/chosen: 0.4005
- Rewards/rejected: -1.7841
- Rewards/accuracies: 1.0
- Rewards/margins: 2.1847
- Logps/rejected: -202.6509
- Logps/chosen: -156.6288
- Logits/rejected: -1.0515
- Logits/chosen: -0.8581

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
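For reference, the snippet below is a minimal sketch of how these hyperparameters could map onto a TRL `DPOTrainer` run with a PEFT adapter, assuming a TRL release contemporary with the Transformers 4.38 version listed under Framework versions (where `beta` is still passed directly to the trainer). The preference dataset, the DPO `beta`, and the LoRA settings are not recorded in this card, so the corresponding values below are placeholder assumptions, not the actual training setup.

```python
# Illustrative sketch only: the dataset, beta, and LoRA settings are
# assumptions; only the hyperparameters listed above come from this card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 has no pad token by default

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns;
# the actual training data is not documented in this card.
dataset = load_dataset("your-preference-dataset", split="train")

# Hyperparameters listed above; the default AdamW optimizer already uses
# betas=(0.9, 0.999) and epsilon=1e-08.
training_args = TrainingArguments(
    output_dir="llama_DPO_model_e2",
    learning_rate=7e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    lr_scheduler_type="linear",
    num_train_epochs=2,
    seed=42,
)

# The LoRA configuration is not recorded in this card; these values are assumptions.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a PEFT adapter, TRL uses the frozen base model as the reference
    beta=0.1,         # DPO beta is not recorded in this card; 0.1 is the TRL default
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```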
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6753        | 0.1   | 25   | 0.6561          | 0.0241         | -0.0529          | 0.8800             | 0.0770          | -185.3385      | -160.3932    | -1.0518         | -0.8547       |
| 0.596         | 0.2   | 50   | 0.5763          | 0.0663         | -0.1863          | 0.9933             | 0.2525          | -186.6722      | -159.9714    | -1.0527         | -0.8563       |
| 0.5265        | 0.3   | 75   | 0.4888          | 0.1230         | -0.3480          | 1.0                | 0.4710          | -188.2895      | -159.4043    | -1.0529         | -0.8557       |
| 0.4405        | 0.4   | 100  | 0.4115          | 0.1711         | -0.5248          | 1.0                | 0.6959          | -190.0574      | -158.9227    | -1.0521         | -0.8557       |
| 0.3832        | 0.5   | 125  | 0.3418          | 0.2187         | -0.7108          | 1.0                | 0.9295          | -191.9176      | -158.4473    | -1.0530         | -0.8571       |
| 0.3071        | 0.6   | 150  | 0.2809          | 0.2614         | -0.9143          | 1.0                | 1.1757          | -193.9524      | -158.0195    | -1.0526         | -0.8568       |
| 0.2635        | 0.7   | 175  | 0.2300          | 0.3051         | -1.1158          | 1.0                | 1.4209          | -195.9679      | -157.5830    | -1.0531         | -0.8575       |
| 0.2056        | 0.79  | 200  | 0.1912          | 0.3381         | -1.3041          | 1.0                | 1.6422          | -197.8509      | -157.2532    | -1.0529         | -0.8577       |
| 0.1735        | 0.89  | 225  | 0.1617          | 0.3637         | -1.4760          | 1.0                | 1.8397          | -199.5699      | -156.9968    | -1.0524         | -0.8580       |
| 0.1492        | 0.99  | 250  | 0.1416          | 0.3797         | -1.6179          | 1.0                | 1.9976          | -200.9889      | -156.8374    | -1.0521         | -0.8575       |
| 0.144         | 1.09  | 275  | 0.1304          | 0.3918         | -1.6997          | 1.0                | 2.0915          | -201.8062      | -156.7157    | -1.0517         | -0.8590       |
| 0.1203        | 1.19  | 300  | 0.1255          | 0.3955         | -1.7398          | 1.0                | 2.1353          | -202.2080      | -156.6790    | -1.0514         | -0.8580       |
| 0.117         | 1.29  | 325  | 0.1229          | 0.3961         | -1.7635          | 1.0                | 2.1596          | -202.4451      | -156.6730    | -1.0514         | -0.8572       |
| 0.1286        | 1.39  | 350  | 0.1209          | 0.4018         | -1.7766          | 1.0                | 2.1784          | -202.5752      | -156.6156    | -1.0517         | -0.8587       |
| 0.126         | 1.49  | 375  | 0.1199          | 0.4025         | -1.7866          | 1.0                | 2.1891          | -202.6759      | -156.6091    | -1.0517         | -0.8587       |
| 0.1154        | 1.59  | 400  | 0.1202          | 0.4013         | -1.7865          | 1.0                | 2.1877          | -202.6743      | -156.6213    | -1.0514         | -0.8580       |
| 0.1141        | 1.69  | 425  | 0.1200          | 0.3990         | -1.7907          | 1.0                | 2.1897          | -202.7168      | -156.6437    | -1.0518         | -0.8578       |
| 0.1284        | 1.79  | 450  | 0.1196          | 0.4012         | -1.7899          | 1.0                | 2.1910          | -202.7081      | -156.6221    | -1.0518         | -0.8582       |
| 0.1225        | 1.89  | 475  | 0.1205          | 0.3984         | -1.7858          | 1.0                | 2.1842          | -202.6674      | -156.6495    | -1.0517         | -0.8592       |
| 0.1224        | 1.99  | 500  | 0.1205          | 0.4005         | -1.7841          | 1.0                | 2.1847          | -202.6509      | -156.6288    | -1.0515         | -0.8581       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
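## How to use

Because this repository contains a PEFT (LoRA) adapter rather than full model weights, inference requires loading the adapter on top of the Llama 2 base model. The snippet below is a minimal sketch: the adapter repository id is a placeholder, and access to `meta-llama/Llama-2-7b-hf` requires accepting the Llama 2 license on the Hub.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "meta-llama/Llama-2-7b-hf"
adapter_id = "your-username/llama_DPO_model_e2"  # placeholder: replace with the actual adapter repo id or local path

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name, torch_dtype=torch.float16, device_map="auto"
)

# Attach the DPO-trained LoRA adapter to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Explain what direct preference optimization does in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```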