---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
  results: []
---

# llama_DPO_model_e2

This model is a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), fine-tuned with DPO on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1045
- Rewards/chosen: 0.4197
- Rewards/rejected: -1.9316
- Rewards/accuracies: 1.0
- Rewards/margins: 2.3513
- Logps/rejected: -204.1257
- Logps/chosen: -156.4368
- Logits/rejected: -1.0515
- Logits/chosen: -0.8584

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7.5e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6732        | 0.1   | 25   | 0.6518          | 0.0274         | -0.0584          | 0.8867             | 0.0858          | -185.3935      | -160.3602    | -1.0521         | -0.8541       |
| 0.588         | 0.2   | 50   | 0.5616          | 0.0780         | -0.2093          | 0.9933             | 0.2873          | -186.9026      | -159.8541    | -1.0523         | -0.8550       |
| 0.5077        | 0.3   | 75   | 0.4690          | 0.1360         | -0.3896          | 1.0                | 0.5256          | -188.7056      | -159.2737    | -1.0525         | -0.8564       |
| 0.4179        | 0.4   | 100  | 0.3872          | 0.1873         | -0.5861          | 1.0                | 0.7734          | -190.6710      | -158.7608    | -1.0532         | -0.8563       |
| 0.3614        | 0.5   | 125  | 0.3170          | 0.2381         | -0.7895          | 1.0                | 1.0276          | -192.7043      | -158.2528    | -1.0533         | -0.8568       |
| 0.2812        | 0.6   | 150  | 0.2544          | 0.2856         | -1.0121          | 1.0                | 1.2977          | -194.9309      | -157.7783    | -1.0527         | -0.8569       |
| 0.2378        | 0.7   | 175  | 0.2066          | 0.3262         | -1.2240          | 1.0                | 1.5502          | -197.0494      | -157.3717    | -1.0520         | -0.8573       |
| 0.1866        | 0.79  | 200  | 0.1704          | 0.3591         | -1.4222          | 1.0                | 1.7812          | -199.0312      | -157.0431    | -1.0526         | -0.8577       |
| 0.1555        | 0.89  | 225  | 0.1429          | 0.3829         | -1.6050          | 1.0                | 1.9879          | -200.8594      | -156.8051    | -1.0523         | -0.8580       |
| 0.1312        | 0.99  | 250  | 0.1239          | 0.4002         | -1.7534          | 1.0                | 2.1536          | -202.3439      | -156.6322    | -1.0515         | -0.8572       |
| 0.1276        | 1.09  | 275  | 0.1147          | 0.4086         | -1.8325          | 1.0                | 2.2410          | -203.1341      | -156.5480    | -1.0518         | -0.8578       |
| 0.1038        | 1.19  | 300  | 0.1094          | 0.4144         | -1.8779          | 1.0                | 2.2923          | -203.5883      | -156.4901    | -1.0511         | -0.8574       |
| 0.101         | 1.29  | 325  | 0.1072          | 0.4191         | -1.9023          | 1.0                | 2.3214          | -203.8326      | -156.4429    | -1.0512         | -0.8569       |
| 0.1128        | 1.39  | 350  | 0.1056          | 0.4189         | -1.9206          | 1.0                | 2.3394          | -204.0154      | -156.4454    | -1.0511         | -0.8576       |
| 0.11          | 1.49  | 375  | 0.1047          | 0.4220         | -1.9262          | 1.0                | 2.3482          | -204.0712      | -156.4135    | -1.0509         | -0.8570       |
| 0.1001        | 1.59  | 400  | 0.1048          | 0.4224         | -1.9281          | 1.0                | 2.3505          | -204.0909      | -156.4098    | -1.0514         | -0.8574       |
| 0.0978        | 1.69  | 425  | 0.1042          | 0.4246         | -1.9292          | 1.0                | 2.3538          | -204.1014      | -156.3875    | -1.0512         | -0.8573       |
| 0.1111        | 1.79  | 450  | 0.1041          | 0.4244         | -1.9292          | 1.0                | 2.3536          | -204.1017      | -156.3903    | -1.0514         | -0.8587       |
| 0.1064        | 1.89  | 475  | 0.1044          | 0.4199         | -1.9317          | 1.0                | 2.3516          | -204.1266      | -156.4352    | -1.0514         | -0.8577       |
| 0.107         | 1.99  | 500  | 0.1045          | 0.4197         | -1.9316          | 1.0                | 2.3513          | -204.1257      | -156.4368    | -1.0515         | -0.8584       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
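
The snippet below is a minimal sketch, not part of the original card, of how a PEFT adapter like this one is typically loaded on top of the base model with the framework versions listed above. The adapter location (`path/to/llama_DPO_model_e2`) is a placeholder, gated access to the base model is assumed, and `device_map="auto"` requires `accelerate`.

```python
# Sketch: load the base model, attach the PEFT adapter, and generate.
# Assumptions: adapter weights at "path/to/llama_DPO_model_e2" (placeholder),
# Hub access to meta-llama/Llama-2-7b-hf, and `accelerate` installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "path/to/llama_DPO_model_e2"  # placeholder: local path or Hub repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```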
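
For reference, the sketch below reconstructs a TRL `DPOTrainer` setup that mirrors the hyperparameters listed above. The preference dataset, LoRA configuration, and DPO `beta` are illustrative assumptions, since they are not documented in this card; the Adam betas and epsilon above match the `TrainingArguments` defaults.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters above.
# The tiny in-memory dataset, LoRA settings, and output directory are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

base_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder preference data with the prompt/chosen/rejected columns DPOTrainer expects.
train_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": [" Direct Preference Optimization, a preference-tuning method."],
    "rejected": [" I don't know."],
})

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)  # assumed values

training_args = TrainingArguments(
    output_dir="llama_DPO_model_e2",
    learning_rate=7.5e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    lr_scheduler_type="linear",
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```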