---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
results: []
---
# llama_DPO_model_e2
This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with DPO on a preference dataset that is not recorded in this card.
It achieves the following results on the evaluation set:
- Loss: 0.1045
- Rewards/chosen: 0.4197
- Rewards/rejected: -1.9316
- Rewards/accuracies: 1.0
- Rewards/margins: 2.3513
- Logps/rejected: -204.1257
- Logps/chosen: -156.4368
- Logits/rejected: -1.0515
- Logits/chosen: -0.8584
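The reward metrics above follow the DPO convention: a sequence's implicit reward is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model, and `Rewards/margins` is simply chosen minus rejected. A minimal sketch of those relationships (β and the helper names are illustrative, not taken from this card):

```python
import math

def dpo_reward(policy_logp: float, ref_logp: float, beta: float = 0.1) -> float:
    """Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)

def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    """DPO objective for one preference pair: -log sigmoid(margin)."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Sanity check against the final evaluation row of this card:
# Rewards/margins should equal Rewards/chosen - Rewards/rejected.
margin = 0.4197 - (-1.9316)
print(round(margin, 4))  # 2.3513
```

Note that `-log sigmoid(2.3513) ≈ 0.091` is in the same ballpark as the reported eval loss of 0.1045; the two differ because the loss is averaged per example before the margins are averaged.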
## Model description
This repository contains a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), fine-tuned with Direct Preference Optimization (DPO) via the TRL library. Only the adapter weights are stored here; the base model must be loaded separately.
## Intended uses & limitations
The adapter inherits the Llama 2 license and the limitations of its base model. Because the preference dataset is not recorded, the behaviors the model was aligned toward cannot be verified from this card; evaluate on your own data before deployment.
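Since this repo stores only PEFT adapter weights, using the model means loading the base Llama-2 checkpoint and attaching the adapter on top. A hypothetical sketch, assuming the repo id `thorirhrafn/llama_DPO_model_e2` (inferred from the card title, not verified); `AutoPeftModelForCausalLM` is part of the PEFT API as of the version listed below:

```python
def load_dpo_adapter(adapter_id: str = "thorirhrafn/llama_DPO_model_e2"):
    """Load the base Llama-2 model with this DPO-trained PEFT adapter.

    Imports are deferred so defining the function stays lightweight;
    actually calling it downloads the ~13 GB base-model weights and
    requires access to the gated meta-llama repository.
    """
    from peft import AutoPeftModelForCausalLM  # resolves the base model from the adapter config
    from transformers import AutoTokenizer

    model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    return model, tokenizer
```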
## Training and evaluation data
The preference dataset used for training and evaluation is not recorded in this card.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.5e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
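The per-device batch size of 1 combined with 8 gradient-accumulation steps gives the effective batch size of 8 listed above, and the linear scheduler decays the learning rate from 7.5e-7 toward 0 over the run. A small sketch of that arithmetic (single GPU and no warmup are assumptions; the 500 total optimizer steps are read from the results table below):

```python
train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1  # assumed single-GPU run

# Effective batch size: one optimizer step accumulates 8 micro-batches.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 8

def linear_lr(step: int, total_steps: int, base_lr: float = 7.5e-7) -> float:
    """Linear scheduler without warmup: decay from base_lr to 0."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

# Midway through the 500-step run the learning rate has halved:
print(linear_lr(250, 500))  # 3.75e-07
```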
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6732 | 0.1 | 25 | 0.6518 | 0.0274 | -0.0584 | 0.8867 | 0.0858 | -185.3935 | -160.3602 | -1.0521 | -0.8541 |
| 0.588 | 0.2 | 50 | 0.5616 | 0.0780 | -0.2093 | 0.9933 | 0.2873 | -186.9026 | -159.8541 | -1.0523 | -0.8550 |
| 0.5077 | 0.3 | 75 | 0.4690 | 0.1360 | -0.3896 | 1.0 | 0.5256 | -188.7056 | -159.2737 | -1.0525 | -0.8564 |
| 0.4179 | 0.4 | 100 | 0.3872 | 0.1873 | -0.5861 | 1.0 | 0.7734 | -190.6710 | -158.7608 | -1.0532 | -0.8563 |
| 0.3614 | 0.5 | 125 | 0.3170 | 0.2381 | -0.7895 | 1.0 | 1.0276 | -192.7043 | -158.2528 | -1.0533 | -0.8568 |
| 0.2812 | 0.6 | 150 | 0.2544 | 0.2856 | -1.0121 | 1.0 | 1.2977 | -194.9309 | -157.7783 | -1.0527 | -0.8569 |
| 0.2378 | 0.7 | 175 | 0.2066 | 0.3262 | -1.2240 | 1.0 | 1.5502 | -197.0494 | -157.3717 | -1.0520 | -0.8573 |
| 0.1866 | 0.79 | 200 | 0.1704 | 0.3591 | -1.4222 | 1.0 | 1.7812 | -199.0312 | -157.0431 | -1.0526 | -0.8577 |
| 0.1555 | 0.89 | 225 | 0.1429 | 0.3829 | -1.6050 | 1.0 | 1.9879 | -200.8594 | -156.8051 | -1.0523 | -0.8580 |
| 0.1312 | 0.99 | 250 | 0.1239 | 0.4002 | -1.7534 | 1.0 | 2.1536 | -202.3439 | -156.6322 | -1.0515 | -0.8572 |
| 0.1276 | 1.09 | 275 | 0.1147 | 0.4086 | -1.8325 | 1.0 | 2.2410 | -203.1341 | -156.5480 | -1.0518 | -0.8578 |
| 0.1038 | 1.19 | 300 | 0.1094 | 0.4144 | -1.8779 | 1.0 | 2.2923 | -203.5883 | -156.4901 | -1.0511 | -0.8574 |
| 0.101 | 1.29 | 325 | 0.1072 | 0.4191 | -1.9023 | 1.0 | 2.3214 | -203.8326 | -156.4429 | -1.0512 | -0.8569 |
| 0.1128 | 1.39 | 350 | 0.1056 | 0.4189 | -1.9206 | 1.0 | 2.3394 | -204.0154 | -156.4454 | -1.0511 | -0.8576 |
| 0.11 | 1.49 | 375 | 0.1047 | 0.4220 | -1.9262 | 1.0 | 2.3482 | -204.0712 | -156.4135 | -1.0509 | -0.8570 |
| 0.1001 | 1.59 | 400 | 0.1048 | 0.4224 | -1.9281 | 1.0 | 2.3505 | -204.0909 | -156.4098 | -1.0514 | -0.8574 |
| 0.0978 | 1.69 | 425 | 0.1042 | 0.4246 | -1.9292 | 1.0 | 2.3538 | -204.1014 | -156.3875 | -1.0512 | -0.8573 |
| 0.1111 | 1.79 | 450 | 0.1041 | 0.4244 | -1.9292 | 1.0 | 2.3536 | -204.1017 | -156.3903 | -1.0514 | -0.8587 |
| 0.1064 | 1.89 | 475 | 0.1044 | 0.4199 | -1.9317 | 1.0 | 2.3516 | -204.1266 | -156.4352 | -1.0514 | -0.8577 |
| 0.107 | 1.99 | 500 | 0.1045 | 0.4197 | -1.9316 | 1.0 | 2.3513 | -204.1257 | -156.4368 | -1.0515 | -0.8584 |
### Framework versions
- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2