---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
results: []
---
# llama_DPO_model_e2
This model is a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), fine-tuned with Direct Preference Optimization (DPO) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1205
- Rewards/chosen: 0.4005
- Rewards/rejected: -1.7841
- Rewards/accuracies: 1.0
- Rewards/margins: 2.1847
- Logps/rejected: -202.6509
- Logps/chosen: -156.6288
- Logits/rejected: -1.0515
- Logits/chosen: -0.8581
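For context on how the metrics above relate: in DPO, the per-example loss is the negative log-sigmoid of the reward margin (the `Rewards/margins` column, i.e. the chosen reward minus the rejected reward, already scaled by β). A minimal standard-library sketch, numerically stable for large margins of either sign:

```python
import math

def dpo_loss(margin: float) -> float:
    """Per-example DPO loss: -log(sigmoid(margin)).

    `margin` is beta * (chosen log-ratio - rejected log-ratio),
    i.e. the quantity reported as Rewards/margins.
    """
    if margin >= 0:
        # -log(sigmoid(x)) = log(1 + exp(-x)) for x >= 0
        return math.log1p(math.exp(-margin))
    # Equivalent rearrangement that avoids overflow for x < 0
    return -margin + math.log1p(math.exp(margin))

# At the final eval margin of ~2.1847, the loss at the mean margin is
# about 0.107; the reported eval loss (0.1205) is the mean of
# per-example losses, which is slightly larger because the loss is convex.
```

Note the reported eval loss cannot be reproduced exactly from the mean margin alone; it depends on the full distribution of per-example margins.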
## Model description
Details were not provided by the author. Per the card metadata (`library_name: peft`, tags `trl`, `dpo`, `generated_from_trainer`), this is a PEFT adapter on top of Llama-2-7b, trained with DPO, presumably via the TRL library's `DPOTrainer`.
## Intended uses & limitations
More information needed. As a derivative of Llama-2-7b, the model is distributed under the Llama 2 license and inherits the base model's limitations.
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
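The `total_train_batch_size` above follows from the per-device batch size and gradient accumulation (a single device is assumed here, consistent with the totals). The step/epoch columns in the results table below also let one roughly infer the dataset size, a quick check:

```python
# Effective train batch size = per-device batch size x accumulation steps
train_batch_size = 1
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8

# The results table reaches epoch ~0.99 at step 250, so one epoch is
# ~250 optimizer steps, implying roughly 250 * 8 = 2000 preference pairs.
steps_per_epoch = 250
approx_dataset_size = steps_per_epoch * total_train_batch_size  # ~2000
```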
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6753 | 0.1 | 25 | 0.6561 | 0.0241 | -0.0529 | 0.8800 | 0.0770 | -185.3385 | -160.3932 | -1.0518 | -0.8547 |
| 0.596 | 0.2 | 50 | 0.5763 | 0.0663 | -0.1863 | 0.9933 | 0.2525 | -186.6722 | -159.9714 | -1.0527 | -0.8563 |
| 0.5265 | 0.3 | 75 | 0.4888 | 0.1230 | -0.3480 | 1.0 | 0.4710 | -188.2895 | -159.4043 | -1.0529 | -0.8557 |
| 0.4405 | 0.4 | 100 | 0.4115 | 0.1711 | -0.5248 | 1.0 | 0.6959 | -190.0574 | -158.9227 | -1.0521 | -0.8557 |
| 0.3832 | 0.5 | 125 | 0.3418 | 0.2187 | -0.7108 | 1.0 | 0.9295 | -191.9176 | -158.4473 | -1.0530 | -0.8571 |
| 0.3071 | 0.6 | 150 | 0.2809 | 0.2614 | -0.9143 | 1.0 | 1.1757 | -193.9524 | -158.0195 | -1.0526 | -0.8568 |
| 0.2635 | 0.7 | 175 | 0.2300 | 0.3051 | -1.1158 | 1.0 | 1.4209 | -195.9679 | -157.5830 | -1.0531 | -0.8575 |
| 0.2056 | 0.79 | 200 | 0.1912 | 0.3381 | -1.3041 | 1.0 | 1.6422 | -197.8509 | -157.2532 | -1.0529 | -0.8577 |
| 0.1735 | 0.89 | 225 | 0.1617 | 0.3637 | -1.4760 | 1.0 | 1.8397 | -199.5699 | -156.9968 | -1.0524 | -0.8580 |
| 0.1492 | 0.99 | 250 | 0.1416 | 0.3797 | -1.6179 | 1.0 | 1.9976 | -200.9889 | -156.8374 | -1.0521 | -0.8575 |
| 0.144 | 1.09 | 275 | 0.1304 | 0.3918 | -1.6997 | 1.0 | 2.0915 | -201.8062 | -156.7157 | -1.0517 | -0.8590 |
| 0.1203 | 1.19 | 300 | 0.1255 | 0.3955 | -1.7398 | 1.0 | 2.1353 | -202.2080 | -156.6790 | -1.0514 | -0.8580 |
| 0.117 | 1.29 | 325 | 0.1229 | 0.3961 | -1.7635 | 1.0 | 2.1596 | -202.4451 | -156.6730 | -1.0514 | -0.8572 |
| 0.1286 | 1.39 | 350 | 0.1209 | 0.4018 | -1.7766 | 1.0 | 2.1784 | -202.5752 | -156.6156 | -1.0517 | -0.8587 |
| 0.126 | 1.49 | 375 | 0.1199 | 0.4025 | -1.7866 | 1.0 | 2.1891 | -202.6759 | -156.6091 | -1.0517 | -0.8587 |
| 0.1154 | 1.59 | 400 | 0.1202 | 0.4013 | -1.7865 | 1.0 | 2.1877 | -202.6743 | -156.6213 | -1.0514 | -0.8580 |
| 0.1141 | 1.69 | 425 | 0.1200 | 0.3990 | -1.7907 | 1.0 | 2.1897 | -202.7168 | -156.6437 | -1.0518 | -0.8578 |
| 0.1284 | 1.79 | 450 | 0.1196 | 0.4012 | -1.7899 | 1.0 | 2.1910 | -202.7081 | -156.6221 | -1.0518 | -0.8582 |
| 0.1225 | 1.89 | 475 | 0.1205 | 0.3984 | -1.7858 | 1.0 | 2.1842 | -202.6674 | -156.6495 | -1.0517 | -0.8592 |
| 0.1224 | 1.99 | 500 | 0.1205 | 0.4005 | -1.7841 | 1.0 | 2.1847 | -202.6509 | -156.6288 | -1.0515 | -0.8581 |
### Framework versions
- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2