---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.1205
- Rewards/chosen: 0.4005
- Rewards/rejected: -1.7841
- Rewards/accuracies: 1.0
- Rewards/margins: 2.1847
- Logps/rejected: -202.6509
- Logps/chosen: -156.6288
- Logits/rejected: -1.0515
- Logits/chosen: -0.8581
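
The reward metrics above are DPO's implicit rewards: beta-scaled log-probability ratios between the trained policy and the frozen reference model, with the margin being chosen reward minus rejected reward. A minimal sketch of the sigmoid DPO loss for one preference pair (`beta=0.1` is the trl default and an assumption here; the card does not state the value used):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for a single preference pair.

    The "rewards" are the beta-scaled log-prob ratios between the policy
    and the reference model; the margin is chosen minus rejected reward.
    beta=0.1 is assumed (trl default); this card does not list it.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
    return loss, chosen_reward, rejected_reward, margin
```

The reported metrics are internally consistent with this definition: 0.4005 − (−1.7841) ≈ 2.1847, the reported margin, and a large positive margin drives the per-pair loss toward 0, matching the fall in validation loss from 0.6561 to 0.1205.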

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 7e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
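
The effective optimization batch of 8 comes from accumulating 8 micro-batches of size 1, and the linear scheduler decays the learning rate from 7e-07 toward 0 over training. A small sketch of both (zero warmup and the roughly 500-step horizon visible in the table below are assumptions; the card lists neither):

```python
def linear_lr(step, total_steps, base_lr=7e-7, warmup_steps=0):
    """Linear schedule in the style of transformers'
    get_linear_schedule_with_warmup: ramp up over warmup_steps,
    then decay linearly to 0 at total_steps.
    warmup_steps=0 is an assumption; the card does not list warmup."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Effective optimization batch: per-device batch x gradient accumulation steps.
train_batch_size, grad_accum = 1, 8
effective_batch = train_batch_size * grad_accum  # matches total_train_batch_size: 8
```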

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6753        | 0.1   | 25   | 0.6561          | 0.0241         | -0.0529          | 0.8800             | 0.0770          | -185.3385      | -160.3932    | -1.0518         | -0.8547       |
| 0.596         | 0.2   | 50   | 0.5763          | 0.0663         | -0.1863          | 0.9933             | 0.2525          | -186.6722      | -159.9714    | -1.0527         | -0.8563       |
| 0.5265        | 0.3   | 75   | 0.4888          | 0.1230         | -0.3480          | 1.0                | 0.4710          | -188.2895      | -159.4043    | -1.0529         | -0.8557       |
| 0.4405        | 0.4   | 100  | 0.4115          | 0.1711         | -0.5248          | 1.0                | 0.6959          | -190.0574      | -158.9227    | -1.0521         | -0.8557       |
| 0.3832        | 0.5   | 125  | 0.3418          | 0.2187         | -0.7108          | 1.0                | 0.9295          | -191.9176      | -158.4473    | -1.0530         | -0.8571       |
| 0.3071        | 0.6   | 150  | 0.2809          | 0.2614         | -0.9143          | 1.0                | 1.1757          | -193.9524      | -158.0195    | -1.0526         | -0.8568       |
| 0.2635        | 0.7   | 175  | 0.2300          | 0.3051         | -1.1158          | 1.0                | 1.4209          | -195.9679      | -157.5830    | -1.0531         | -0.8575       |
| 0.2056        | 0.79  | 200  | 0.1912          | 0.3381         | -1.3041          | 1.0                | 1.6422          | -197.8509      | -157.2532    | -1.0529         | -0.8577       |
| 0.1735        | 0.89  | 225  | 0.1617          | 0.3637         | -1.4760          | 1.0                | 1.8397          | -199.5699      | -156.9968    | -1.0524         | -0.8580       |
| 0.1492        | 0.99  | 250  | 0.1416          | 0.3797         | -1.6179          | 1.0                | 1.9976          | -200.9889      | -156.8374    | -1.0521         | -0.8575       |
| 0.144         | 1.09  | 275  | 0.1304          | 0.3918         | -1.6997          | 1.0                | 2.0915          | -201.8062      | -156.7157    | -1.0517         | -0.8590       |
| 0.1203        | 1.19  | 300  | 0.1255          | 0.3955         | -1.7398          | 1.0                | 2.1353          | -202.2080      | -156.6790    | -1.0514         | -0.8580       |
| 0.117         | 1.29  | 325  | 0.1229          | 0.3961         | -1.7635          | 1.0                | 2.1596          | -202.4451      | -156.6730    | -1.0514         | -0.8572       |
| 0.1286        | 1.39  | 350  | 0.1209          | 0.4018         | -1.7766          | 1.0                | 2.1784          | -202.5752      | -156.6156    | -1.0517         | -0.8587       |
| 0.126         | 1.49  | 375  | 0.1199          | 0.4025         | -1.7866          | 1.0                | 2.1891          | -202.6759      | -156.6091    | -1.0517         | -0.8587       |
| 0.1154        | 1.59  | 400  | 0.1202          | 0.4013         | -1.7865          | 1.0                | 2.1877          | -202.6743      | -156.6213    | -1.0514         | -0.8580       |
| 0.1141        | 1.69  | 425  | 0.1200          | 0.3990         | -1.7907          | 1.0                | 2.1897          | -202.7168      | -156.6437    | -1.0518         | -0.8578       |
| 0.1284        | 1.79  | 450  | 0.1196          | 0.4012         | -1.7899          | 1.0                | 2.1910          | -202.7081      | -156.6221    | -1.0518         | -0.8582       |
| 0.1225        | 1.89  | 475  | 0.1205          | 0.3984         | -1.7858          | 1.0                | 2.1842          | -202.6674      | -156.6495    | -1.0517         | -0.8592       |
| 0.1224        | 1.99  | 500  | 0.1205          | 0.4005         | -1.7841          | 1.0                | 2.1847          | -202.6509      | -156.6288    | -1.0515         | -0.8581       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2