---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0896
  • Rewards/chosen: 0.4401
  • Rewards/rejected: -2.0930
  • Rewards/accuracies: 1.0
  • Rewards/margins: 2.5330
  • Logps/rejected: -205.7391
  • Logps/chosen: -156.2334
  • Logits/rejected: -1.0514
  • Logits/chosen: -0.8587
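For context on how these columns relate (this is not part of the original card): in trl's standard DPO formulation, the reward columns are derived from the difference in sequence log-probabilities between the trained policy and the frozen reference model, scaled by the DPO `beta`. A minimal sketch of that arithmetic, assuming the standard formulation; the `beta=0.1` default here is hypothetical, since the card does not record it:

```python
import math

def dpo_stats(policy_chosen_logp, ref_chosen_logp,
              policy_rejected_logp, ref_rejected_logp, beta=0.1):
    """Arithmetic behind the Rewards/* and Loss columns.

    Assumes trl's convention: reward = beta * (policy logp - reference logp).
    beta=0.1 is a hypothetical value; the card does not record the one used.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Per-pair DPO loss: -log(sigmoid(margin)); beta is already folded
    # into the rewards, so the margin is used directly.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return chosen_reward, rejected_reward, margin, loss
```

Consistent with this, the final Rewards/margins value above (2.5330) is, up to rounding, Rewards/chosen minus Rewards/rejected: 0.4401 - (-2.0930).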

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
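As a rough illustration (not the author's actual training script), the hyperparameters above map onto transformers' `TrainingArguments` as follows; `output_dir` is a placeholder, and the Adam betas/epsilon listed above are the transformers defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as a TrainingArguments config.
args = TrainingArguments(
    output_dir="llama_DPO_model_e2",  # placeholder, not from the card
    learning_rate=8e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=8,    # effective train batch size: 1 * 8 = 8
    lr_scheduler_type="linear",
    num_train_epochs=2,
)
```

The total_train_batch_size of 8 is not set directly: it is the product of `per_device_train_batch_size` (1) and `gradient_accumulation_steps` (8).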

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6699        | 0.1   | 25   | 0.6428          | 0.0307         | -0.0744          | 0.9033             | 0.1051          | -185.5532      | -160.3267    | -1.0520         | -0.8550       |
| 0.5702        | 0.2   | 50   | 0.5471          | 0.0866         | -0.2359          | 0.9933             | 0.3225          | -187.1690      | -159.7680    | -1.0514         | -0.8544       |
| 0.488         | 0.3   | 75   | 0.4456          | 0.1502         | -0.4424          | 1.0                | 0.5926          | -189.2334      | -159.1314    | -1.0527         | -0.8555       |
| 0.3957        | 0.4   | 100  | 0.3600          | 0.2054         | -0.6615          | 1.0                | 0.8669          | -191.4245      | -158.5795    | -1.0530         | -0.8577       |
| 0.3338        | 0.5   | 125  | 0.2865          | 0.2569         | -0.8933          | 1.0                | 1.1502          | -193.7425      | -158.0646    | -1.0524         | -0.8564       |
| 0.253         | 0.6   | 150  | 0.2257          | 0.3043         | -1.1373          | 1.0                | 1.4416          | -196.1830      | -157.5914    | -1.0523         | -0.8570       |
| 0.2134        | 0.7   | 175  | 0.1819          | 0.3496         | -1.3537          | 1.0                | 1.7033          | -198.3466      | -157.1379    | -1.0530         | -0.8584       |
| 0.1613        | 0.79  | 200  | 0.1473          | 0.3842         | -1.5693          | 1.0                | 1.9535          | -200.5027      | -156.7917    | -1.0525         | -0.8591       |
| 0.1358        | 0.89  | 225  | 0.1231          | 0.4031         | -1.7582          | 1.0                | 2.1614          | -202.3919      | -156.6024    | -1.0523         | -0.8593       |
| 0.115         | 0.99  | 250  | 0.1076          | 0.4205         | -1.8980          | 1.0                | 2.3185          | -203.7897      | -156.4292    | -1.0521         | -0.8590       |
| 0.1111        | 1.09  | 275  | 0.0989          | 0.4291         | -1.9856          | 1.0                | 2.4148          | -204.6660      | -156.3426    | -1.0515         | -0.8591       |
| 0.0902        | 1.19  | 300  | 0.0949          | 0.4280         | -2.0337          | 1.0                | 2.4617          | -205.1465      | -156.3540    | -1.0507         | -0.8576       |
| 0.0867        | 1.29  | 325  | 0.0920          | 0.4325         | -2.0705          | 1.0                | 2.5030          | -205.5146      | -156.3087    | -1.0510         | -0.8576       |
| 0.0973        | 1.39  | 350  | 0.0905          | 0.4357         | -2.0839          | 1.0                | 2.5196          | -205.6485      | -156.2766    | -1.0506         | -0.8576       |
| 0.0942        | 1.49  | 375  | 0.0897          | 0.4422         | -2.0838          | 1.0                | 2.5260          | -205.6476      | -156.2122    | -1.0515         | -0.8578       |
| 0.0858        | 1.59  | 400  | 0.0897          | 0.4392         | -2.0903          | 1.0                | 2.5295          | -205.7121      | -156.2415    | -1.0515         | -0.8587       |
| 0.083         | 1.69  | 425  | 0.0893          | 0.4401         | -2.0972          | 1.0                | 2.5373          | -205.7811      | -156.2327    | -1.0511         | -0.8584       |
| 0.0964        | 1.79  | 450  | 0.0897          | 0.4368         | -2.0947          | 1.0                | 2.5315          | -205.7564      | -156.2662    | -1.0511         | -0.8577       |
| 0.0931        | 1.89  | 475  | 0.0890          | 0.4406         | -2.0970          | 1.0                | 2.5376          | -205.7794      | -156.2282    | -1.0512         | -0.8585       |
| 0.0915        | 1.99  | 500  | 0.0896          | 0.4401         | -2.0930          | 1.0                | 2.5330          | -205.7391      | -156.2334    | -1.0514         | -0.8587       |

### Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2