---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.1001
- Rewards/chosen: 0.4226
- Rewards/rejected: -1.9804
- Rewards/accuracies: 1.0
- Rewards/margins: 2.4030
- Logps/rejected: -204.6132
- Logps/chosen: -156.4080
- Logits/rejected: -1.0519
- Logits/chosen: -0.8585
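
For context on how to read these numbers: in DPO, "rewards" are the implicit rewards `β · (log π_θ(y|x) − log π_ref(y|x))`, i.e. beta-scaled log-probability ratios between the trained policy and the frozen reference model, and the loss is `−log σ(margin)`. A minimal stdlib sketch of that arithmetic (the function and the input log-probs are illustrative, chosen only to roughly reproduce the reward scale reported above; they are not values from this run):

```python
import math

def dpo_stats(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Implicit DPO rewards and loss for one preference pair.

    Rewards are beta-scaled log-ratios between the policy and the
    frozen reference model; the loss is -log sigmoid(reward margin).
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = math.log1p(math.exp(-margin))  # -log sigmoid(margin), stable form
    return reward_chosen, reward_rejected, margin, loss

# Illustrative per-sequence log-probs (assumed, not from this repo):
rc, rr, margin, loss = dpo_stats(-156.4, -158.8, -160.6, -139.0, beta=0.1)
```

A positive margin pushes the loss below `log(2) ≈ 0.693`, the value at initialization when policy and reference agree; the small eval loss above reflects a large, consistently positive margin.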

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 6e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
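
The hyperparameters above map directly onto a `transformers.TrainingArguments` instance that would be passed to `trl.DPOTrainer` together with a PEFT adapter config. A hedged sketch (the LoRA settings and output directory are illustrative assumptions; this card does not state the adapter configuration used):

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Training arguments matching the card's listed hyperparameters.
training_args = TrainingArguments(
    output_dir="llama_DPO_model_e2",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # effective train batch size of 8
    learning_rate=6e-7,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    seed=42,
)

# Illustrative LoRA adapter settings (assumed, not from this repo).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

Both objects would then be handed to `trl.DPOTrainer` along with the base model, a reference model (or `None` when PEFT is used), and a preference dataset.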

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6757 | 0.1 | 25 | 0.6650 | 0.0149 | -0.0435 | 0.7767 | 0.0584 | -185.2444 | -160.4850 | -1.0519 | -0.8543 |
| 0.6136 | 0.2 | 50 | 0.5989 | 0.0552 | -0.1462 | 0.9567 | 0.2014 | -186.2718 | -160.0822 | -1.0523 | -0.8553 |
| 0.5526 | 0.3 | 75 | 0.5225 | 0.1032 | -0.2804 | 1.0 | 0.3837 | -187.6138 | -159.6014 | -1.0520 | -0.8542 |
| 0.4819 | 0.4 | 100 | 0.4502 | 0.1474 | -0.4325 | 0.9967 | 0.5798 | -189.1341 | -159.1602 | -1.0518 | -0.8548 |
| 0.4253 | 0.5 | 125 | 0.3835 | 0.1905 | -0.5943 | 1.0 | 0.7848 | -190.7523 | -158.7284 | -1.0527 | -0.8564 |
| 0.3448 | 0.6 | 150 | 0.3197 | 0.2328 | -0.7813 | 1.0 | 1.0141 | -192.6229 | -158.3063 | -1.0526 | -0.8559 |
| 0.3007 | 0.7 | 175 | 0.2637 | 0.2788 | -0.9753 | 1.0 | 1.2542 | -194.5630 | -157.8456 | -1.0525 | -0.8586 |
| 0.2369 | 0.79 | 200 | 0.2192 | 0.3135 | -1.1671 | 1.0 | 1.4807 | -196.4808 | -157.4985 | -1.0519 | -0.8604 |
| 0.1987 | 0.89 | 225 | 0.1825 | 0.3436 | -1.3550 | 1.0 | 1.6986 | -198.3592 | -157.1976 | -1.0520 | -0.8594 |
| 0.1616 | 0.99 | 250 | 0.1532 | 0.3687 | -1.5379 | 1.0 | 1.9066 | -200.1886 | -156.9470 | -1.0519 | -0.8604 |
| 0.1525 | 1.09 | 275 | 0.1346 | 0.3861 | -1.6703 | 1.0 | 2.0564 | -201.5127 | -156.7730 | -1.0511 | -0.8582 |
| 0.1194 | 1.19 | 300 | 0.1246 | 0.3970 | -1.7483 | 1.0 | 2.1453 | -202.2923 | -156.6637 | -1.0509 | -0.8584 |
| 0.1128 | 1.29 | 325 | 0.1161 | 0.4062 | -1.8227 | 1.0 | 2.2289 | -203.0370 | -156.5718 | -1.0511 | -0.8577 |
| 0.1194 | 1.39 | 350 | 0.1108 | 0.4127 | -1.8680 | 1.0 | 2.2807 | -203.4899 | -156.5069 | -1.0514 | -0.8602 |
| 0.1123 | 1.49 | 375 | 0.1070 | 0.4151 | -1.9092 | 1.0 | 2.3243 | -203.9014 | -156.4828 | -1.0515 | -0.8584 |
| 0.1008 | 1.59 | 400 | 0.1046 | 0.4209 | -1.9290 | 1.0 | 2.3499 | -204.0999 | -156.4248 | -1.0516 | -0.8618 |
| 0.0971 | 1.69 | 425 | 0.1033 | 0.4208 | -1.9461 | 1.0 | 2.3669 | -204.2709 | -156.4260 | -1.0510 | -0.8586 |
| 0.109 | 1.79 | 450 | 0.1019 | 0.4235 | -1.9597 | 1.0 | 2.3832 | -204.4061 | -156.3985 | -1.0510 | -0.8587 |
| 0.1035 | 1.89 | 475 | 0.1009 | 0.4234 | -1.9700 | 1.0 | 2.3934 | -204.5094 | -156.4001 | -1.0517 | -0.8580 |
| 0.1046 | 1.99 | 500 | 0.1004 | 0.4210 | -1.9772 | 1.0 | 2.3983 | -204.5820 | -156.4234 | -1.0511 | -0.8603 |
| 0.0961 | 2.09 | 525 | 0.1002 | 0.4227 | -1.9798 | 1.0 | 2.4025 | -204.6080 | -156.4070 | -1.0518 | -0.8587 |
| 0.0932 | 2.19 | 550 | 0.1000 | 0.4237 | -1.9796 | 1.0 | 2.4033 | -204.6052 | -156.3964 | -1.0518 | -0.8597 |
| 0.0901 | 2.29 | 575 | 0.1002 | 0.4231 | -1.9785 | 1.0 | 2.4015 | -204.5942 | -156.4030 | -1.0514 | -0.8594 |
| 0.1033 | 2.38 | 600 | 0.1003 | 0.4248 | -1.9780 | 1.0 | 2.4028 | -204.5901 | -156.3859 | -1.0517 | -0.8616 |
| 0.1108 | 2.48 | 625 | 0.0999 | 0.4262 | -1.9796 | 1.0 | 2.4057 | -204.6053 | -156.3723 | -1.0517 | -0.8583 |
| 0.1026 | 2.58 | 650 | 0.0998 | 0.4208 | -1.9879 | 1.0 | 2.4088 | -204.6889 | -156.4255 | -1.0522 | -0.8594 |
| 0.0956 | 2.68 | 675 | 0.1001 | 0.4227 | -1.9818 | 1.0 | 2.4045 | -204.6279 | -156.4070 | -1.0517 | -0.8588 |
| 0.1003 | 2.78 | 700 | 0.0996 | 0.4241 | -1.9817 | 1.0 | 2.4058 | -204.6262 | -156.3926 | -1.0516 | -0.8584 |
| 0.0874 | 2.88 | 725 | 0.0997 | 0.4228 | -1.9835 | 1.0 | 2.4064 | -204.6450 | -156.4057 | -1.0519 | -0.8609 |
| 0.1001 | 2.98 | 750 | 0.1001 | 0.4226 | -1.9804 | 1.0 | 2.4030 | -204.6132 | -156.4080 | -1.0519 | -0.8585 |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
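
Since this card describes a PEFT adapter rather than full model weights, loading it would follow the usual PEFT pattern. A sketch (the adapter repo id `thorirhrafn/llama_DPO_model_e2` is an assumption, and the gated base model requires accepting Meta's license on the Hub):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the gated base model (requires Hub access to meta-llama).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach the DPO-trained LoRA adapter (repo id assumed).
model = PeftModel.from_pretrained(base, "thorirhrafn/llama_DPO_model_e2")
```

For standalone inference the adapter can optionally be folded into the base weights with `model.merge_and_unload()`.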