---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with DPO (Direct Preference Optimization) via TRL on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.0572
- Rewards/chosen: 0.4916
- Rewards/rejected: -2.5677
- Rewards/accuracies: 1.0
- Rewards/margins: 3.0592
- Logps/rejected: -210.4865
- Logps/chosen: -155.7183
- Logits/rejected: -1.0527
- Logits/chosen: -0.8611
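A note on how these metrics relate: in DPO, Rewards/margins is simply Rewards/chosen minus Rewards/rejected, and the per-pair loss is -log sigmoid(margin). The sketch below (plain Python, no card-specific values beyond those listed; beta and the underlying log-probs are not stated, so only the margin identity can be reproduced exactly) checks this against the reported numbers:

```python
import math

# Reported eval metrics from the card
rewards_chosen = 0.4916
rewards_rejected = -2.5677

# Rewards/margins is the gap between chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 3.0593 (card reports 3.0592; rounding)

# Per-pair DPO loss is -log(sigmoid(margin)); a large positive margin pushes
# the loss toward zero. Applying it to the *mean* margin gives roughly 0.046,
# close to but not equal to the reported eval loss (0.0572), which averages
# the per-example losses instead.
loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The perfect Rewards/accuracies of 1.0 is consistent with this: every evaluation pair ends with a positive margin.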

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
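Two of the listed values are derived from the others. A plain-Python sketch (zero warmup is assumed, since none is listed; ~500 optimizer steps is taken from the results table and is illustrative only):

```python
# Derived quantities from the hyperparameters above.
learning_rate = 1e-6
train_batch_size = 1               # per device
gradient_accumulation_steps = 8

# The listed total_train_batch_size is the product of the two:
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 8

# Linear scheduler with zero warmup: LR decays from the base rate to zero
# over the run.
def linear_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(linear_lr(250, 500))  # 5e-07: halfway through, half the base rate
```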

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6588        | 0.1   | 25   | 0.6197          | 0.0430         | -0.1117          | 0.9633             | 0.1547          | -185.9265      | -160.2034    | -1.0522         | -0.8546       |
| 0.5198        | 0.2   | 50   | 0.4923          | 0.1198         | -0.3424          | 0.9933             | 0.4622          | -188.2335      | -159.4357    | -1.0525         | -0.8554       |
| 0.422         | 0.3   | 75   | 0.3707          | 0.2016         | -0.6277          | 1.0                | 0.8293          | -191.0862      | -158.6175    | -1.0532         | -0.8571       |
| 0.3133        | 0.4   | 100  | 0.2775          | 0.2622         | -0.9287          | 1.0                | 1.1908          | -194.0961      | -158.0122    | -1.0529         | -0.8575       |
| 0.2536        | 0.5   | 125  | 0.2077          | 0.3244         | -1.2160          | 1.0                | 1.5403          | -196.9694      | -157.3904    | -1.0527         | -0.8608       |
| 0.181         | 0.6   | 150  | 0.1559          | 0.3746         | -1.5115          | 1.0                | 1.8860          | -199.9242      | -156.8883    | -1.0534         | -0.8595       |
| 0.1457        | 0.7   | 175  | 0.1203          | 0.4136         | -1.7795          | 1.0                | 2.1931          | -202.6049      | -156.4983    | -1.0534         | -0.8620       |
| 0.1072        | 0.79  | 200  | 0.0950          | 0.4439         | -2.0245          | 1.0                | 2.4684          | -205.0550      | -156.1949    | -1.0532         | -0.8613       |
| 0.0921        | 0.89  | 225  | 0.0792          | 0.4625         | -2.2196          | 1.0                | 2.6821          | -207.0056      | -156.0085    | -1.0535         | -0.8604       |
| 0.0732        | 0.99  | 250  | 0.0694          | 0.4721         | -2.3665          | 1.0                | 2.8387          | -208.4748      | -155.9124    | -1.0530         | -0.8609       |
| 0.0703        | 1.09  | 275  | 0.0636          | 0.4762         | -2.4589          | 1.0                | 2.9351          | -209.3987      | -155.8720    | -1.0527         | -0.8600       |
| 0.0554        | 1.19  | 300  | 0.0606          | 0.4841         | -2.5053          | 1.0                | 2.9894          | -209.8628      | -155.7928    | -1.0528         | -0.8614       |
| 0.0532        | 1.29  | 325  | 0.0592          | 0.4869         | -2.5331          | 1.0                | 3.0200          | -210.1407      | -155.7649    | -1.0527         | -0.8606       |
| 0.061         | 1.39  | 350  | 0.0580          | 0.4912         | -2.5550          | 1.0                | 3.0462          | -210.3595      | -155.7218    | -1.0525         | -0.8611       |
| 0.0612        | 1.49  | 375  | 0.0573          | 0.4930         | -2.5633          | 1.0                | 3.0563          | -210.4424      | -155.7034    | -1.0527         | -0.8613       |
| 0.0539        | 1.59  | 400  | 0.0576          | 0.4921         | -2.5602          | 1.0                | 3.0523          | -210.4118      | -155.7133    | -1.0529         | -0.8596       |
| 0.0517        | 1.69  | 425  | 0.0570          | 0.4917         | -2.5691          | 1.0                | 3.0608          | -210.5005      | -155.7172    | -1.0529         | -0.8602       |
| 0.0627        | 1.79  | 450  | 0.0570          | 0.4938         | -2.5669          | 1.0                | 3.0607          | -210.4783      | -155.6961    | -1.0532         | -0.8608       |
| 0.0575        | 1.89  | 475  | 0.0574          | 0.4911         | -2.5664          | 1.0                | 3.0574          | -210.4731      | -155.7233    | -1.0528         | -0.8612       |
| 0.0578        | 1.99  | 500  | 0.0572          | 0.4916         | -2.5677          | 1.0                | 3.0592          | -210.4865      | -155.7183    | -1.0527         | -0.8611       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
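To recreate a matching environment, the versions above can be pinned like so (a convenience sketch, not an official requirements file; the TRL version used for DPO training is not listed on the card):

```shell
# Pin the library versions listed above.
pip install "peft==0.8.2" "transformers==4.38.1" "datasets==2.17.1" "tokenizers==0.15.2"

# PyTorch 2.2.0 built for CUDA 11.8, from the official wheel index.
pip install "torch==2.2.0" --index-url https://download.pytorch.org/whl/cu118

# TRL is needed for DPO; exact version not stated on the card.
pip install trl
```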