---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a PEFT adapter for meta-llama/Llama-2-7b-hf, fine-tuned with DPO (Direct Preference Optimization) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0037
  • Rewards/chosen: 0.5612
  • Rewards/rejected: -5.9460
  • Rewards/accuracies: 1.0
  • Rewards/margins: 6.5073
  • Logps/rejected: -244.2698
  • Logps/chosen: -155.0214
  • Logits/rejected: -1.0632
  • Logits/chosen: -0.8795
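The reward metrics above are DPO's implicit rewards: beta-scaled log-probability ratios between the policy and the frozen reference model, with the margin being the chosen reward minus the rejected reward (note 0.5612 − (−5.9460) ≈ 6.5073, matching the reported margin). A minimal sketch of this arithmetic, assuming the standard DPO formulation (the card does not report beta; 0.1 is TRL's default and is only an assumption here):

```python
import math

def dpo_rewards(beta, logp_chosen, ref_logp_chosen, logp_rejected, ref_logp_rejected):
    """Implicit DPO rewards: beta-scaled policy-vs-reference log-prob ratios.

    beta is assumed (e.g. TRL's default 0.1); it is not reported in the card.
    """
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = r_chosen - r_rejected
    # Per-pair DPO loss: -log(sigmoid(margin)); it shrinks as the margin grows.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return r_chosen, r_rejected, margin, loss

# Sanity check against the reported eval metrics: margins = chosen - rejected.
assert abs((0.5612 - (-5.9460)) - 6.5073) < 1e-3
```

With "Rewards/accuracies" at 1.0, every chosen completion has a higher implicit reward than its rejected counterpart, which is consistent with the near-zero validation loss.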

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
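
The hyperparameters above can be reconstructed as a `TrainingArguments` sketch. This is an assumption-laden illustration, not the author's actual script: dataset loading, model/tokenizer setup, and the `DPOTrainer` call are elided, and the API shown is the TRL generation contemporary with the PEFT 0.8.x / Transformers 4.38 versions listed below.

```python
# Sketch only: reconstructs the reported hyperparameters; everything else
# (output_dir name, dataset, beta) is an assumption, not from the card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama_DPO_model_e2",  # assumed name
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,    # effective train batch size: 1 * 8 = 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    seed=42,
)
# trainer = DPOTrainer(model, ref_model, args=args,
#                      train_dataset=..., eval_dataset=..., tokenizer=...)
```

Note that the "total_train_batch_size: 8" bullet is simply the per-device batch size (1) multiplied by the gradient-accumulation steps (8).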

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3693        | 0.1   | 25   | 0.1586          | 0.3906         | -1.4782          | 1.0                | 1.8688          | -199.5915      | -156.7276    | -1.0532         | -0.8639       |
| 0.0442        | 0.2   | 50   | 0.0275          | 0.5577         | -3.3969          | 1.0                | 3.9546          | -218.7789      | -155.0573    | -1.0591         | -0.8709       |
| 0.0153        | 0.3   | 75   | 0.0123          | 0.5805         | -4.3685          | 1.0                | 4.9490          | -228.4945      | -154.8291    | -1.0641         | -0.8765       |
| 0.0098        | 0.4   | 100  | 0.0083          | 0.5880         | -4.8560          | 1.0                | 5.4440          | -233.3696      | -154.7535    | -1.0654         | -0.8801       |
| 0.0072        | 0.5   | 125  | 0.0065          | 0.5779         | -5.1733          | 1.0                | 5.7513          | -236.5429      | -154.8546    | -1.0667         | -0.8808       |
| 0.0056        | 0.6   | 150  | 0.0058          | 0.5669         | -5.3483          | 1.0                | 5.9152          | -238.2926      | -154.9651    | -1.0674         | -0.8815       |
| 0.0059        | 0.7   | 175  | 0.0051          | 0.5733         | -5.4970          | 1.0                | 6.0704          | -239.7797      | -154.9004    | -1.0659         | -0.8820       |
| 0.0065        | 0.79  | 200  | 0.0047          | 0.5713         | -5.6304          | 1.0                | 6.2017          | -241.1136      | -154.9210    | -1.0653         | -0.8803       |
| 0.0044        | 0.89  | 225  | 0.0043          | 0.5689         | -5.7514          | 1.0                | 6.3203          | -242.3240      | -154.9452    | -1.0650         | -0.8816       |
| 0.004         | 0.99  | 250  | 0.0041          | 0.5671         | -5.8118          | 1.0                | 6.3790          | -242.9280      | -154.9625    | -1.0644         | -0.8796       |
| 0.0029        | 1.09  | 275  | 0.0040          | 0.5648         | -5.8589          | 1.0                | 6.4237          | -243.3990      | -154.9863    | -1.0633         | -0.8800       |
| 0.0035        | 1.19  | 300  | 0.0038          | 0.5658         | -5.8892          | 1.0                | 6.4549          | -243.7013      | -154.9761    | -1.0630         | -0.8785       |
| 0.0024        | 1.29  | 325  | 0.0039          | 0.5618         | -5.9044          | 1.0                | 6.4662          | -243.8535      | -155.0163    | -1.0628         | -0.8787       |
| 0.0034        | 1.39  | 350  | 0.0038          | 0.5595         | -5.9136          | 1.0                | 6.4731          | -243.9456      | -155.0389    | -1.0632         | -0.8788       |
| 0.0029        | 1.49  | 375  | 0.0038          | 0.5601         | -5.9328          | 1.0                | 6.4929          | -244.1375      | -155.0332    | -1.0634         | -0.8792       |
| 0.003         | 1.59  | 400  | 0.0038          | 0.5605         | -5.9352          | 1.0                | 6.4957          | -244.1614      | -155.0284    | -1.0632         | -0.8793       |
| 0.0021        | 1.69  | 425  | 0.0038          | 0.5593         | -5.9410          | 1.0                | 6.5003          | -244.2199      | -155.0412    | -1.0630         | -0.8792       |
| 0.0036        | 1.79  | 450  | 0.0038          | 0.5605         | -5.9408          | 1.0                | 6.5013          | -244.2178      | -155.0292    | -1.0631         | -0.8794       |
| 0.0031        | 1.89  | 475  | 0.0038          | 0.5567         | -5.9439          | 1.0                | 6.5006          | -244.2483      | -155.0666    | -1.0634         | -0.8782       |
| 0.0032        | 1.99  | 500  | 0.0037          | 0.5612         | -5.9460          | 1.0                | 6.5073          | -244.2698      | -155.0214    | -1.0632         | -0.8795       |

### Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2