---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.0739
- Rewards/chosen: 0.4632
- Rewards/rejected: -2.2899
- Rewards/accuracies: 1.0
- Rewards/margins: 2.7530
- Logps/rejected: -207.7081
- Logps/chosen: -156.0022
- Logits/rejected: -1.0521
- Logits/chosen: -0.8598
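In DPO, `Rewards/chosen` and `Rewards/rejected` are the beta-scaled log-probability ratios of the policy versus the reference model, and `Rewards/margins` is simply their difference; the loss is the negative log-sigmoid of that margin. A quick dependency-free sanity check of the reported numbers (the small mismatch is rounding in the report, and the reported eval loss averages over examples rather than being evaluated at the mean margin):

```python
import math

# Reported evaluation metrics from the list above.
rewards_chosen = 0.4632
rewards_rejected = -2.2899

# Rewards/margins is defined as chosen minus rejected.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 2.7531, matching the reported 2.7530 up to rounding

# Per-example DPO loss is -log(sigmoid(margin)) once beta is folded into
# the rewards, so a large positive margin drives the loss toward zero.
loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
print(round(loss, 4))
```

This also explains why `Rewards/accuracies` reaching 1.0 coincides with the loss flattening out: once every chosen completion out-scores its rejected counterpart, further training only widens the margin.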

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 9e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
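For orientation, the hyperparameters above roughly correspond to a TRL `DPOTrainer` run on a PEFT adapter, sketched below. Only the listed hyperparameters come from this card; the dataset contents, the `beta` value, and the LoRA settings are illustrative assumptions (this is a configuration sketch, not a runnable reproduction — training requires the gated Llama-2 weights and a GPU):

```python
# Sketch only: maps the card's hyperparameters onto a TRL DPOTrainer setup.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# DPO expects prompt/chosen/rejected columns; contents here are placeholders.
preference_dataset = Dataset.from_dict({
    "prompt": ["Explain DPO in one sentence."],
    "chosen": ["DPO optimizes a policy directly from preference pairs."],
    "rejected": ["DPO is a kind of tokenizer."],
})

training_args = TrainingArguments(
    output_dir="llama_DPO_model_e2",
    learning_rate=9e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # effective train batch size of 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    seed=42,
)

# LoRA adapter config: values are illustrative, not stated in the card.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, the frozen base acts as reference
    args=training_args,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,  # assumed; beta is not reported in this card
)
trainer.train()
```

Passing `ref_model=None` together with a `peft_config` is the memory-saving pattern for adapter training: the trainer disables the adapter to score the reference policy instead of keeping a second full model in memory.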

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6694        | 0.1   | 25   | 0.6365          | 0.0370         | -0.0813          | 0.9433             | 0.1183          | -185.6225      | -160.2637    | -1.0521         | -0.8545       |
| 0.5526        | 0.2   | 50   | 0.5246          | 0.1015         | -0.2765          | 0.9967             | 0.3780          | -187.5744      | -159.6185    | -1.0524         | -0.8560       |
| 0.4607        | 0.3   | 75   | 0.4173          | 0.1669         | -0.5106          | 1.0                | 0.6775          | -189.9152      | -158.9647    | -1.0530         | -0.8562       |
| 0.3595        | 0.4   | 100  | 0.3251          | 0.2304         | -0.7635          | 1.0                | 0.9940          | -192.4449      | -158.3297    | -1.0530         | -0.8567       |
| 0.297         | 0.5   | 125  | 0.2521          | 0.2883         | -1.0189          | 1.0                | 1.3072          | -194.9990      | -157.7509    | -1.0526         | -0.8573       |
| 0.2217        | 0.6   | 150  | 0.1968          | 0.3313         | -1.2778          | 1.0                | 1.6090          | -197.5871      | -157.3212    | -1.0525         | -0.8576       |
| 0.1832        | 0.7   | 175  | 0.1539          | 0.3750         | -1.5241          | 1.0                | 1.8991          | -200.0504      | -156.8834    | -1.0531         | -0.8606       |
| 0.1374        | 0.79  | 200  | 0.1238          | 0.4055         | -1.7491          | 1.0                | 2.1546          | -202.3004      | -156.5787    | -1.0525         | -0.8614       |
| 0.116         | 0.89  | 225  | 0.1027          | 0.4306         | -1.9426          | 1.0                | 2.3732          | -204.2353      | -156.3275    | -1.0526         | -0.8606       |
| 0.095         | 0.99  | 250  | 0.0898          | 0.4405         | -2.0888          | 1.0                | 2.5293          | -205.6978      | -156.2289    | -1.0523         | -0.8603       |
| 0.0921        | 1.09  | 275  | 0.0831          | 0.4465         | -2.1733          | 1.0                | 2.6198          | -206.5422      | -156.1685    | -1.0524         | -0.8593       |
| 0.0734        | 1.19  | 300  | 0.0793          | 0.4520         | -2.2224          | 1.0                | 2.6744          | -207.0332      | -156.1135    | -1.0519         | -0.8627       |
| 0.0711        | 1.29  | 325  | 0.0766          | 0.4558         | -2.2584          | 1.0                | 2.7142          | -207.3936      | -156.0763    | -1.0520         | -0.8592       |
| 0.0806        | 1.39  | 350  | 0.0754          | 0.4630         | -2.2725          | 1.0                | 2.7355          | -207.5350      | -156.0041    | -1.0520         | -0.8599       |
| 0.079         | 1.49  | 375  | 0.0748          | 0.4622         | -2.2779          | 1.0                | 2.7401          | -207.5887      | -156.0115    | -1.0522         | -0.8602       |
| 0.0711        | 1.59  | 400  | 0.0746          | 0.4615         | -2.2817          | 1.0                | 2.7432          | -207.6269      | -156.0192    | -1.0519         | -0.8603       |
| 0.0689        | 1.69  | 425  | 0.0744          | 0.4624         | -2.2862          | 1.0                | 2.7486          | -207.6718      | -156.0103    | -1.0522         | -0.8594       |
| 0.0809        | 1.79  | 450  | 0.0742          | 0.4631         | -2.2887          | 1.0                | 2.7518          | -207.6965      | -156.0032    | -1.0517         | -0.8610       |
| 0.0759        | 1.89  | 475  | 0.0740          | 0.4629         | -2.2902          | 1.0                | 2.7531          | -207.7117      | -156.0047    | -1.0517         | -0.8594       |
| 0.0758        | 1.99  | 500  | 0.0739          | 0.4632         | -2.2899          | 1.0                | 2.7530          | -207.7081      | -156.0022    | -1.0521         | -0.8598       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 2.15.2