---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a DPO fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.1526
- Rewards/chosen: 0.3611
- Rewards/rejected: -1.5450
- Rewards/accuracies: 1.0
- Rewards/margins: 1.9061
- Logps/rejected: -200.2592
- Logps/chosen: -157.0226
- Logits/rejected: -1.0513
- Logits/chosen: -0.8571
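The reward metrics above follow the standard DPO formulation: each "reward" is the β-scaled log-probability ratio of the policy to the frozen reference model, the margin is their difference, and the loss is the negative log-sigmoid of the margin. A minimal plain-Python sketch (β = 0.1 is assumed here, matching trl's default; the β used for this run is not recorded in the card, and the input log-probs below are illustrative, not from this run):

```python
import math

def dpo_pair_stats(policy_chosen_logp, policy_rejected_logp,
                   ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss and implicit rewards.

    rewards/chosen and rewards/rejected are beta-scaled log-ratios of the
    policy to the reference model; rewards/margins is their difference.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written in the numerically convenient log1p form
    loss = math.log1p(math.exp(-margin))
    return loss, chosen_reward, rejected_reward, margin

# Hypothetical log-probs for one preference pair:
loss, cr, rr, m = dpo_pair_stats(-157.0, -200.3, -160.6, -184.8)
```

A positive margin means the policy has shifted probability mass toward the chosen completion relative to the reference; with margins near 1.9 as in the final evaluation, the per-pair loss drops well below log 2 (the value at a zero margin).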

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
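The effective batch size and the learning-rate schedule follow directly from these values. A small arithmetic sketch (plain Python, not the training script; no warmup is assumed since none is listed):

```python
def linear_lr(step, total_steps, base_lr=5e-7):
    """Linear decay from base_lr to 0 over training (no warmup assumed)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

# Effective batch per optimizer step = per-device batch * accumulation steps.
train_batch_size = 1
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8

# The results table ends at step 750 over 3 epochs, i.e. roughly 250
# optimizer steps per epoch at this effective batch size.
halfway_lr = linear_lr(375, 750)  # 2.5e-7
```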

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6819        | 0.1   | 25   | 0.6708          | 0.0151         | -0.0312          | 0.7567             | 0.0463          | -185.1220      | -160.4831    | -1.0517         | -0.8540       |
| 0.6351        | 0.2   | 50   | 0.6228          | 0.0428         | -0.1054          | 0.9600             | 0.1482          | -185.8636      | -160.2060    | -1.0524         | -0.8552       |
| 0.5874        | 0.3   | 75   | 0.5655          | 0.0762         | -0.2019          | 0.9967             | 0.2781          | -186.8286      | -159.8719    | -1.0525         | -0.8548       |
| 0.5179        | 0.4   | 100  | 0.5030          | 0.1133         | -0.3207          | 1.0                | 0.4340          | -188.0166      | -159.5010    | -1.0521         | -0.8545       |
| 0.479         | 0.5   | 125  | 0.4468          | 0.1501         | -0.4388          | 1.0                | 0.5889          | -189.1974      | -159.1327    | -1.0524         | -0.8554       |
| 0.406         | 0.6   | 150  | 0.3904          | 0.1842         | -0.5778          | 1.0                | 0.7620          | -190.5874      | -158.7915    | -1.0525         | -0.8576       |
| 0.3731        | 0.7   | 175  | 0.3377          | 0.2223         | -0.7247          | 1.0                | 0.9470          | -192.0564      | -158.4104    | -1.0521         | -0.8559       |
| 0.3075        | 0.79  | 200  | 0.2918          | 0.2537         | -0.8769          | 1.0                | 1.1305          | -193.5782      | -158.0974    | -1.0525         | -0.8583       |
| 0.2621        | 0.89  | 225  | 0.2517          | 0.2822         | -1.0278          | 1.0                | 1.3100          | -195.0876      | -157.8119    | -1.0525         | -0.8573       |
| 0.2285        | 0.99  | 250  | 0.2180          | 0.3118         | -1.1738          | 1.0                | 1.4855          | -196.5471      | -157.5160    | -1.0517         | -0.8568       |
| 0.2162        | 1.09  | 275  | 0.1948          | 0.3279         | -1.2897          | 1.0                | 1.6176          | -197.7066      | -157.3551    | -1.0513         | -0.8567       |
| 0.1752        | 1.19  | 300  | 0.1810          | 0.3383         | -1.3661          | 1.0                | 1.7044          | -198.4706      | -157.2514    | -1.0511         | -0.8576       |
| 0.1672        | 1.29  | 325  | 0.1714          | 0.3456         | -1.4242          | 1.0                | 1.7698          | -199.0516      | -157.1775    | -1.0509         | -0.8568       |
| 0.1722        | 1.39  | 350  | 0.1646          | 0.3535         | -1.4653          | 1.0                | 1.8187          | -199.4624      | -157.0993    | -1.0510         | -0.8568       |
| 0.1649        | 1.49  | 375  | 0.1596          | 0.3586         | -1.4919          | 1.0                | 1.8505          | -199.7286      | -157.0477    | -1.0512         | -0.8569       |
| 0.1534        | 1.59  | 400  | 0.1580          | 0.3603         | -1.5059          | 1.0                | 1.8663          | -199.8687      | -157.0304    | -1.0507         | -0.8571       |
| 0.1492        | 1.69  | 425  | 0.1561          | 0.3589         | -1.5194          | 1.0                | 1.8783          | -200.0034      | -157.0448    | -1.0514         | -0.8578       |
| 0.1625        | 1.79  | 450  | 0.1564          | 0.3586         | -1.5205          | 1.0                | 1.8791          | -200.0150      | -157.0482    | -1.0509         | -0.8570       |
| 0.1561        | 1.89  | 475  | 0.1535          | 0.3613         | -1.5366          | 1.0                | 1.8979          | -200.1756      | -157.0212    | -1.0510         | -0.8576       |
| 0.1565        | 1.99  | 500  | 0.1529          | 0.3643         | -1.5393          | 1.0                | 1.9036          | -200.2028      | -156.9913    | -1.0513         | -0.8567       |
| 0.1476        | 2.09  | 525  | 0.1530          | 0.3640         | -1.5392          | 1.0                | 1.9032          | -200.2021      | -156.9944    | -1.0511         | -0.8569       |
| 0.1457        | 2.19  | 550  | 0.1530          | 0.3605         | -1.5406          | 1.0                | 1.9011          | -200.2155      | -157.0287    | -1.0507         | -0.8577       |
| 0.1376        | 2.29  | 575  | 0.1529          | 0.3585         | -1.5466          | 1.0                | 1.9051          | -200.2757      | -157.0492    | -1.0508         | -0.8579       |
| 0.1574        | 2.38  | 600  | 0.1527          | 0.3634         | -1.5448          | 1.0                | 1.9082          | -200.2574      | -156.9998    | -1.0508         | -0.8566       |
| 0.1662        | 2.48  | 625  | 0.1518          | 0.3645         | -1.5465          | 1.0                | 1.9109          | -200.2742      | -156.9890    | -1.0509         | -0.8572       |
| 0.1535        | 2.58  | 650  | 0.1523          | 0.3628         | -1.5458          | 1.0                | 1.9086          | -200.2675      | -157.0059    | -1.0510         | -0.8571       |
| 0.1488        | 2.68  | 675  | 0.1518          | 0.3658         | -1.5446          | 1.0                | 1.9104          | -200.2561      | -156.9763    | -1.0510         | -0.8572       |
| 0.1564        | 2.78  | 700  | 0.1526          | 0.3618         | -1.5452          | 1.0                | 1.9071          | -200.2618      | -157.0154    | -1.0512         | -0.8568       |
| 0.1367        | 2.88  | 725  | 0.1526          | 0.3643         | -1.5426          | 1.0                | 1.9069          | -200.2352      | -156.9905    | -1.0513         | -0.8570       |
| 0.1543        | 2.98  | 750  | 0.1526          | 0.3611         | -1.5450          | 1.0                | 1.9061          | -200.2592      | -157.0226    | -1.0513         | -0.8571       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
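Since this repository contains a PEFT (LoRA) adapter rather than full weights, it can be loaded on top of the base model with peft. A hedged sketch: the adapter repo id `thorirhrafn/llama_DPO_model_e2` is assumed from the model name, and access to the gated Llama-2 base weights is required, so this will not run without Hub authentication:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Assumed adapter repo id; the base model is resolved from the adapter config.
model = AutoPeftModelForCausalLM.from_pretrained("thorirhrafn/llama_DPO_model_e2")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For standalone inference the adapter can also be merged into the base weights with `model.merge_and_unload()` after loading.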