---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.0937
- Rewards/chosen: 0.4389
- Rewards/rejected: -2.0384
- Rewards/accuracies: 1.0
- Rewards/margins: 2.4774
- Logps/rejected: -205.1940
- Logps/chosen: -156.2447
- Logits/rejected: -1.0509
- Logits/chosen: -0.8587
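The reward metrics above follow the standard DPO formulation: each "reward" is the β-scaled log-probability ratio between the trained policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-vs-rejected margin. A minimal sketch (the β value and the log-probabilities below are illustrative assumptions — β is not reported in this card, though 0.1 is TRL's default):

```python
import math

def dpo_stats(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Compute DPO implicit rewards, margin, and sigmoid loss for one pair."""
    # Implicit rewards: beta-scaled log-ratios of policy vs. reference model
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # Sigmoid DPO loss: -log sigmoid(margin), i.e. softplus(-margin)
    loss = math.log1p(math.exp(-margin))
    return reward_chosen, reward_rejected, margin, loss
```

With illustrative log-probabilities in the same ballpark as the Logps columns above, this reproduces the qualitative pattern in the table: a positive chosen reward, a negative rejected reward, and a loss that shrinks as the margin grows.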

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
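The linear scheduler decays the learning rate from its initial value to zero over the course of training. A minimal sketch of that schedule, assuming no warmup (none is reported) and the ~500 optimizer steps shown in the results table below:

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 8e-7) -> float:
    """Linearly decay from base_lr at step 0 to 0 at total_steps."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)
```

For example, halfway through training (step 250 of 500) the learning rate has dropped to half its initial value, 4e-07.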

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.673  | 0.1  | 25  | 0.6445 | 0.0273 | -0.0740 | 0.9000 | 0.1013 | -185.5491 | -160.3607 | -1.0521 | -0.8545 |
| 0.5737 | 0.2  | 50  | 0.5485 | 0.0856 | -0.2335 | 0.9933 | 0.3190 | -187.1442 | -159.7781 | -1.0526 | -0.8551 |
| 0.4843 | 0.3  | 75  | 0.4496 | 0.1470 | -0.4343 | 1.0    | 0.5814 | -189.1528 | -159.1637 | -1.0527 | -0.8571 |
| 0.4006 | 0.4  | 100 | 0.3655 | 0.2043 | -0.6419 | 1.0    | 0.8462 | -191.2286 | -158.5909 | -1.0521 | -0.8556 |
| 0.3417 | 0.5  | 125 | 0.2945 | 0.2551 | -0.8630 | 1.0    | 1.1180 | -193.4393 | -158.0833 | -1.0522 | -0.8562 |
| 0.2601 | 0.6  | 150 | 0.2353 | 0.3032 | -1.0903 | 1.0    | 1.3935 | -195.7128 | -157.6020 | -1.0520 | -0.8597 |
| 0.2197 | 0.7  | 175 | 0.1891 | 0.3442 | -1.3124 | 1.0    | 1.6565 | -197.9333 | -157.1923 | -1.0522 | -0.8579 |
| 0.1675 | 0.79 | 200 | 0.1532 | 0.3815 | -1.5253 | 1.0    | 1.9067 | -200.0621 | -156.8192 | -1.0526 | -0.8582 |
| 0.1417 | 0.89 | 225 | 0.1289 | 0.4011 | -1.7082 | 1.0    | 2.1094 | -201.8920 | -156.6225 | -1.0525 | -0.8585 |
| 0.1203 | 0.99 | 250 | 0.1117 | 0.4214 | -1.8534 | 1.0    | 2.2748 | -203.3437 | -156.4196 | -1.0517 | -0.8603 |
| 0.1156 | 1.09 | 275 | 0.1034 | 0.4296 | -1.9336 | 1.0    | 2.3633 | -204.1459 | -156.3377 | -1.0517 | -0.8590 |
| 0.0942 | 1.19 | 300 | 0.0990 | 0.4310 | -1.9823 | 1.0    | 2.4133 | -204.6330 | -156.3240 | -1.0514 | -0.8577 |
| 0.0903 | 1.29 | 325 | 0.0957 | 0.4380 | -2.0137 | 1.0    | 2.4517 | -204.9467 | -156.2539 | -1.0511 | -0.8593 |
| 0.1023 | 1.39 | 350 | 0.0946 | 0.4384 | -2.0296 | 1.0    | 2.4680 | -205.1059 | -156.2503 | -1.0519 | -0.8587 |
| 0.0984 | 1.49 | 375 | 0.0945 | 0.4352 | -2.0350 | 1.0    | 2.4702 | -205.1597 | -156.2819 | -1.0510 | -0.8580 |
| 0.0899 | 1.59 | 400 | 0.0939 | 0.4360 | -2.0393 | 1.0    | 2.4752 | -205.2024 | -156.2742 | -1.0513 | -0.8594 |
| 0.0883 | 1.69 | 425 | 0.0939 | 0.4374 | -2.0378 | 1.0    | 2.4752 | -205.1877 | -156.2598 | -1.0514 | -0.8590 |
| 0.1011 | 1.79 | 450 | 0.0939 | 0.4368 | -2.0412 | 1.0    | 2.4781 | -205.2217 | -156.2654 | -1.0513 | -0.8583 |
| 0.0962 | 1.89 | 475 | 0.0935 | 0.4403 | -2.0395 | 1.0    | 2.4798 | -205.2041 | -156.2308 | -1.0510 | -0.8574 |
| 0.0971 | 1.99 | 500 | 0.0937 | 0.4389 | -2.0384 | 1.0    | 2.4774 | -205.1940 | -156.2447 | -1.0509 | -0.8587 |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
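Since this is a PEFT adapter rather than full model weights, it must be attached to the base model at load time. A minimal loading sketch, assuming the adapter is published under the repo id `thorirhrafn/llama_DPO_model_e2` (inferred from this card's title) and that you have accepted the gated Llama-2 license on the Hub:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "meta-llama/Llama-2-7b-hf"     # gated: requires accepted license
adapter_id = "thorirhrafn/llama_DPO_model_e2"  # assumed repo id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```

For inference-only use, `model.merge_and_unload()` can fold the adapter weights into the base model so no PEFT wrapper is needed at generation time.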