---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.0587
- Rewards/chosen: 0.4885
- Rewards/rejected: -2.5446
- Rewards/accuracies: 1.0
- Rewards/margins: 3.0331
- Logps/rejected: -210.2559
- Logps/chosen: -155.7489
- Logits/rejected: -1.0525
- Logits/chosen: -0.8603
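The rewards metrics above follow the standard DPO definitions: each reward is the beta-scaled log-probability ratio between the policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. As a hedged illustration (this is not the training code from this repository; the function name is ours), the figures can be related like so:

```python
import math

# Illustrative sketch of how the DPO metrics relate (standard DPO
# definitions, not this repository's training code).
# reward = beta * (policy log-prob - reference log-prob) per response;
# loss   = -log(sigmoid(reward_chosen - reward_rejected)).

def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    margin = reward_chosen - reward_rejected  # "Rewards/margins"
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Plugging in the evaluation figures above:
# margin = 0.4885 - (-2.5446) = 3.0331, matching Rewards/margins.
margin = 0.4885 - (-2.5446)

# The reported loss (0.0587) is a batch average, so the value recomputed
# from the averaged rewards only approximates it.
approx_loss = dpo_loss(0.4885, -2.5446)
```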

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
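Two of these values are derived rather than independent, which a short sketch makes explicit (illustrative arithmetic only, not the training script; variable names are ours): the effective batch size is the per-device batch size times the gradient-accumulation steps, and the linear scheduler decays the learning rate from its initial value toward zero over the total optimizer steps (500 here, per the results table).

```python
# Illustrative arithmetic for the hyperparameters above (not the actual
# training script; variable names are ours).
train_batch_size = 1
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8

learning_rate = 1e-06
total_steps = 500  # final step reached in the results table

def linear_lr(step: int) -> float:
    # Linear decay from learning_rate at step 0 to 0 at total_steps
    # (ignoring any warmup, which the card does not report).
    return learning_rate * max(0.0, 1.0 - step / total_steps)
```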

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6664        | 0.1   | 25   | 0.6240          | 0.0413         | -0.1038          | 0.9633             | 0.1451          | -185.8477      | -160.2207    | -1.0521         | -0.8552       |
| 0.5275        | 0.2   | 50   | 0.4961          | 0.1194         | -0.3323          | 1.0                | 0.4517          | -188.1325      | -159.4397    | -1.0520         | -0.8543       |
| 0.4242        | 0.3   | 75   | 0.3772          | 0.1960         | -0.6107          | 1.0                | 0.8067          | -190.9165      | -158.6736    | -1.0530         | -0.8585       |
| 0.3194        | 0.4   | 100  | 0.2809          | 0.2609         | -0.9146          | 1.0                | 1.1755          | -193.9560      | -158.0250    | -1.0526         | -0.8576       |
| 0.2569        | 0.5   | 125  | 0.2098          | 0.3243         | -1.2033          | 1.0                | 1.5276          | -196.8424      | -157.3911    | -1.0523         | -0.8568       |
| 0.1815        | 0.6   | 150  | 0.1591          | 0.3689         | -1.4935          | 1.0                | 1.8624          | -199.7451      | -156.9453    | -1.0527         | -0.8590       |
| 0.1488        | 0.7   | 175  | 0.1233          | 0.4109         | -1.7538          | 1.0                | 2.1647          | -202.3471      | -156.5246    | -1.0528         | -0.8590       |
| 0.1097        | 0.79  | 200  | 0.0966          | 0.4448         | -2.0010          | 1.0                | 2.4458          | -204.8196      | -156.1859    | -1.0531         | -0.8595       |
| 0.0925        | 0.89  | 225  | 0.0804          | 0.4615         | -2.1974          | 1.0                | 2.6589          | -206.7837      | -156.0186    | -1.0534         | -0.8616       |
| 0.0748        | 0.99  | 250  | 0.0707          | 0.4708         | -2.3440          | 1.0                | 2.8148          | -208.2495      | -155.9261    | -1.0526         | -0.8606       |
| 0.0717        | 1.09  | 275  | 0.0649          | 0.4788         | -2.4354          | 1.0                | 2.9142          | -209.1637      | -155.8455    | -1.0523         | -0.8600       |
| 0.057         | 1.19  | 300  | 0.0616          | 0.4820         | -2.4896          | 1.0                | 2.9716          | -209.7052      | -155.8138    | -1.0532         | -0.8609       |
| 0.0543        | 1.29  | 325  | 0.0598          | 0.4864         | -2.5199          | 1.0                | 3.0064          | -210.0089      | -155.7695    | -1.0522         | -0.8598       |
| 0.0634        | 1.39  | 350  | 0.0591          | 0.4873         | -2.5345          | 1.0                | 3.0218          | -210.1548      | -155.7612    | -1.0529         | -0.8603       |
| 0.0614        | 1.49  | 375  | 0.0584          | 0.4896         | -2.5466          | 1.0                | 3.0362          | -210.2760      | -155.7379    | -1.0528         | -0.8597       |
| 0.0543        | 1.59  | 400  | 0.0580          | 0.4918         | -2.5464          | 1.0                | 3.0382          | -210.2738      | -155.7159    | -1.0528         | -0.8597       |
| 0.0532        | 1.69  | 425  | 0.0579          | 0.4902         | -2.5495          | 1.0                | 3.0397          | -210.3050      | -155.7321    | -1.0520         | -0.8605       |
| 0.0632        | 1.79  | 450  | 0.0577          | 0.4907         | -2.5514          | 1.0                | 3.0422          | -210.3238      | -155.7266    | -1.0522         | -0.8601       |
| 0.0596        | 1.89  | 475  | 0.0579          | 0.4923         | -2.5509          | 1.0                | 3.0432          | -210.3188      | -155.7112    | -1.0527         | -0.8614       |
| 0.0597        | 1.99  | 500  | 0.0587          | 0.4885         | -2.5446          | 1.0                | 3.0331          | -210.2559      | -155.7489    | -1.0525         | -0.8603       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2