
llama_DPO_model_e3

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0722
  • Rewards/chosen: 0.4618
  • Rewards/rejected: -2.3246
  • Rewards/accuracies: 1.0
  • Rewards/margins: 2.7864
  • Logps/rejected: -208.0558
  • Logps/chosen: -156.0157
  • Logits/rejected: -1.0512
  • Logits/chosen: -0.8590
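
The reward metrics above follow from the DPO objective: each reward is the β-scaled log-probability ratio between the trained policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-vs-rejected margin. A minimal sketch of that arithmetic; β and the reference log-probabilities below are assumptions chosen only to roughly reproduce the final metrics, since neither is stated in this card:

```python
import math

def dpo_stats(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Compute DPO rewards and loss from sequence log-probabilities.

    Rewards are beta-scaled log-ratios between the policy and the frozen
    reference model; the loss is -log sigmoid(rewards margin).
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return reward_chosen, reward_rejected, margin, loss

# Hypothetical log-probs in the same ballpark as Logps/chosen and
# Logps/rejected above; the reference values are invented for illustration.
rc, rr, m, loss = dpo_stats(-156.0, -208.1, -160.6, -184.9)
```

With these inputs the sketch yields rewards near 0.46 and -2.32 and a margin near 2.78, mirroring the shape of the evaluation metrics.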

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 7e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
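
The hyperparameters above can be mapped onto a trl DPO training setup. The sketch below is a hypothetical reconstruction, not the card author's script: argument names vary across trl versions, and `beta`, the model, tokenizer, and dataset objects are all assumptions not stated in the card.

```python
from transformers import TrainingArguments
from trl import DPOTrainer

args = TrainingArguments(
    output_dir="llama_DPO_model_e3",
    learning_rate=7e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # effective train batch size: 1 * 8 = 8
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
)

trainer = DPOTrainer(
    model=model,              # PEFT-wrapped Llama-2-7b policy (assumed)
    ref_model=None,           # with a PEFT adapter, trl can recover the reference from the base weights
    args=args,
    beta=0.1,                 # assumed; the card does not state beta
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```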

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.675 | 0.1 | 25 | 0.6531 | 0.0248 | -0.0584 | 0.8667 | 0.0832 | -185.3936 | -160.3859 | -1.0523 | -0.8549 |
| 0.5865 | 0.2 | 50 | 0.5720 | 0.0730 | -0.1895 | 0.9933 | 0.2625 | -186.7048 | -159.9039 | -1.0525 | -0.8552 |
| 0.5203 | 0.3 | 75 | 0.4808 | 0.1258 | -0.3673 | 1.0 | 0.4931 | -188.4825 | -159.3763 | -1.0520 | -0.8543 |
| 0.4291 | 0.4 | 100 | 0.3986 | 0.1804 | -0.5547 | 1.0 | 0.7352 | -190.3568 | -158.8295 | -1.0527 | -0.8559 |
| 0.3712 | 0.5 | 125 | 0.3264 | 0.2303 | -0.7594 | 1.0 | 0.9897 | -192.4033 | -158.3308 | -1.0528 | -0.8572 |
| 0.2856 | 0.6 | 150 | 0.2612 | 0.2765 | -0.9893 | 1.0 | 1.2658 | -194.7025 | -157.8685 | -1.0531 | -0.8592 |
| 0.2433 | 0.7 | 175 | 0.2086 | 0.3223 | -1.2201 | 1.0 | 1.5424 | -197.0102 | -157.4110 | -1.0526 | -0.8573 |
| 0.1822 | 0.79 | 200 | 0.1673 | 0.3627 | -1.4385 | 1.0 | 1.8012 | -199.1950 | -157.0071 | -1.0529 | -0.8606 |
| 0.1511 | 0.89 | 225 | 0.1354 | 0.3921 | -1.6585 | 1.0 | 2.0506 | -201.3948 | -156.7133 | -1.0522 | -0.8601 |
| 0.1211 | 0.99 | 250 | 0.1134 | 0.4119 | -1.8492 | 1.0 | 2.2612 | -203.3017 | -156.5144 | -1.0526 | -0.8591 |
| 0.113 | 1.09 | 275 | 0.0999 | 0.4261 | -1.9792 | 1.0 | 2.4054 | -204.6017 | -156.3724 | -1.0511 | -0.8578 |
| 0.087 | 1.19 | 300 | 0.0912 | 0.4374 | -2.0704 | 1.0 | 2.5078 | -205.5134 | -156.2602 | -1.0521 | -0.8612 |
| 0.0808 | 1.29 | 325 | 0.0846 | 0.4439 | -2.1510 | 1.0 | 2.5949 | -206.3199 | -156.1949 | -1.0515 | -0.8600 |
| 0.0875 | 1.39 | 350 | 0.0814 | 0.4537 | -2.1942 | 1.0 | 2.6479 | -206.7517 | -156.0968 | -1.0520 | -0.8589 |
| 0.0826 | 1.49 | 375 | 0.0785 | 0.4559 | -2.2325 | 1.0 | 2.6884 | -207.1346 | -156.0752 | -1.0516 | -0.8585 |
| 0.0717 | 1.59 | 400 | 0.0768 | 0.4564 | -2.2611 | 1.0 | 2.7175 | -207.4205 | -156.0697 | -1.0517 | -0.8595 |
| 0.0694 | 1.69 | 425 | 0.0750 | 0.4602 | -2.2778 | 1.0 | 2.7380 | -207.5878 | -156.0322 | -1.0516 | -0.8590 |
| 0.0809 | 1.79 | 450 | 0.0739 | 0.4647 | -2.2925 | 1.0 | 2.7572 | -207.7341 | -155.9865 | -1.0514 | -0.8586 |
| 0.0747 | 1.89 | 475 | 0.0736 | 0.4595 | -2.3075 | 1.0 | 2.7670 | -207.8848 | -156.0394 | -1.0515 | -0.8584 |
| 0.0751 | 1.99 | 500 | 0.0726 | 0.4643 | -2.3130 | 1.0 | 2.7773 | -207.9396 | -155.9911 | -1.0516 | -0.8589 |
| 0.069 | 2.09 | 525 | 0.0725 | 0.4608 | -2.3223 | 1.0 | 2.7831 | -208.0324 | -156.0257 | -1.0512 | -0.8598 |
| 0.0658 | 2.19 | 550 | 0.0724 | 0.4670 | -2.3178 | 1.0 | 2.7847 | -207.9872 | -155.9642 | -1.0514 | -0.8580 |
| 0.0659 | 2.29 | 575 | 0.0720 | 0.4650 | -2.3217 | 1.0 | 2.7867 | -208.0269 | -155.9841 | -1.0516 | -0.8592 |
| 0.0732 | 2.38 | 600 | 0.0725 | 0.4585 | -2.3236 | 1.0 | 2.7821 | -208.0455 | -156.0485 | -1.0511 | -0.8591 |
| 0.0802 | 2.48 | 625 | 0.0723 | 0.4611 | -2.3249 | 1.0 | 2.7859 | -208.0582 | -156.0233 | -1.0511 | -0.8582 |
| 0.0734 | 2.58 | 650 | 0.0723 | 0.4646 | -2.3213 | 1.0 | 2.7859 | -208.0227 | -155.9879 | -1.0510 | -0.8591 |
| 0.068 | 2.68 | 675 | 0.0723 | 0.4627 | -2.3230 | 1.0 | 2.7857 | -208.0397 | -156.0069 | -1.0512 | -0.8585 |
| 0.0708 | 2.78 | 700 | 0.0720 | 0.4617 | -2.3278 | 1.0 | 2.7895 | -208.0874 | -156.0165 | -1.0508 | -0.8592 |
| 0.0621 | 2.88 | 725 | 0.0719 | 0.4613 | -2.3296 | 1.0 | 2.7909 | -208.1059 | -156.0208 | -1.0511 | -0.8585 |
| 0.0708 | 2.98 | 750 | 0.0722 | 0.4618 | -2.3246 | 1.0 | 2.7864 | -208.0558 | -156.0157 | -1.0512 | -0.8590 |

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2
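
Since this model is a PEFT adapter on top of meta-llama/Llama-2-7b-hf, loading typically goes through the peft library. A hedged usage sketch; the adapter repo id below is a placeholder, not a real identifier from this card:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder repo id: substitute the actual hub location of this adapter.
model = PeftModel.from_pretrained(base_model, "<user>/llama_DPO_model_e3")
```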
