llama_DPO_model_e2

This model is a DPO-trained PEFT adapter for meta-llama/Llama-2-7b-hf, fine-tuned on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1045
  • Rewards/chosen: 0.4197
  • Rewards/rejected: -1.9316
  • Rewards/accuracies: 1.0
  • Rewards/margins: 2.3513
  • Logps/rejected: -204.1257
  • Logps/chosen: -156.4368
  • Logits/rejected: -1.0515
  • Logits/chosen: -0.8584
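
The reward margin above is simply the gap between the mean reward of the chosen responses and the mean reward of the rejected ones, which can be verified directly from the reported metrics:

```python
# Reported evaluation metrics (from the list above).
rewards_chosen = 0.4197
rewards_rejected = -1.9316

# Rewards/margins = Rewards/chosen - Rewards/rejected
margin = round(rewards_chosen - rewards_rejected, 4)
# margin == 2.3513, matching the reported Rewards/margins
```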

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 7.5e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
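
The `total_train_batch_size` in the list above follows from the other settings; assuming a single device, the effective batch size is the per-device batch size times the gradient accumulation steps:

```python
# Hyperparameters from the list above.
train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1  # assumption: single-GPU training

# Effective (total) train batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
# total_train_batch_size == 8, matching the reported value
```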

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6732 | 0.1 | 25 | 0.6518 | 0.0274 | -0.0584 | 0.8867 | 0.0858 | -185.3935 | -160.3602 | -1.0521 | -0.8541 |
| 0.588 | 0.2 | 50 | 0.5616 | 0.0780 | -0.2093 | 0.9933 | 0.2873 | -186.9026 | -159.8541 | -1.0523 | -0.8550 |
| 0.5077 | 0.3 | 75 | 0.4690 | 0.1360 | -0.3896 | 1.0 | 0.5256 | -188.7056 | -159.2737 | -1.0525 | -0.8564 |
| 0.4179 | 0.4 | 100 | 0.3872 | 0.1873 | -0.5861 | 1.0 | 0.7734 | -190.6710 | -158.7608 | -1.0532 | -0.8563 |
| 0.3614 | 0.5 | 125 | 0.3170 | 0.2381 | -0.7895 | 1.0 | 1.0276 | -192.7043 | -158.2528 | -1.0533 | -0.8568 |
| 0.2812 | 0.6 | 150 | 0.2544 | 0.2856 | -1.0121 | 1.0 | 1.2977 | -194.9309 | -157.7783 | -1.0527 | -0.8569 |
| 0.2378 | 0.7 | 175 | 0.2066 | 0.3262 | -1.2240 | 1.0 | 1.5502 | -197.0494 | -157.3717 | -1.0520 | -0.8573 |
| 0.1866 | 0.79 | 200 | 0.1704 | 0.3591 | -1.4222 | 1.0 | 1.7812 | -199.0312 | -157.0431 | -1.0526 | -0.8577 |
| 0.1555 | 0.89 | 225 | 0.1429 | 0.3829 | -1.6050 | 1.0 | 1.9879 | -200.8594 | -156.8051 | -1.0523 | -0.8580 |
| 0.1312 | 0.99 | 250 | 0.1239 | 0.4002 | -1.7534 | 1.0 | 2.1536 | -202.3439 | -156.6322 | -1.0515 | -0.8572 |
| 0.1276 | 1.09 | 275 | 0.1147 | 0.4086 | -1.8325 | 1.0 | 2.2410 | -203.1341 | -156.5480 | -1.0518 | -0.8578 |
| 0.1038 | 1.19 | 300 | 0.1094 | 0.4144 | -1.8779 | 1.0 | 2.2923 | -203.5883 | -156.4901 | -1.0511 | -0.8574 |
| 0.101 | 1.29 | 325 | 0.1072 | 0.4191 | -1.9023 | 1.0 | 2.3214 | -203.8326 | -156.4429 | -1.0512 | -0.8569 |
| 0.1128 | 1.39 | 350 | 0.1056 | 0.4189 | -1.9206 | 1.0 | 2.3394 | -204.0154 | -156.4454 | -1.0511 | -0.8576 |
| 0.11 | 1.49 | 375 | 0.1047 | 0.4220 | -1.9262 | 1.0 | 2.3482 | -204.0712 | -156.4135 | -1.0509 | -0.8570 |
| 0.1001 | 1.59 | 400 | 0.1048 | 0.4224 | -1.9281 | 1.0 | 2.3505 | -204.0909 | -156.4098 | -1.0514 | -0.8574 |
| 0.0978 | 1.69 | 425 | 0.1042 | 0.4246 | -1.9292 | 1.0 | 2.3538 | -204.1014 | -156.3875 | -1.0512 | -0.8573 |
| 0.1111 | 1.79 | 450 | 0.1041 | 0.4244 | -1.9292 | 1.0 | 2.3536 | -204.1017 | -156.3903 | -1.0514 | -0.8587 |
| 0.1064 | 1.89 | 475 | 0.1044 | 0.4199 | -1.9317 | 1.0 | 2.3516 | -204.1266 | -156.4352 | -1.0514 | -0.8577 |
| 0.107 | 1.99 | 500 | 0.1045 | 0.4197 | -1.9316 | 1.0 | 2.3513 | -204.1257 | -156.4368 | -1.0515 | -0.8584 |
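
The reported rewards already include the DPO β scaling, so the per-pair DPO loss is −log σ(rewards_chosen − rewards_rejected). A rough sanity check on the final reward margin (note: the reported validation loss is averaged over individual pairs, so plugging in the averaged margin gives only an approximation, not the exact 0.1045):

```python
import math

def dpo_pair_loss(margin: float) -> float:
    """Per-pair DPO loss: -log(sigmoid(margin)), written as
    softplus(-margin) for numerical stability."""
    return math.log1p(math.exp(-margin))

# Final reported reward margin from the table above.
loss = dpo_pair_loss(2.3513)
# loss is roughly 0.091, in the same ballpark as the reported 0.1045
```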

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2