Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V1

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained with DPO on an unspecified dataset. It achieves the following results on the evaluation set (a note on how these metrics relate follows the list):

  • Loss: 1.9544
  • Rewards/chosen: -4.7879
  • Rewards/rejected: -3.3303
  • Rewards/accuracies: 0.4167
  • Rewards/margins: -1.4576
  • Logps/rejected: -139.8422
  • Logps/chosen: -221.9621
  • Logits/rejected: -0.3063
  • Logits/chosen: -0.3612
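
These metric names match those emitted by TRL's DPOTrainer (an assumption; the card itself does not define them). Under that convention, a response's implicit reward is beta times the difference between the policy and reference log-probabilities, Rewards/margins is Rewards/chosen minus Rewards/rejected, and Rewards/accuracies is the fraction of evaluation pairs where the chosen response out-scores the rejected one. A quick check of the margin arithmetic against the numbers above:

```python
# Relationship between the reported DPO metrics (standard TRL definitions,
# assumed here since the card does not define them).
rewards_chosen = -4.7879
rewards_rejected = -3.3303

# Rewards/margins is chosen minus rejected; a negative margin means the
# model assigns higher implicit reward to the rejected responses.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # -1.4576, matching the reported value
```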

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged training-setup sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
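
Since the card reports DPO-style metrics and lists PEFT among the framework versions, the run was presumably configured along the following lines. This is a minimal sketch, not the author's script: the dataset name and LoRA values are placeholders, and only the hyperparameters listed above are taken from the card.

```python
# Hypothetical reconstruction of the training setup (not the author's script).
# Assumes TRL's DPOTrainer with a LoRA adapter; dataset name and LoRA values
# are placeholders, while the DPOConfig fields mirror the list above.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: the card does not name the preference dataset. DPOTrainer
# expects "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)  # assumed values

args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V1",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # effective train batch size 4
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,      # renamed `processing_class` in newer TRL releases
    peft_config=peft_config,  # reference model = base model with adapter disabled
)
trainer.train()
```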

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.666 | 0.3008 | 80 | 0.7114 | -0.0994 | -0.0692 | 0.5 | -0.0302 | -107.2313 | -175.0767 | 0.3992 | 0.3563 |
| 0.7848 | 0.6015 | 160 | 0.7871 | -0.5530 | -0.4147 | 0.4167 | -0.1383 | -110.6864 | -179.6128 | 0.4061 | 0.3637 |
| 0.7413 | 0.9023 | 240 | 0.8345 | -0.6343 | -0.4162 | 0.4167 | -0.2181 | -110.7009 | -180.4258 | 0.3814 | 0.3393 |
| 0.5906 | 1.2030 | 320 | 1.0830 | -1.4953 | -0.8871 | 0.4167 | -0.6082 | -115.4103 | -189.0355 | 0.2654 | 0.2219 |
| 0.3771 | 1.5038 | 400 | 1.1984 | -2.0768 | -1.2714 | 0.3333 | -0.8053 | -119.2534 | -194.8505 | 0.1371 | 0.0921 |
| 0.2132 | 1.8045 | 480 | 1.2438 | -2.4881 | -1.6813 | 0.3333 | -0.8068 | -123.3516 | -198.9633 | 0.0444 | -0.0027 |
| 0.0544 | 2.1053 | 560 | 1.6818 | -3.7464 | -2.5485 | 0.1667 | -1.1979 | -132.0241 | -211.5465 | -0.1111 | -0.1621 |
| 0.0452 | 2.4060 | 640 | 1.8619 | -4.4511 | -3.1120 | 0.4167 | -1.3391 | -137.6592 | -218.5939 | -0.2407 | -0.2942 |
| 0.023 | 2.7068 | 720 | 1.9544 | -4.7879 | -3.3303 | 0.4167 | -1.4576 | -139.8422 | -221.9621 | -0.3063 | -0.3612 |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.45.2
  • Pytorch 2.4.0+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.1
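
Since the repository ships a PEFT adapter rather than full model weights, it can presumably be loaded on top of the base model along these lines (a sketch, not an official snippet; access to the gated meta-llama/Llama-2-7b-hf checkpoint is still required):

```python
# Hedged loading sketch: attaches the adapter weights to the base model at
# inference time. Requires access to the gated meta-llama/Llama-2-7b-hf repo.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V1"

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```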