
Llama-2-7b-hf-DPO-FullEval_LookAhead5_TTree1.2_TT0.7_TP0.7_TE0.1_Filtered0.1_V1.0

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9993
  • Rewards/chosen: -2.7308
  • Rewards/rejected: -3.0957
  • Rewards/accuracies: 0.6667
  • Rewards/margins: 0.3649
  • Logps/rejected: -93.3609
  • Logps/chosen: -104.7227
  • Logits/rejected: -1.6352
  • Logits/chosen: -1.6231
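
For context, these reward metrics follow the usual DPO logging convention (as implemented in trl): the implicit reward of a completion y for prompt x is the scaled log-probability ratio between the trained policy and the frozen reference model,

$$ r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right) $$

Rewards/margins is Rewards/chosen minus Rewards/rejected (here -2.7308 - (-3.0957) = 0.3649), and Rewards/accuracies is the fraction of evaluation pairs whose chosen completion receives the higher implicit reward. The β used for this run is not stated in the card.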

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent trl configuration follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
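
The card does not include the training script, but the hyperparameters above map directly onto trl's DPOTrainer. The sketch below is a reconstruction under stated assumptions, not the author's original code: the dataset name, LoRA configuration, and DPO beta are placeholders, since none of them are given in the card.

```python
# Reconstruction sketch: hyperparameters match the list above; everything
# marked as a placeholder or assumption is not stated in the card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 ships without a pad token

args = DPOConfig(
    output_dir="llama2-7b-dpo",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # 2 x 2 = total train batch size of 4 on one device
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
    # The default AdamW optimizer already uses betas=(0.9, 0.999), epsilon=1e-8.
)

# The adapter shape is an assumption; the card only states that PEFT 0.12.0 was used.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

# Placeholder: the training data is not named in the card. DPOTrainer expects
# "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your/preference-dataset")

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```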

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.778 | 0.2994 | 78 | 0.6949 | -0.3298 | -0.3608 | 0.4167 | 0.0310 | -66.0123 | -80.7126 | -0.5458 | -0.5189 |
| 0.5895 | 0.5988 | 156 | 0.7221 | -0.3528 | -0.4053 | 0.5 | 0.0526 | -66.4578 | -80.9425 | -0.5269 | -0.4997 |
| 0.6423 | 0.8983 | 234 | 0.8029 | -0.5435 | -0.7369 | 0.5 | 0.1935 | -69.7736 | -82.8494 | -0.5770 | -0.5512 |
| 0.3985 | 1.1977 | 312 | 0.7639 | -0.4640 | -0.9647 | 0.5833 | 0.5008 | -72.0517 | -82.0546 | -0.7305 | -0.7083 |
| 0.4527 | 1.4971 | 390 | 0.8308 | -0.3638 | -0.5763 | 0.5 | 0.2125 | -68.1671 | -81.0529 | -0.9151 | -0.8936 |
| 0.3677 | 1.7965 | 468 | 0.7432 | -0.8212 | -1.2349 | 0.6667 | 0.4137 | -74.7538 | -85.6273 | -1.0459 | -1.0262 |
| 0.2591 | 2.0960 | 546 | 0.8634 | -1.6345 | -2.0043 | 0.5833 | 0.3699 | -82.4478 | -93.7598 | -1.2888 | -1.2721 |
| 0.1802 | 2.3954 | 624 | 1.1197 | -3.1423 | -3.3841 | 0.6667 | 0.2418 | -96.2452 | -108.8380 | -1.6577 | -1.6465 |
| 0.2054 | 2.6948 | 702 | 1.0008 | -2.7513 | -3.1213 | 0.6667 | 0.3700 | -93.6174 | -104.9277 | -1.6348 | -1.6227 |
| 0.2823 | 2.9942 | 780 | 0.9993 | -2.7308 | -3.0957 | 0.6667 | 0.3649 | -93.3609 | -104.7227 | -1.6352 | -1.6231 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
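
This repository contains a PEFT (LoRA) adapter rather than full model weights, so inference requires loading the base model first and then applying the adapter. A minimal loading sketch, with the repo ids taken from this card and illustrative generation settings:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = (
    "LBK95/Llama-2-7b-hf-DPO-FullEval_LookAhead5_TTree1.2"
    "_TT0.7_TP0.7_TE0.1_Filtered0.1_V1.0"
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Apply the DPO-trained LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```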
