
Llama-2-7b-hf-DPO-PartialEval_LookAhead5_ET0.1_MT1.2_1-5_Filtered0.1_V2.0

This model is a DPO fine-tuned version of meta-llama/Llama-2-7b-hf; the training dataset is not specified in this card. It achieves the following results on the evaluation set (the DPO reward statistics are explained after the list):

  • Loss: 1.0109
  • Rewards/chosen: -2.4672
  • Rewards/rejected: -2.3527
  • Rewards/accuracies: 0.5
  • Rewards/margins: -0.1144
  • Logps/rejected: -101.7496
  • Logps/chosen: -111.6802
  • Logits/rejected: -1.3387
  • Logits/chosen: -1.3182
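
The metric names above follow the conventions of TRL's `DPOTrainer`: the "rewards" are β-scaled log-probability ratios between the fine-tuned policy and the frozen reference model. A minimal sketch of how such statistics are computed under the standard DPO objective (the function name and the β value are illustrative; this card does not state which β was used):

```python
import torch
import torch.nn.functional as F

def dpo_stats(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO loss and the reward statistics reported above.

    All inputs are 1-D tensors of sequence log-probabilities, one entry per
    preference pair. beta=0.1 is TRL's default, not necessarily this run's.
    """
    # "Rewards" are beta-scaled log-prob ratios between policy and reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards        # Rewards/margins
    loss = -F.logsigmoid(margins).mean()               # DPO objective
    accuracy = (margins > 0).float().mean()            # Rewards/accuracies
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy
```

A negative mean margin, as in the final evaluation above, means the policy assigns slightly higher relative reward to rejected responses than to chosen ones on the eval set.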

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
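
For reference, a minimal sketch of how these hyperparameters map onto `transformers.TrainingArguments` (the `output_dir` is a placeholder, and the model, dataset, and trainer wiring are not documented in this card):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                 # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,    # total train batch size: 2 * 2 = 4
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08, as listed above:
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```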

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6944        | 0.3026 | 77   | 0.7061          | -0.0481        | -0.0149          | 0.3333             | -0.0332         | -78.3709       | -87.4894     | -0.4703         | -0.4454       |
| 0.7162        | 0.6051 | 154  | 0.7499          | -0.0798        | 0.0370           | 0.25               | -0.1168         | -77.8523       | -87.8063     | -0.5650         | -0.5389       |
| 0.7265        | 0.9077 | 231  | 0.7156          | -0.1175        | -0.0976          | 0.5833             | -0.0199         | -79.1982       | -88.1833     | -0.5730         | -0.5467       |
| 0.6934        | 1.2102 | 308  | 0.8015          | -0.6012        | -0.5127          | 0.5                | -0.0884         | -83.3497       | -93.0202     | -0.7454         | -0.7223       |
| 0.4346        | 1.5128 | 385  | 0.8278          | -0.8704        | -0.8319          | 0.5                | -0.0385         | -86.5415       | -95.7126     | -0.9246         | -0.9032       |
| 0.6773        | 1.8153 | 462  | 0.7807          | -0.8712        | -0.8972          | 0.5                | 0.0260          | -87.1945       | -95.7207     | -0.8954         | -0.8734       |
| 0.3446        | 2.1179 | 539  | 0.8424          | -1.5623        | -1.5383          | 0.5833             | -0.0240         | -93.6050       | -102.6317    | -1.0923         | -1.0707       |
| 0.1483        | 2.4204 | 616  | 0.9759          | -2.2720        | -2.1419          | 0.5833             | -0.1301         | -99.6412       | -109.7289    | -1.2875         | -1.2666       |
| 0.213         | 2.7230 | 693  | 1.0109          | -2.4672        | -2.3527          | 0.5                | -0.1144         | -101.7496      | -111.6802    | -1.3387         | -1.3182       |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.43.3
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
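
Since PEFT is listed among the framework versions, this repository contains an adapter rather than full model weights. A minimal loading sketch, assuming the adapter sits on top of meta-llama/Llama-2-7b-hf (access to the gated Llama-2 base weights is required):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach this repo's adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(
    base,
    "LBK95/Llama-2-7b-hf-DPO-PartialEval_LookAhead5_ET0.1_MT1.2_1-5_Filtered0.1_V2.0",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
```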
