

Llama-2-7b-hf-DPO-PartialEval_ET0.1_MT1.2_1-5_V.1.0_Filtered0.1_V3.0

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained with DPO on an unspecified dataset. It achieves the following results on the evaluation set (a minimal loading example follows the results list):

  • Loss: 0.9702
  • Rewards/chosen: -3.4296
  • Rewards/rejected: -3.9944
  • Rewards/accuracies: 0.6000
  • Rewards/margins: 0.5648
  • Logps/rejected: -105.9646
  • Logps/chosen: -110.1487
  • Logits/rejected: -1.2385
  • Logits/chosen: -1.2295
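Since this repository contains a PEFT adapter rather than full model weights, it has to be applied on top of the base model. The snippet below is a minimal loading sketch, assuming access to the gated meta-llama/Llama-2-7b-hf weights and a GPU; the prompt and generation settings are illustrative only, not values from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-PartialEval_ET0.1_MT1.2_1-5_V.1.0_Filtered0.1_V3.0"

# Load the frozen base model, then attach the DPO-trained adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Illustrative prompt; generation settings are assumptions, not tuned values.
prompt = "Explain the difference between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```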

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reproduction sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
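The sketch below shows how these hyperparameters could map onto a TRL DPOTrainer run. It is an assumption-laden reconstruction, not the actual training script: the training data, DPO beta, and adapter configuration are not reported on this card, so the toy preference pairs, beta=0.1, and the LoraConfig values are placeholders, and the exact DPOTrainer/DPOConfig signature varies across TRL versions.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id)

# Toy preference pairs; the real training data is not specified on this card.
train_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": ["Direct Preference Optimization trains on preference pairs."],
    "rejected": ["No idea."],
})

# LoRA settings are assumptions; the card does not report the adapter config.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Hyperparameters taken from the list above; Adam betas/epsilon match the
# transformers AdamW defaults (0.9, 0.999, 1e-8). beta=0.1 is an assumption.
training_args = DPOConfig(
    output_dir="dpo-out",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
    beta=0.1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT adapter, TRL uses the frozen base model as the reference
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```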

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7679 | 0.2993 | 60 | 0.6593 | 0.0108 | -0.0521 | 0.6000 | 0.0629 | -66.5412 | -75.7445 | -0.4729 | -0.4564 |
| 0.5441 | 0.5985 | 120 | 0.6291 | -0.5862 | -0.7160 | 0.7000 | 0.1298 | -73.1809 | -81.7145 | -0.4661 | -0.4518 |
| 0.4072 | 0.8978 | 180 | 0.6867 | -0.9478 | -1.0597 | 0.6000 | 0.1119 | -76.6181 | -85.3308 | -0.4052 | -0.3928 |
| 0.2754 | 1.1970 | 240 | 0.6937 | -1.6591 | -1.7694 | 0.6000 | 0.1103 | -83.7147 | -92.4433 | -0.6080 | -0.5961 |
| 0.3506 | 1.4963 | 300 | 0.7085 | -1.2433 | -1.5083 | 0.5000 | 0.2650 | -81.1036 | -88.2852 | -0.7011 | -0.6903 |
| 0.2660 | 1.7955 | 360 | 0.8548 | -1.8431 | -2.1010 | 0.5000 | 0.2579 | -87.0309 | -94.2836 | -0.9269 | -0.9164 |
| 0.5629 | 2.0948 | 420 | 0.7761 | -2.2110 | -2.6203 | 0.5000 | 0.4093 | -92.2235 | -97.9622 | -1.0014 | -0.9908 |
| 0.0832 | 2.3940 | 480 | 1.0148 | -3.6008 | -4.0627 | 0.5000 | 0.4618 | -106.6473 | -111.8609 | -1.2254 | -1.2163 |
| 0.1597 | 2.6933 | 540 | 0.9907 | -3.5220 | -4.0847 | 0.5000 | 0.5627 | -106.8678 | -111.0727 | -1.2661 | -1.2566 |
| 0.1285 | 2.9925 | 600 | 0.9702 | -3.4296 | -3.9944 | 0.6000 | 0.5648 | -105.9646 | -110.1487 | -1.2385 | -1.2295 |
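For readers unfamiliar with the DPO logging convention: assuming the standard TRL metric definitions, the Rewards/* columns are the implicit DPO rewards, i.e. the beta-scaled log-probability gap between the trained policy and the frozen reference model (beta itself is not reported on this card):

```latex
r_\theta(x, y) \;=\; \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right],
\qquad
\text{Rewards/margins} \;=\; r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
```

Under that convention, Rewards/accuracies is the fraction of evaluation pairs whose chosen response receives a higher implicit reward than the rejected one, and Logps/* are the total log-probabilities of the chosen and rejected responses under the trained policy.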

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.4
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
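A quick sanity check, if you want to confirm the local environment matches the versions above (package names are the standard PyPI ones):

```python
# Print installed versions of the libraries listed under "Framework versions".
import datasets, peft, tokenizers, torch, transformers

for name, mod in [("PEFT", peft), ("Transformers", transformers),
                  ("PyTorch", torch), ("Datasets", datasets),
                  ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```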