
Llama-2-7b-hf-DPO-Filtered-0.2-version-3

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf (the training dataset is not specified). It achieves the following results on the evaluation set:

  • Loss: 0.8860
  • Rewards/chosen: -2.2683
  • Rewards/rejected: -2.6831
  • Rewards/accuracies: 0.5500
  • Rewards/margins: 0.4148
  • Logps/rejected: -83.1208
  • Logps/chosen: -77.0439
  • Logits/rejected: -1.3980
  • Logits/chosen: -1.4004
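
The reward metrics above are the implicit DPO rewards: for each response, reward = β · (policy log-prob − reference log-prob), and the margin is the chosen reward minus the rejected reward. A minimal sketch of the per-example computation follows; β = 0.1 (TRL's default) is an assumption, since the value actually used here is not stated:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss and implicit rewards (illustrative sketch)."""
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference model.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # DPO loss: -log sigmoid(margin)
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, reward_chosen, reward_rejected
```

Note that the reported eval loss is averaged over per-example losses, so it will not equal the loss computed from the averaged rewards.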

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
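
For reference, the settings above map onto a Hugging Face `TrainingArguments`-style configuration roughly as follows (a sketch, shown as a plain dict; the field names are assumed from the transformers API):

```python
# Sketch of the training configuration above as transformers-style
# keyword arguments (field names assumed, not taken from the training script).
training_args = dict(
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=2,  # effective (total) train batch size: 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
)
```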

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6488        | 0.2994 | 72   | 0.6678          | -0.0112        | -0.0724          | 0.6000             | 0.0613          | -57.0147       | -54.4726     | -0.5812         | -0.5741       |
| 0.6000        | 0.5988 | 144  | 0.6316          | -0.4870        | -0.6800          | 0.6000             | 0.1930          | -63.0904       | -59.2310     | -0.6746         | -0.6711       |
| 0.5876        | 0.8981 | 216  | 0.6931          | -0.4396        | -0.5539          | 0.5000             | 0.1143          | -61.8289       | -58.7568     | -0.5937         | -0.5907       |
| 0.4949        | 1.1975 | 288  | 0.7890          | -0.7079        | -0.9614          | 0.6500             | 0.2535          | -65.9037       | -61.4400     | -0.8747         | -0.8740       |
| 0.5650        | 1.4969 | 360  | 0.9088          | -1.6793        | -1.8869          | 0.5500             | 0.2077          | -75.1596       | -71.1538     | -1.2245         | -1.2255       |
| 0.2830        | 1.7963 | 432  | 0.8288          | -1.8095        | -2.1999          | 0.6000             | 0.3905          | -78.2897       | -72.4555     | -1.2749         | -1.2766       |
| 0.1794        | 2.0956 | 504  | 0.8811          | -1.8931        | -2.2411          | 0.5500             | 0.3480          | -78.7009       | -73.2920     | -1.3148         | -1.3161       |
| 0.3907        | 2.3950 | 576  | 0.8772          | -2.2014        | -2.6107          | 0.5500             | 0.4093          | -82.3973       | -76.3750     | -1.4219         | -1.4232       |
| 0.0225        | 2.6944 | 648  | 0.8843          | -2.2655        | -2.6784          | 0.5500             | 0.4129          | -83.0741       | -77.0161     | -1.3961         | -1.3981       |
| 0.2077        | 2.9938 | 720  | 0.8860          | -2.2683        | -2.6831          | 0.5500             | 0.4148          | -83.1208       | -77.0439     | -1.3980         | -1.4004       |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
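
Because this repository contains a PEFT (LoRA-style) adapter rather than full model weights, it must be loaded on top of the meta-llama/Llama-2-7b-hf base model. A minimal loading sketch (the `adapter_id` argument is a placeholder for this repository's id):

```python
def load_adapter(adapter_id):
    """Load the DPO adapter on top of the Llama-2-7b base model (sketch).

    Imports are kept inside the function so the sketch reads standalone;
    requires the peft and transformers versions listed above.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    model = PeftModel.from_pretrained(base, adapter_id)  # attach adapter weights
    return model, tokenizer
```

Note that access to the base model is gated on the Hugging Face Hub, so an authenticated login with accepted license terms is required before downloading.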