
llama_DPO_model_e1

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1779
  • Rewards/chosen: 0.3527
  • Rewards/rejected: -1.3764
  • Rewards/accuracies: 1.0
  • Rewards/margins: 1.7292
  • Logps/rejected: -198.5740
  • Logps/chosen: -157.1067
  • Logits/rejected: -1.0528
  • Logits/chosen: -0.8587
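
The reward figures above are related to each other, and a short check makes that explicit. The sketch below recomputes the margin as chosen minus rejected and evaluates a DPO-style loss at that margin. It assumes the standard sigmoid DPO loss without label smoothing, which this card does not confirm; the helper name `dpo_sigmoid_loss` is illustrative only.

```python
import math

# Final evaluation values copied from the list above.
rewards_chosen = 0.3527
rewards_rejected = -1.3764
reported_margin = 1.7292
reported_loss = 0.1779

# The margin is the gap between the chosen and rejected implicit rewards.
margin = rewards_chosen - rewards_rejected


def dpo_sigmoid_loss(reward_margin: float) -> float:
    """Return -log(sigmoid(margin)) in a numerically stable form.

    Assumption: the plain sigmoid DPO loss; the card does not name the variant.
    """
    if reward_margin >= 0:
        return math.log1p(math.exp(-reward_margin))
    return -reward_margin + math.log1p(math.exp(reward_margin))


# Loss evaluated at the mean margin, not the mean of per-example losses.
loss_at_mean_margin = dpo_sigmoid_loss(margin)

print(f"margin from rewards: {margin:.4f} (reported {reported_margin})")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported mean loss {reported_loss})")
```

The recomputed margin matches the reported one up to rounding of the averaged metrics. Under the stated assumption, the loss at the mean margin is a lower bound on the mean per-example loss because the function is convex, so a value somewhat below the reported 0.1779 is expected.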

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
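
As a rough illustration of how these settings combine, the sketch below derives the effective batch size and a linear learning-rate decay. It assumes a single device and no warmup steps, neither of which is stated in this card; `num_devices`, `linear_lr`, and `example_total_steps` are hypothetical names and values introduced for the example.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1
gradient_accumulation_steps = 8
learning_rate = 1e-6

# Assumption: a single device; the card does not list a device count.
num_devices = 1

# Examples consumed per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    """Linear decay from base_lr to 0, assuming no warmup steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


# Hypothetical placeholder for the total step count, not a value from the card.
example_total_steps = 250

print("effective batch size:", total_train_batch_size)
for step in (0, 125, 250):
    print(step, linear_lr(step, example_total_steps))
```

With these inputs the effective batch size comes out to 8, which agrees with the listed total_train_batch_size.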

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6603 | 0.1 | 25 | 0.6253 | 0.0416 | -0.1007 | 0.9633 | 0.1423 | -185.8169 | -160.2181 | -1.0525 | -0.8550 |
| 0.5342 | 0.2 | 50 | 0.5074 | 0.1130 | -0.3090 | 1.0 | 0.4220 | -187.8993 | -159.5039 | -1.0525 | -0.8569 |
| 0.4382 | 0.3 | 75 | 0.4022 | 0.1798 | -0.5442 | 1.0 | 0.7241 | -190.2517 | -158.8354 | -1.0530 | -0.8563 |
| 0.3592 | 0.4 | 100 | 0.3212 | 0.2338 | -0.7752 | 1.0 | 1.0090 | -192.5613 | -158.2961 | -1.0531 | -0.8579 |
| 0.3035 | 0.5 | 125 | 0.2590 | 0.2824 | -0.9912 | 1.0 | 1.2736 | -194.7217 | -157.8096 | -1.0528 | -0.8583 |
| 0.2374 | 0.6 | 150 | 0.2125 | 0.3190 | -1.1966 | 1.0 | 1.5157 | -196.7760 | -157.4438 | -1.0528 | -0.8575 |
| 0.2094 | 0.7 | 175 | 0.1868 | 0.3455 | -1.3260 | 1.0 | 1.6714 | -198.0693 | -157.1793 | -1.0528 | -0.8598 |
| 0.1886 | 0.79 | 200 | 0.1796 | 0.3491 | -1.3639 | 1.0 | 1.7130 | -198.4486 | -157.1428 | -1.0532 | -0.8617 |
| 0.1805 | 0.89 | 225 | 0.1785 | 0.3523 | -1.3731 | 1.0 | 1.7254 | -198.5406 | -157.1107 | -1.0530 | -0.8593 |
| 0.1821 | 0.99 | 250 | 0.1779 | 0.3527 | -1.3764 | 1.0 | 1.7292 | -198.5740 | -157.1067 | -1.0528 | -0.8587 |

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • PyTorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2
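
Since PEFT appears in the framework list, the checkpoint is presumably an adapter applied on top of the base model rather than a full set of weights. The following is a minimal loading sketch under that assumption. `ADAPTER_ID` is a hypothetical placeholder for wherever the adapter is stored, and `load_adapter` is an illustrative helper, not something provided by this card.

```python
# Base model named in this card.
BASE_MODEL_ID = "meta-llama/Llama-2-7b-hf"
# Hypothetical placeholder: replace with the actual repo id or local path.
ADAPTER_ID = "your-namespace/llama_DPO_model_e1"


def load_adapter(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter weights on top of it.

    Assumption: the checkpoint is a PEFT adapter, as the framework list suggests.
    """
    # Imported lazily so the heavy dependencies are only needed when loading.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Calling `load_adapter()` downloads the base weights, which requires access to the Llama 2 repository, so the function is defined here without being invoked.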