phi2-lora-quantized-distilabel-intel-orca-dpo-pairs

This model is a fine-tuned version of microsoft/phi-2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5173
  • Rewards/chosen: -0.0019
  • Rewards/rejected: -0.7725
  • Rewards/accuracies: 0.7816
  • Rewards/margins: 0.7706
  • Logps/rejected: -233.5226
  • Logps/chosen: -214.1249
  • Logits/rejected: 0.3181
  • Logits/chosen: 0.2015
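
The reward figures above are related by simple arithmetic. The metric names match those logged by TRL's DPOTrainer, where the margin is the chosen reward minus the rejected reward and the accuracy is the fraction of pairs whose chosen reward is higher. The card does not say which trainer was used, so treat that reading as an assumption. A minimal sketch that checks the reported margin against the other two values:

```python
# Final evaluation metrics copied from the list above.
final_eval = {
    "rewards/chosen": -0.0019,
    "rewards/rejected": -0.7725,
    "rewards/margins": 0.7706,
}

# Assumption: margin = chosen reward - rejected reward (DPO-style logging).
margin = final_eval["rewards/chosen"] - final_eval["rewards/rejected"]

# Reported values are rounded to four decimals, so compare with a tolerance.
margin_matches = abs(margin - final_eval["rewards/margins"]) < 1e-4
print(f"recomputed margin: {margin:.4f}, matches report: {margin_matches}")
```

A positive margin means the model assigns, on average, a higher implicit reward to the chosen response than to the rejected one.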

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 20
  • num_epochs: 1
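
Two of these settings follow from the others, which a short sketch can make concrete. It assumes a single training device (the card does not state the device count) and a total of 148 optimizer steps, a value inferred from the Epoch and Step columns of the training results rather than stated anywhere in the card. The function `linear_schedule_lr` is a hypothetical helper that mirrors the usual linear-warmup, linear-decay shape; it is not taken from the training code.

```python
# Hyperparameters copied from the list above.
train_batch_size = 2
gradient_accumulation_steps = 16
learning_rate = 1e-05
warmup_steps = 20

# Assumption: one device. The card does not report the device count.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_schedule_lr(step, base_lr=learning_rate, warmup=warmup_steps, total_steps=148):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero.

    total_steps=148 is an assumption inferred from the results table.
    """
    if step < warmup:
        return base_lr * step / max(1, warmup)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup)


print("effective batch size:", total_train_batch_size)
for step in (0, 10, 20, 84, 148):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at 1e-05 at step 20 and reaches zero at the end of the single epoch.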

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6887 | 0.14 | 20 | 0.6767 | 0.0030 | -0.0331 | 0.6341 | 0.0361 | -226.1282 | -214.0752 | 0.2238 | 0.1343 |
| 0.6472 | 0.27 | 40 | 0.6171 | 0.0141 | -0.1710 | 0.7639 | 0.1852 | -227.5079 | -213.9642 | 0.2464 | 0.1508 |
| 0.5759 | 0.41 | 60 | 0.5584 | 0.0123 | -0.4023 | 0.7808 | 0.4146 | -229.8206 | -213.9829 | 0.2774 | 0.1736 |
| 0.526 | 0.54 | 80 | 0.5326 | 0.0036 | -0.5790 | 0.7816 | 0.5826 | -231.5877 | -214.0700 | 0.2983 | 0.1884 |
| 0.4963 | 0.68 | 100 | 0.5225 | 0.0020 | -0.6964 | 0.7825 | 0.6984 | -232.7611 | -214.0853 | 0.3131 | 0.1986 |
| 0.4977 | 0.81 | 120 | 0.5188 | -0.0025 | -0.7533 | 0.7816 | 0.7508 | -233.3300 | -214.1302 | 0.3162 | 0.2002 |
| 0.4818 | 0.95 | 140 | 0.5173 | -0.0019 | -0.7725 | 0.7816 | 0.7706 | -233.5226 | -214.1249 | 0.3181 | 0.2015 |
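
The Epoch and Step columns together constrain how many optimizer steps make up one epoch, which the card does not report directly. The sketch below assumes the Epoch value is the step count divided by the steps per epoch, rounded to two decimals, and searches for integer step counts consistent with every row. The helper name `consistent_steps_per_epoch` is hypothetical.

```python
# (epoch, step) pairs copied from the training results.
ROWS = [
    (0.14, 20), (0.27, 40), (0.41, 60), (0.54, 80),
    (0.68, 100), (0.81, 120), (0.95, 140),
]


def consistent_steps_per_epoch(rows, candidates=range(1, 1001)):
    """Return every candidate n for which round(step / n, 2) equals the logged epoch.

    Assumption: the Epoch column is step / steps_per_epoch rounded to two decimals.
    """
    return [
        n for n in candidates
        if all(abs(round(step / n, 2) - epoch) < 1e-9 for epoch, step in rows)
    ]


matches = consistent_steps_per_epoch(ROWS)
print(matches)
```

Under that rounding assumption the only consistent value is 148 steps per epoch, so the last logged evaluation at step 140 falls shortly before the end of the single training epoch.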

Framework versions

  • PEFT 0.7.1
  • Transformers 4.37.1
  • Pytorch 2.1.0+cu118
  • Datasets 2.16.1
  • Tokenizers 0.15.1
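
To compare a local environment against these versions, a minimal sketch using only the standard library is shown below. It assumes the PyPI distribution names `peft`, `transformers`, `torch`, `datasets`, and `tokenizers` (the card lists PyTorch as "Pytorch", which is distributed as `torch`); the helper `compare_versions` is hypothetical.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed PyPI distribution name.
PINS = {
    "peft": "0.7.1",
    "transformers": "4.37.1",
    "torch": "2.1.0+cu118",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}


def compare_versions(pins):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


for name, (expected, installed, ok) in compare_versions(PINS).items():
    print(f"{name}: expected {expected}, installed {installed}, match={ok}")
```

A mismatch does not necessarily mean the adapter will fail to load; the check only reports where the environment differs from the one recorded here.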