phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1-lora

This model is a fine-tuned version of Yhyu13/phi-2-sft-alpaca_gpt4_en-ep1 on the comparison_gpt4_en dataset. As the model name indicates, it is a LoRA adapter trained with DPO for one epoch. It achieves the following results on the evaluation set:

  • Loss: 0.0168
  • Rewards/chosen: -1.5750
  • Rewards/rejected: -11.4002
  • Rewards/accuracies: 0.9956
  • Rewards/margins: 9.8253
  • Logps/rejected: -142.2352
  • Logps/chosen: -139.5300
  • Logits/rejected: 0.6066
  • Logits/chosen: 0.9744
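
The reward metrics above are related: the margin is the chosen reward minus the rejected reward. A minimal sketch that checks this against the reported figures follows; the variable names are illustrative, and the small tolerance accounts for the four-decimal rounding in the reported values.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -1.5750
rewards_rejected = -11.4002
reported_margin = 9.8253

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
print(f"computed margin: {margin:.4f}")  # computed margin: 9.8252

# The reported value differs only in the last digit, consistent with rounding.
matches_reported = abs(margin - reported_margin) < 1e-3
print("matches reported margin:", matches_reported)
```

A positive margin together with a Rewards/accuracies value near 1 means the chosen response received the higher reward in almost every evaluation pair.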

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • total_eval_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 1.0
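
The two total batch sizes follow from the per-device values, the device count, and the gradient accumulation steps. A short sketch of that arithmetic, using only the values listed above:

```python
# Hyperparameters copied from the list above.
train_batch_size = 1             # per device
eval_batch_size = 1              # per device
num_devices = 2
gradient_accumulation_steps = 4

# Each optimizer step accumulates gradients over several per-device batches.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 8
print(total_eval_batch_size)   # 2
```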

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0534 | 0.24 | 1000 | 0.0217 | -1.6714 | -10.2359 | 0.9945 | 8.5645 | -130.5921 | -140.4941 | 0.3064 | 0.5735 |
| 0.0182 | 0.49 | 2000 | 0.0175 | -1.5469 | -10.9602 | 0.9951 | 9.4133 | -137.8349 | -139.2487 | 0.6230 | 1.0709 |
| 0.0162 | 0.73 | 3000 | 0.0171 | -1.5517 | -11.4444 | 0.9962 | 9.8927 | -142.6772 | -139.2976 | 0.6325 | 1.0048 |
| 0.0154 | 0.98 | 4000 | 0.0168 | -1.5741 | -11.4004 | 0.9956 | 9.8262 | -142.2364 | -139.5214 | 0.6051 | 0.9729 |
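
The card does not state the size of the training set, but the last row (step 4000 at epoch 0.98) allows a rough estimate. The sketch below assumes that the Epoch column is rounded to two decimals and that each step is one optimizer update consuming total_train_batch_size = 8 preference pairs; both are assumptions, so the result is a range rather than an exact count.

```python
# Last logged row of the training results: step 4000 at epoch 0.98.
step = 4000
epoch = 0.98
total_train_batch_size = 8  # from the hyperparameter list

# Assumption: the epoch value is rounded to two decimals, so the true
# value lies somewhere in [0.975, 0.985).
steps_per_epoch_low = step / (epoch + 0.005)
steps_per_epoch_high = step / (epoch - 0.005)

# Assumption: one step consumes total_train_batch_size preference pairs.
pairs_low = steps_per_epoch_low * total_train_batch_size
pairs_high = steps_per_epoch_high * total_train_batch_size

print(f"steps per epoch: {steps_per_epoch_low:.0f} to {steps_per_epoch_high:.0f}")
print(f"training pairs:  {pairs_low:.0f} to {pairs_high:.0f}")
```

Under these assumptions one epoch is roughly 4,060 to 4,100 steps, or about 32,500 to 32,800 preference pairs.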

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.0.1+cu117
  • Datasets 2.14.5
  • Tokenizers 0.15.0
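
Because the framework list includes PEFT, the weights are presumably meant to be loaded on top of the base model named in this card. The following is a minimal, untested sketch rather than the author's documented procedure. The adapter repository id is an assumption taken from the hosting page, `load_model` is a hypothetical helper name, and `trust_remote_code=True` is an assumption that may be unnecessary depending on the installed Transformers version.

```python
BASE_MODEL_ID = "Yhyu13/phi-2-sft-alpaca_gpt4_en-ep1"  # base model named in this card
ADAPTER_ID = "Yhyu13/phi-2-sft-dpo-gpt4_en-ep1-lora"   # assumed adapter repository id


def load_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the LoRA adapter (hypothetical helper)."""
    # Imports are deferred so this module can be imported without the
    # heavy dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the base checkpoint may ship custom modeling code.
    tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
    base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)

    # Wrap the base model with the adapter weights and switch to inference mode.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model.eval()


# Example usage (downloads the weights on first call):
# tokenizer, model = load_model()
```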

Model tree for Yhyu13/phi-2-sft-dpo-gpt4_en-ep1-lora: Adapter (1), this model