zephyr-7b-dpo-selfgen

This model is a fine-tuned version of EllieS/zephyr-7b-sft-qlora on the EllieS/pubmedqa_dpo_selfgen_data dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0000
  • Rewards/chosen: -6.6466
  • Rewards/rejected: -19.5106
  • Rewards/accuracies: 1.0
  • Rewards/margins: 12.8639
  • Logps/rejected: -1996.6047
  • Logps/chosen: -731.7379
  • Logits/rejected: -2.0588
  • Logits/chosen: -2.4883
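These metric names match the ones logged by TRL's `DPOTrainer`. The card does not say which library was used, so treat that as an assumption. Under the standard sigmoid DPO loss, each reward is beta times the log-probability ratio between the policy and the reference model. `Rewards/margins` is the chosen reward minus the rejected reward, and the per-example loss is -log(sigmoid(margin)). A minimal sketch that checks the reported numbers against those relationships:

```python
import math

# Evaluation metrics copied from the list above.
rewards_chosen = -6.6466
rewards_rejected = -19.5106
reported_margin = 12.8639

# Assumed relationship (standard DPO): margin = reward(chosen) - reward(rejected).
margin = rewards_chosen - rewards_rejected


def sigmoid_dpo_loss(m: float) -> float:
    """-log(sigmoid(m)), computed as log1p(exp(-m)) to stay accurate for large positive m."""
    return math.log1p(math.exp(-m))


# Loss at the mean margin. By convexity this is a lower bound on the mean per-example loss.
loss_at_mean_margin = sigmoid_dpo_loss(margin)

print(f"margin: {margin:.4f} (reported {reported_margin})")
print(f"loss at mean margin: {loss_at_mean_margin:.2e}")
```

Because -log(sigmoid(x)) is convex, the value at the mean margin is only a lower bound on the averaged loss. Here it is a few times 1e-6, which is consistent with the reported loss of 0.0000.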

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
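The two "total" batch sizes are derived values, not independent settings. Training uses per-device batch size x devices x gradient accumulation steps, so 2 x 2 x 2 = 8. Evaluation has no accumulation, so it is 8 x 2 = 16. The sketch below reproduces that arithmetic. It also shows the shape of a cosine schedule with 10% linear warmup, mirroring the formula in Hugging Face transformers' `get_cosine_schedule_with_warmup`. That this is the scheduler behind `lr_scheduler_type: cosine` is an assumption. The total step count is a hypothetical placeholder, not a value taken from this run.

```python
import math

# Hyperparameters copied from the list above.
train_batch_size = 2            # per device
eval_batch_size = 8             # per device
num_devices = 2
gradient_accumulation_steps = 2
learning_rate = 5e-06
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation steps for training only).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup, then cosine decay to zero (same shape as
    transformers' get_cosine_schedule_with_warmup with num_cycles=0.5)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# HYPOTHETICAL step count, for illustration only; the card does not report it.
total_steps = 1000
print(total_train_batch_size, total_eval_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```

With a 10% warmup, the rate climbs linearly to 5e-06 over the first tenth of training. It then follows half a cosine period down to zero.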

Training results

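| Training Loss | Epoch | Step  | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|---------------|-------|-------|---------------|-----------------|--------------|----------------|-----------------|--------------------|----------------|-----------------|------------------|
| 0.0241        | 0.42  | 7000  | -2.8328       | -2.8312         | -143.5124    | -856.1008      | 0.0101          | 1.0                | -0.7644        | 7.3411          | -8.1055          |
| 0.0001        | 0.83  | 14000 | -2.3450       | -1.9435         | -714.5292    | -1741.5647     | 0.0002          | 1.0                | -6.4745        | 10.4856         | -16.9602         |
| 0.0003        | 1.25  | 21000 | -2.4293       | -2.0264         | -695.5377    | -1973.5151     | 0.0001          | 1.0                | -6.2846        | 12.9950         | -19.2797         |
| 0.0           | 1.67  | 28000 | -2.5393       | -2.1793         | -619.2334    | -1821.8682     | 0.0001          | 1.0                | -5.5216        | 12.2416         | -17.7632         |
| 0.0001        | 2.09  | 35000 | -2.4633       | -1.9800         | -817.4478    | -2071.8862     | 0.0000          | 1.0                | -7.5037        | 12.7596         | -20.2634         |
| 0.0           | 2.5   | 42000 | -2.4883       | -2.0593         | -730.7642    | -2000.8484     | 0.0000          | 1.0                | -6.6369        | 12.9161         | -19.5530         |
| 0.0001        | 2.92  | 49000 | -2.4895       | -2.0591         | -732.9475    | -1999.9326     | 0.0000          | 1.0                | -6.6587        | 12.8851         | -19.5438         |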
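A quick consistency check on the table: in every row, `Rewards/margins` should equal `Rewards/chosen` minus `Rewards/rejected`, up to the rounding of each column to four decimals. This sketch assumes the standard DPO definition of the margin. It copies the three reward columns by hand and reports the largest discrepancy.

```python
# Rows from the training-results table: (step, rewards_chosen, rewards_rejected, rewards_margins).
rows = [
    (7000, -0.7644, -8.1055, 7.3411),
    (14000, -6.4745, -16.9602, 10.4856),
    (21000, -6.2846, -19.2797, 12.9950),
    (28000, -5.5216, -17.7632, 12.2416),
    (35000, -7.5037, -20.2634, 12.7596),
    (42000, -6.6369, -19.5530, 12.9161),
    (49000, -6.6587, -19.5438, 12.8851),
]


def max_margin_gap(rows, tol=2e-4):
    """Return the largest |(chosen - rejected) - margin| and whether every row is within tol.

    Small nonzero gaps are expected because each column is rounded to 4 decimals.
    """
    gaps = [abs((chosen - rejected) - margin) for _, chosen, rejected, margin in rows]
    return max(gaps), all(g <= tol for g in gaps)


worst, ok = max_margin_gap(rows)
print(f"largest gap: {worst:.4f}, all within rounding: {ok}")
```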

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
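To compare a local environment against the versions listed above, a small check with the standard library's `importlib.metadata` can help. The sketch assumes the usual PyPI distribution names (PyTorch is published as `torch`). It uses exact string comparison, so a CPU build or a different CUDA suffix than `+cu121` is reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; keys are PyPI distribution names (assumed).
EXPECTED = {
    "peft": "0.7.1",
    "transformers": "4.36.2",
    "torch": "2.1.2+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.15.2",
}


def version_mismatches(expected=EXPECTED):
    """Map each package whose installed version differs from `expected` to the
    installed version, or to None if the package is not installed."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


print(version_mismatches())
```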