zephyr-7b-dpo-qlora

This model is a DPO fine-tuned version of data/zephyr-7b-sft-qlora-merged (a local path, apparently a merged Zephyr-7B SFT QLoRA checkpoint), trained on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4935
  • Rewards/chosen: -2.1676
  • Rewards/rejected: -3.1735
  • Rewards/accuracies: 0.7698
  • Rewards/margins: 1.0059
  • Logps/rejected: -564.1859
  • Logps/chosen: -483.0326
  • Logits/rejected: -1.4139
  • Logits/chosen: -1.4811
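
A minimal loading sketch (not part of the original card): the repository ships a PEFT LoRA adapter, so it is loaded on top of a base model. Because the adapter's recorded base is a local path, you will likely need to supply your own merged SFT checkpoint; the snippet also assumes the tokenizer ships a chat template.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# The adapter's recorded base model is a local path
# ("data/zephyr-7b-sft-qlora-merged"), so point this at your own merged
# SFT checkpoint; the adapter repo id below is taken from this card's page.
base = AutoModelForCausalLM.from_pretrained(
    "data/zephyr-7b-sft-qlora-merged",  # replace with your merged SFT base
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "statking/zephyr-7b-dpo-qlora")
tokenizer = AutoTokenizer.from_pretrained("statking/zephyr-7b-dpo-qlora")

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```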

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
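
Note that the effective batch sizes follow from the values above: 4 per device × 4 GPUs × 4 accumulation steps = 64 for training, and 8 per device × 4 GPUs = 32 for evaluation. As a sketch, these hyperparameters map onto transformers.TrainingArguments as below; output_dir, bf16, and any DPO-specific options (e.g. the loss beta) are assumptions not stated in this card, and the trainer itself (likely TRL's DPOTrainer, given the logged metrics) is not named here.

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed in this card.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",  # placeholder, not from the card
    learning_rate=5e-6,
    per_device_train_batch_size=4,     # x 4 GPUs x 4 accumulation = 64 total
    per_device_eval_batch_size=8,      # x 4 GPUs = 32 total
    gradient_accumulation_steps=4,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,                         # assumption; typical for QLoRA DPO runs
)
```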

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6185        | 0.1047 | 100  | 0.6240          | -0.3010        | -0.5396          | 0.6964             | 0.2387          | -300.7997      | -296.3736    | -2.2954         | -2.3537       |
| 0.5724        | 0.2094 | 200  | 0.5692          | -0.8434        | -1.3284          | 0.7302             | 0.4850          | -379.6750      | -350.6113    | -2.2448         | -2.2930       |
| 0.5366        | 0.3141 | 300  | 0.5249          | -1.6887        | -2.4863          | 0.7639             | 0.7976          | -495.4648      | -435.1429    | -1.6220         | -1.6850       |
| 0.5397        | 0.4187 | 400  | 0.5253          | -1.2998        | -1.9923          | 0.7698             | 0.6925          | -446.0619      | -396.2537    | -1.7586         | -1.8144       |
| 0.5003        | 0.5234 | 500  | 0.5013          | -1.9982        | -2.9207          | 0.7659             | 0.9226          | -538.9065      | -466.0909    | -1.6049         | -1.6682       |
| 0.4835        | 0.6281 | 600  | 0.5027          | -2.5699        | -3.5168          | 0.7560             | 0.9470          | -598.5182      | -523.2593    | -1.3417         | -1.4125       |
| 0.4715        | 0.7328 | 700  | 0.4956          | -2.1902        | -3.1936          | 0.7679             | 1.0035          | -566.1955      | -485.2894    | -1.3782         | -1.4480       |
| 0.4898        | 0.8375 | 800  | 0.4948          | -2.0401        | -3.0116          | 0.7698             | 0.9715          | -547.9974      | -470.2821    | -1.4275         | -1.4946       |
| 0.4785        | 0.9422 | 900  | 0.4933          | -2.1713        | -3.1801          | 0.7738             | 1.0088          | -564.8470      | -483.4024    | -1.4105         | -1.4778       |
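
For readers unfamiliar with the columns: assuming they follow TRL's DPOTrainer logging conventions (which the metric names match), rewards/chosen and rewards/rejected are the implicit DPO rewards

$$ r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right) $$

averaged over chosen and rejected completions respectively, rewards/margins is their difference (e.g. -2.1713 - (-3.1801) = 1.0088 in the final row), rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one, and the logps columns are the summed policy log-probabilities of each completion.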

Framework versions

  • PEFT 0.10.0
  • Transformers 4.41.0.dev0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1