---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-qlora
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: zephyr-7b-dpo-qlora-fsdp
    results: []
---

# zephyr-7b-dpo-qlora-fsdp

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 0.8742
- Rewards/chosen: 0.0082
- Rewards/rejected: 0.0003
- Rewards/accuracies: 0.6726
- Rewards/margins: 0.0079
- Logps/rejected: -242.3632
- Logps/chosen: -266.8597
- Logits/rejected: -2.3743
- Logits/chosen: -2.4108
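
For reference, these reward columns follow the implicit-reward convention of the DPO paper (Rafailov et al., 2023) as reported by TRL: the reward of a completion y given a prompt x is the β-scaled log-probability ratio between the trained policy and the frozen SFT reference model,

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

so `Rewards/margins` is the mean of `Rewards/chosen` minus `Rewards/rejected`, and `Rewards/accuracies` is the fraction of evaluation pairs for which the chosen completion receives the higher implicit reward.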

## Model description

More information needed

## Intended uses & limitations

More information needed
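
In the absence of a fuller description, the sketch below shows one plausible way to query the model, assuming this repository ships the QLoRA adapter in PEFT format on top of the base model above. The repo id `<user>/zephyr-7b-dpo-qlora-fsdp` is a placeholder, and the chat template is assumed to be inherited from the Zephyr SFT tokenizer.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Placeholder repo id -- substitute the actual Hub path of this model.
repo_id = "<user>/zephyr-7b-dpo-qlora-fsdp"

# AutoPeftModelForCausalLM reads the adapter config, loads the base model
# it points at (alignment-handbook/zephyr-7b-sft-qlora), and applies the
# adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```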

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 10
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 4
- total_train_batch_size: 240
- total_eval_batch_size: 48
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
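
As a rough reconstruction (not the exact training script), the list above maps onto the `transformers` `TrainingArguments` that TRL's `DPOTrainer` consumes roughly as follows; `output_dir` is illustrative, and the listed Adam betas/epsilon are already the `TrainingArguments` defaults.

```python
from transformers import TrainingArguments

# Reconstruction of the listed hyperparameters. With 6 GPUs, the
# effective train batch size is 10 (per device) x 6 (devices) x 4
# (gradient accumulation) = 240, matching total_train_batch_size.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora-fsdp",  # illustrative
    learning_rate=5e-6,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults.
)
```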

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2536        | 0.39  | 100  | 0.2792          | 0.0024         | 0.0006           | 0.6042             | 0.0019          | -242.3385      | -267.4340    | -2.3735         | -2.4122       |
| 0.5352        | 0.79  | 200  | 0.5010          | 0.0011         | -0.0019          | 0.5744             | 0.0030          | -242.5832      | -267.5640    | -2.3629         | -2.4014       |
| 0.3676        | 1.18  | 300  | 0.8293          | 0.0079         | 0.0027           | 0.5982             | 0.0052          | -242.1211      | -266.8856    | -2.3788         | -2.4168       |
| 0.366         | 1.57  | 400  | 0.8239          | 0.0065         | 0.0007           | 0.6399             | 0.0058          | -242.3221      | -267.0256    | -2.3774         | -2.4146       |
| 0.292         | 1.96  | 500  | 0.8146          | 0.0050         | -0.0005          | 0.6399             | 0.0055          | -242.4462      | -267.1794    | -2.3978         | -2.4343       |
| 0.1355        | 2.36  | 600  | 0.9651          | 0.0047         | -0.0013          | 0.6161             | 0.0060          | -242.5212      | -267.2061    | -2.3796         | -2.4178       |
| 0.1327        | 2.75  | 700  | 0.9985          | 0.0046         | -0.0019          | 0.6339             | 0.0065          | -242.5883      | -267.2230    | -2.3690         | -2.4066       |
| 0.0389        | 3.14  | 800  | 0.8932          | 0.0080         | 0.0003           | 0.6518             | 0.0078          | -242.3696      | -266.8748    | -2.3563         | -2.3947       |
| 0.029         | 3.53  | 900  | 0.9392          | 0.0090         | 0.0008           | 0.6577             | 0.0082          | -242.3114      | -266.7798    | -2.3752         | -2.4118       |
| 0.0198        | 3.93  | 1000 | 0.8200          | 0.0087         | 0.0010           | 0.6577             | 0.0077          | -242.2917      | -266.8047    | -2.3780         | -2.4145       |
| 0.0059        | 4.32  | 1100 | 0.8904          | 0.0080         | 0.0002           | 0.6577             | 0.0078          | -242.3739      | -266.8760    | -2.3744         | -2.4108       |
| 0.0042        | 4.71  | 1200 | 0.8779          | 0.0080         | 0.0001           | 0.6518             | 0.0080          | -242.3892      | -266.8771    | -2.3753         | -2.4119       |

### Framework versions

- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2