---
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.5261
- Rewards/chosen: -2.4591
- Rewards/rejected: -3.9221
- Rewards/accuracies: 0.7773
- Rewards/margins: 1.4631
- Logps/rejected: -703.8400
- Logps/chosen: -549.4910
- Logits/rejected: 0.0289
- Logits/chosen: 0.0663
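As these trl-style metrics are usually defined, the reported rewards already include the DPO beta scaling, the margin is simply chosen minus rejected reward, and the per-example DPO loss is the negative log-sigmoid of that margin. A minimal plain-Python sketch under those assumptions:

```python
import math

def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Per-example DPO loss: -log(sigmoid(reward_chosen - reward_rejected)).
    Assumes the beta scaling is already folded into the rewards, as in the
    trainer-reported metrics above."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Mean eval rewards from the list above:
margin = -2.4591 - (-3.9221)
print(round(margin, 4))  # 1.463 (the reported 1.4631 differs only by rounding)
```

Note that the reported eval loss (0.5261) is the mean of per-example losses, so it need not equal `dpo_loss` evaluated at the mean rewards.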

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 2
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
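The total batch sizes above follow from the per-device settings; a quick check of the arithmetic:

```python
# Derived batch sizes, reconstructed from the per-device settings above.
train_batch_size = 4               # per device
eval_batch_size = 8                # per device
num_devices = 8
gradient_accumulation_steps = 4

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size, total_eval_batch_size)  # 128 64
```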

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6201        | 0.21  | 100  | 0.6253          | -0.2753        | -0.6662          | 0.7031             | 0.3909          | -378.2405      | -331.1124    | 0.4172          | 0.3706        |
| 0.5547        | 0.42  | 200  | 0.5549          | -0.6988        | -1.4726          | 0.7656             | 0.7738          | -458.8863      | -373.4661    | 0.4261          | 0.3909        |
| 0.5343        | 0.63  | 300  | 0.5316          | -0.8044        | -1.6474          | 0.7656             | 0.8430          | -476.3628      | -384.0199    | 0.2851          | 0.2449        |
| 0.5323        | 0.84  | 400  | 0.5211          | -0.9068        | -1.8283          | 0.7812             | 0.9216          | -494.4600      | -394.2621    | 0.2834          | 0.2514        |
| 0.352         | 1.05  | 500  | 0.5258          | -1.9533        | -3.4166          | 0.7969             | 1.4634          | -653.2899      | -498.9117    | -0.0846         | -0.0654       |
| 0.3342        | 1.26  | 600  | 0.5268          | -2.3123        | -3.7246          | 0.7930             | 1.4124          | -684.0857      | -534.8101    | 0.1128          | 0.1344        |
| 0.337         | 1.47  | 700  | 0.5290          | -2.3753        | -3.8837          | 0.7773             | 1.5084          | -699.9910      | -541.1116    | 0.0099          | 0.0414        |
| 0.3398        | 1.67  | 800  | 0.5297          | -2.5097        | -4.0133          | 0.7734             | 1.5036          | -712.9506      | -554.5546    | 0.0381          | 0.0750        |
| 0.307         | 1.88  | 900  | 0.5261          | -2.4591        | -3.9221          | 0.7773             | 1.4631          | -703.8400      | -549.4910    | 0.0289          | 0.0663        |
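The learning-rate trajectory over these steps follows the configured cosine schedule with 10% warmup. A simplified reconstruction (a sketch of the same warmup/cosine shape, not the exact Transformers implementation):

```python
import math

def lr_at_step(step: int, total_steps: int,
               base_lr: float = 1e-6, warmup_ratio: float = 0.1) -> float:
    """Linear warmup to base_lr over warmup_ratio * total_steps,
    then cosine decay to zero. Mirrors lr_scheduler_type=cosine with
    lr_scheduler_warmup_ratio=0.1 from the hyperparameters above."""
    warmup_steps = max(1, int(warmup_ratio * total_steps))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

The rate starts at 0, peaks at `base_lr` when warmup ends, and decays to 0 at the final step.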

## Framework versions

- Transformers 4.35.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1