---
license: mit
base_model: HuggingFaceH4/mistral-7b-sft-beta
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---


# zephyr-7b-dpo-full

This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta) on an unspecified dataset (the dataset field in the card metadata is empty). It achieves the following results on the evaluation set:

- Loss: 0.0314
- Rewards/chosen: -1.8016
- Rewards/rejected: -2.3386
- Rewards/accuracies: 0.6996
- Rewards/margins: 0.5369
- Logps/rejected: -384.5558
- Logps/chosen: -324.4056
- Logits/rejected: -1.9462
- Logits/chosen: -1.9728
- Debug/policy Weights: 0.0527
- Debug/losses: 0.0295
- Debug/raw Losses: 0.5653
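Two consistency relationships are visible in these numbers: the reward margin is the chosen reward minus the rejected reward, and the debug metrics suggest (a hypothesis from the numbers, not stated in the card) that the logged loss is the raw per-example loss scaled by the policy weight. A minimal Python sanity check:

```python
# Sanity checks on the evaluation metrics reported above.
rewards_chosen = -1.8016
rewards_rejected = -2.3386
rewards_margins = 0.5369

# The margin is simply chosen minus rejected rewards.
assert abs((rewards_chosen - rewards_rejected) - rewards_margins) < 1e-3

# Hypothesis from the debug metrics (not confirmed by the card):
# losses ≈ policy_weights * raw_losses.
policy_weights = 0.0527
raw_losses = 0.5653
losses = 0.0295
assert abs(policy_weights * raw_losses - losses) < 1e-3
```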

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
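The two `total_*_batch_size` values follow from the per-device settings. A quick arithmetic check:

```python
# Effective batch sizes derived from the hyperparameters above.
train_batch_size = 8              # per device
eval_batch_size = 8               # per device
num_devices = 8
gradient_accumulation_steps = 2

# Effective train batch = per-device batch * devices * accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
assert total_train_batch_size == 128

# Evaluation does not accumulate gradients.
total_eval_batch_size = eval_batch_size * num_devices
assert total_eval_batch_size == 64
```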

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Weights | Debug/losses | Debug/raw Losses |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------------------:|:------------:|:----------------:|
| 0.1733 | 0.0796 | 100 | 0.1633 | -0.1419 | -0.1761 | 0.5979 | 0.0341 | -168.3045 | -158.4381 | -2.7036 | -2.7117 | 0.2384 | 0.1618 | 0.6787 |
| 0.0826 | 0.1592 | 200 | 0.0750 | -0.7884 | -1.0004 | 0.6362 | 0.2120 | -250.7415 | -223.0814 | -2.5404 | -2.5508 | 0.1172 | 0.0736 | 0.6317 |
| 0.0515 | 0.2388 | 300 | 0.0573 | -1.2448 | -1.6060 | 0.6567 | 0.3612 | -311.3039 | -268.7243 | -2.3397 | -2.3553 | 0.0902 | 0.0558 | 0.6134 |
| 0.0343 | 0.3183 | 400 | 0.0302 | -1.7725 | -2.1338 | 0.6623 | 0.3614 | -364.0837 | -321.4913 | -2.2855 | -2.3007 | 0.0482 | 0.0284 | 0.5994 |
| 0.0432 | 0.3979 | 500 | 0.0432 | -1.5065 | -1.9835 | 0.6800 | 0.4770 | -349.0468 | -294.8951 | -2.2406 | -2.2643 | 0.0702 | 0.0407 | 0.5892 |
| 0.0342 | 0.4775 | 600 | 0.0321 | -1.8281 | -2.3049 | 0.6875 | 0.4769 | -381.1920 | -327.0503 | -2.1134 | -2.1351 | 0.0527 | 0.0302 | 0.5812 |
| 0.0283 | 0.5571 | 700 | 0.0283 | -1.8441 | -2.2808 | 0.6940 | 0.4366 | -378.7769 | -328.6566 | -1.9677 | -1.9900 | 0.0467 | 0.0268 | 0.5766 |
| 0.023  | 0.6367 | 800 | 0.0244 | -2.0670 | -2.5677 | 0.6884 | 0.5008 | -407.4723 | -350.9413 | -1.9268 | -1.9515 | 0.0400 | 0.0228 | 0.5787 |
| 0.032  | 0.7163 | 900 | 0.0335 | -1.7467 | -2.2731 | 0.6847 | 0.5264 | -378.0125 | -318.9173 | -1.9262 | -1.9521 | 0.0559 | 0.0316 | 0.5720 |
| 0.0294 | 0.7959 | 1000 | 0.0289 | -1.9406 | -2.4746 | 0.6866 | 0.5340 | -398.1603 | -338.3062 | -1.9318 | -1.9580 | 0.0484 | 0.0271 | 0.5695 |
| 0.0308 | 0.8754 | 1100 | 0.0311 | -1.8111 | -2.3364 | 0.7006 | 0.5253 | -384.3376 | -325.3560 | -1.9554 | -1.9814 | 0.0520 | 0.0291 | 0.5657 |
| 0.0303 | 0.9550 | 1200 | 0.0314 | -1.8016 | -2.3386 | 0.6996 | 0.5369 | -384.5558 | -324.4056 | -1.9462 | -1.9728 | 0.0527 | 0.0295 | 0.5653 |
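The learning-rate schedule above (cosine decay with a 10% linear warmup) can be sketched as follows. The total step count is an estimate from the table, not stated in the card: step 1200 at epoch 0.9550 implies roughly 1200 / 0.9550 ≈ 1256 optimizer steps for the single epoch.

```python
import math

# Sketch of a cosine schedule with linear warmup, matching the
# hyperparameters above. total_steps is estimated (see lead-in).
peak_lr = 5e-07
total_steps = 1256
warmup_steps = int(0.1 * total_steps)  # lr_scheduler_warmup_ratio: 0.1

def lr_at(step: int) -> float:
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

assert lr_at(0) == 0.0                                # starts at zero
assert abs(lr_at(warmup_steps) - peak_lr) < 1e-12     # peaks after warmup
assert lr_at(total_steps) < 1e-12                     # decays to ~0 at the end
```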

### Framework versions

- Transformers 4.41.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1