---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set (the reward metrics are explained in the note after the list):

- Loss: 0.9109
- Rewards/chosen: -6.4067
- Rewards/rejected: -10.6017
- Rewards/accuracies: 0.7659
- Rewards/margins: 4.1951
- Logps/rejected: -366.2361
- Logps/chosen: -346.0206
- Logits/rejected: -1.3898
- Logits/chosen: -1.6525
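
The reward columns above are TRL's DPO logging: a response's implicit reward is the gap between the policy's and the reference model's log-probabilities, scaled by the DPO temperature β. The following is a short reminder of the standard definitions; β itself is not recorded in this card, so its value is an assumption of the usual recipe rather than something stated here.

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right],
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big( r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}) \big)
$$

`Rewards/margins` is the mean chosen-minus-rejected reward, `Rewards/accuracies` is the fraction of evaluation pairs with a positive margin, `Logps/*` are the policy's mean summed log-probabilities over the chosen and rejected responses, and `Logits/*` are the mean raw logits over those responses.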

## Model description

More information needed

## Intended uses & limitations

More information needed
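
No intended-use guidance was supplied with this card. As a minimal, non-authoritative sketch, the checkpoint should load like any other 🤗 Transformers causal LM; the repo id and the chat template (assumed to be inherited from the SFT base model) are assumptions, not part of the original card.

```python
# Inference sketch (assumptions: repo id taken from this card's title/owner, and the
# tokenizer carries the Zephyr chat template from alignment-handbook/zephyr-7b-sft-full).
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="happii/zephyr-7b-dpo-full",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one paragraph."},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
out = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(out[0]["generated_text"])
```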

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch that mirrors them follows the list):

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
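
The training script itself is not included in this card; the snippet below is only a sketch of how the listed hyperparameters map onto a TRL `DPOTrainer` setup of the same era (trl ~0.8.x, contemporary with Transformers 4.40, where `beta` is a trainer argument). The DPO β, the bf16 precision, and the chat-template preprocessing of the dataset are assumptions taken from the alignment-handbook recipe, not values stated above; the 4-GPU launch (e.g. via `accelerate launch`) is outside the sketch.

```python
# Configuration sketch only; mirrors the hyperparameters listed above.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference splits of HuggingFaceH4/ultrafeedback_binarized. The recipe applies the
# chat template so that `prompt`, `chosen`, `rejected` become plain strings (omitted here).
raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train_ds, eval_ds = raw["train_prefs"], raw["test_prefs"]

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 4 GPUs x 2 accumulation steps = 64 total
    per_device_eval_batch_size=8,   # x 4 GPUs = 32 total
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                      # assumption: precision is not listed in this card
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                 # None makes TRL keep a frozen copy as the reference model
    args=args,
    beta=0.1,                       # assumption: recipe default, not stated above
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
)
trainer.train()
```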

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6044        | 0.1047 | 100  | 0.6129          | 0.3596         | 0.0870           | 0.7302             | 0.2726          | -259.3489      | -278.3580    | -2.5834         | -2.6369       |
| 0.57          | 0.2093 | 200  | 0.5571          | 0.5922         | -0.1676          | 0.7540             | 0.7598          | -261.8945      | -276.0320    | -2.4867         | -2.5465       |
| 0.5429        | 0.3140 | 300  | 0.5366          | 0.0019         | -0.9625          | 0.7540             | 0.9644          | -269.8440      | -281.9351    | -2.3542         | -2.4208       |
| 0.5168        | 0.4186 | 400  | 0.5452          | 0.1591         | -0.8845          | 0.7599             | 1.0436          | -269.0635      | -280.3629    | -2.4760         | -2.5389       |
| 0.5337        | 0.5233 | 500  | 0.5324          | 0.1371         | -1.0631          | 0.7778             | 1.2002          | -270.8497      | -280.5833    | -2.4225         | -2.4845       |
| 0.5163        | 0.6279 | 600  | 0.5369          | -0.3785        | -1.5394          | 0.7560             | 1.1609          | -275.6129      | -285.7394    | -2.4333         | -2.4912       |
| 0.4881        | 0.7326 | 700  | 0.5380          | 0.1243         | -1.2129          | 0.7679             | 1.3371          | -272.3477      | -280.7114    | -2.3892         | -2.4505       |
| 0.49          | 0.8373 | 800  | 0.5411          | 0.1149         | -1.0375          | 0.7639             | 1.1524          | -270.5944      | -280.8054    | -2.4479         | -2.5044       |
| 0.5097        | 0.9419 | 900  | 0.5622          | -0.2002        | -1.4670          | 0.7698             | 1.2668          | -274.8889      | -283.9564    | -2.5298         | -2.5820       |
| 0.1144        | 1.0466 | 1000 | 0.5714          | -0.2947        | -1.8774          | 0.7639             | 1.5826          | -278.9927      | -284.9014    | -2.5495         | -2.6080       |
| 0.087         | 1.1512 | 1100 | 0.5960          | -0.6932        | -2.6301          | 0.7837             | 1.9369          | -286.5200      | -288.8864    | -2.5036         | -2.5699       |
| 0.1122        | 1.2559 | 1200 | 0.6133          | -1.5655        | -3.6620          | 0.7540             | 2.0965          | -296.8384      | -297.6089    | -2.4063         | -2.4765       |
| 0.1303        | 1.3605 | 1300 | 0.6040          | -1.7575        | -3.6828          | 0.7837             | 1.9252          | -297.0464      | -299.5291    | -2.3747         | -2.4470       |
| 0.0884        | 1.4652 | 1400 | 0.6035          | -1.4203        | -3.2606          | 0.7798             | 1.8403          | -292.8251      | -296.1571    | -2.3840         | -2.4553       |
| 0.0807        | 1.5699 | 1500 | 0.6033          | -1.8277        | -3.9141          | 0.7877             | 2.0864          | -299.3599      | -300.2314    | -2.3962         | -2.4731       |
| 0.1027        | 1.6745 | 1600 | 0.6157          | -1.3414        | -3.3683          | 0.7857             | 2.0269          | -293.9024      | -295.3680    | -2.3746         | -2.4536       |
| 0.0989        | 1.7792 | 1700 | 0.6009          | -1.4146        | -3.5889          | 0.7917             | 2.1744          | -296.1083      | -296.0996    | -2.3750         | -2.4548       |
| 0.0945        | 1.8838 | 1800 | 0.6109          | -1.1285        | -3.3269          | 0.7877             | 2.1984          | -293.4879      | -293.2390    | -2.4051         | -2.4825       |
| 0.0789        | 1.9885 | 1900 | 0.6093          | -1.9115        | -4.0587          | 0.7837             | 2.1472          | -300.8062      | -301.0694    | -2.3968         | -2.4730       |
| 0.0086        | 2.0931 | 2000 | 0.7414          | -2.9121        | -5.9384          | 0.7758             | 3.0263          | -319.6029      | -311.0746    | -2.2016         | -2.2928       |
| 0.0137        | 2.1978 | 2100 | 0.8116          | -4.6780        | -8.1860          | 0.7679             | 3.5080          | -342.0789      | -328.7336    | -1.8924         | -2.0338       |
| 0.0152        | 2.3025 | 2200 | 0.8371          | -5.0993        | -8.7589          | 0.7679             | 3.6596          | -347.8080      | -332.9471    | -1.8207         | -1.9887       |
| 0.0062        | 2.4071 | 2300 | 0.8704          | -6.2532        | -10.1416         | 0.7679             | 3.8884          | -361.6346      | -344.4856    | -1.5897         | -1.8086       |
| 0.0124        | 2.5118 | 2400 | 0.8848          | -5.6604        | -9.6724          | 0.7698             | 4.0120          | -356.9429      | -338.5582    | -1.5561         | -1.7751       |
| 0.0078        | 2.6164 | 2500 | 0.8926          | -6.1681        | -10.2415         | 0.7679             | 4.0734          | -362.6336      | -343.6352    | -1.4181         | -1.6590       |
| 0.0083        | 2.7211 | 2600 | 0.9002          | -6.5323        | -10.6541         | 0.7659             | 4.1218          | -366.7602      | -347.2773    | -1.3929         | -1.6493       |
| 0.0115        | 2.8257 | 2700 | 0.9076          | -6.4271        | -10.6033         | 0.7639             | 4.1762          | -366.2516      | -346.2245    | -1.4047         | -1.6632       |
| 0.0134        | 2.9304 | 2800 | 0.9106          | -6.3982        | -10.5970         | 0.7639             | 4.1988          | -366.1889      | -345.9361    | -1.3900         | -1.6525       |

### Framework versions

- Transformers 4.40.2
- Pytorch 2.1.2
- Datasets 2.19.1
- Tokenizers 0.19.1