---
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

# zephyr-7b-dpo-full

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set (a hedged sketch of how these DPO reward metrics are derived follows the list):

- Loss: 0.5418
- Rewards/chosen: -3.1726
- Rewards/rejected: -4.7390
- Rewards/accuracies: 0.7539
- Rewards/margins: 1.5664
- Logps/rejected: -761.6608
- Logps/chosen: -598.8974
- Logits/rejected: 0.2389
- Logits/chosen: -0.0634
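
The `Rewards/*` statistics come from TRL's DPO training loop. The snippet below is a minimal, hedged sketch of how such metrics can be recomputed from per-sequence log-probabilities of the policy and reference models; the `beta` value and the example log-probs are illustrative placeholders, not values recorded in this card.

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Recompute DPO loss and reward statistics from per-sequence log-probs.

    beta=0.1 is a placeholder; the value used for this run is not recorded here.
    """
    # Implicit rewards: scaled log-prob ratio between policy and reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Sigmoid DPO loss on the reward margin.
    margins = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(margins).mean()

    return {
        "loss": loss.item(),
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
        "rewards/margins": margins.mean().item(),
    }

# Dummy log-probabilities for a batch of 4 preference pairs (illustration only).
metrics = dpo_metrics(
    policy_chosen_logps=torch.tensor([-120.0, -95.0, -110.0, -100.0]),
    policy_rejected_logps=torch.tensor([-140.0, -130.0, -125.0, -118.0]),
    ref_chosen_logps=torch.tensor([-118.0, -97.0, -108.0, -101.0]),
    ref_rejected_logps=torch.tensor([-135.0, -124.0, -120.0, -115.0]),
)
print(metrics)
```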

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
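
The training script itself is not part of this card. The sketch below shows one way the hyperparameters above could map onto `transformers.TrainingArguments` plus TRL's `DPOTrainer` (the trainer implied by the `trl`/`dpo` tags). The base checkpoint, preference dataset, and DPO `beta` are not recorded in this card and appear as placeholders, and the exact `DPOTrainer` signature varies across `trl` releases. With 8 samples per device, 8 GPUs, and 2 gradient-accumulation steps, the effective train batch size is 8 x 8 x 2 = 128.

```python
# Hedged reconstruction only: the actual script, dataset, base checkpoint,
# and DPO beta are not recorded in this card. Placeholders are marked as such.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "base-model-placeholder"          # base checkpoint not recorded in the card
model = AutoModelForCausalLM.from_pretrained(model_id)
ref_model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tiny dummy preference dataset; the real dataset is not recorded in the card.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred answer"],
    "rejected": ["Dispreferred answer"],
})
eval_dataset = train_dataset

# 8 samples/device x 8 GPUs x 2 accumulation steps = total train batch size 128.
# Launching across 8 GPUs (e.g. with `accelerate launch` or `torchrun`) gives
# the "multi-GPU" distributed_type noted above.
args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,                    # placeholder; beta is not recorded in the card
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```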

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6142 | 0.07 | 100 | 0.6372 | -0.2112 | -0.4255 | 0.6992 | 0.2143 | -330.3116 | -302.7545 | -1.7521 | -1.7871 |
| 0.4726 | 0.15 | 200 | 0.5516 | -1.3441 | -2.1046 | 0.75 | 0.7605 | -498.2208 | -416.0410 | -2.0018 | -2.0471 |
| 0.4421 | 0.22 | 300 | 0.5335 | -1.1470 | -2.0463 | 0.7539 | 0.8992 | -492.3901 | -396.3379 | -1.7522 | -1.8325 |
| 0.3828 | 0.3 | 400 | 0.5238 | -1.6652 | -2.7617 | 0.7695 | 1.0965 | -563.9280 | -448.1488 | -0.9530 | -1.1204 |
| 0.3576 | 0.37 | 500 | 0.5184 | -1.6238 | -2.7277 | 0.7695 | 1.1039 | -560.5328 | -444.0173 | -0.8922 | -1.1202 |
| 0.3328 | 0.45 | 600 | 0.5151 | -2.1202 | -3.4092 | 0.7656 | 1.2890 | -628.6859 | -493.6552 | 0.2423 | -0.0694 |
| 0.3131 | 0.52 | 700 | 0.5153 | -1.7034 | -2.9038 | 0.7656 | 1.2004 | -578.1398 | -451.9696 | 0.1729 | -0.1656 |
| 0.2547 | 0.59 | 800 | 0.5256 | -2.5366 | -3.8570 | 0.7617 | 1.3204 | -673.4565 | -535.2915 | 0.4476 | 0.1270 |
| 0.2764 | 0.67 | 900 | 0.5221 | -2.5675 | -3.9457 | 0.7773 | 1.3782 | -682.3342 | -538.3813 | 0.0520 | -0.2431 |
| 0.2261 | 0.74 | 1000 | 0.5298 | -2.7657 | -4.2499 | 0.7695 | 1.4842 | -712.7483 | -558.2006 | 0.2023 | -0.1104 |
| 0.2219 | 0.82 | 1100 | 0.5380 | -3.0986 | -4.6646 | 0.7695 | 1.5660 | -754.2211 | -591.4904 | 0.3078 | -0.0067 |
| 0.2165 | 0.89 | 1200 | 0.5336 | -2.9855 | -4.5026 | 0.7617 | 1.5170 | -738.0179 | -580.1855 | 0.2015 | -0.0980 |
| 0.1728 | 0.97 | 1300 | 0.5418 | -3.1726 | -4.7390 | 0.7539 | 1.5664 | -761.6608 | -598.8974 | 0.2389 | -0.0634 |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.1.2+cu118
- Datasets 2.16.1
- Tokenizers 0.15.2
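
To help reproduce the environment, the installed packages can be compared against the versions listed above; the small check below is illustrative only.

```python
# Print installed versions next to the versions listed in this card.
import importlib.metadata as md

expected = {
    "transformers": "4.38.2",
    "torch": "2.1.2+cu118",
    "datasets": "2.16.1",
    "tokenizers": "0.15.2",
}
for pkg, listed in expected.items():
    installed = md.version(pkg)
    print(f"{pkg}: installed {installed}, card lists {listed}")
```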