---
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set (the reward columns are the implicit DPO rewards; a sketch of how they are computed follows the list):

- Loss: 0.5366
- Rewards/chosen: -2.9738
- Rewards/rejected: -4.4991
- Rewards/accuracies: 0.7617
- Rewards/margins: 1.5252
- Logps/rejected: -767.4317
- Logps/chosen: -609.1594
- Logits/rejected: 1.6095
- Logits/chosen: 0.9559
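
The reward figures above are the implicit DPO rewards: β times the log-probability ratio between the trained policy and the frozen reference model, with the margin being chosen minus rejected. A minimal sketch of how these quantities relate to the loss, assuming the standard DPO formulation used by `trl` (the β used for this run is not stated in the card, so the 0.01 below is only a placeholder):

```python
import torch
import torch.nn.functional as F

def dpo_stats(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps, beta=0.01):
    """Relate the reported metrics to the DPO loss.

    All inputs are summed log-probabilities of full responses under the
    policy and the frozen reference model, shape [batch]. beta is a
    placeholder; the actual value for this run is not given in the card.
    """
    # Implicit rewards: beta-scaled log-ratio between policy and reference.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected

    # DPO loss: negative log-sigmoid of the reward margin.
    loss = -F.logsigmoid(margins).mean()

    # "Rewards/accuracies": fraction of pairs where chosen outscores rejected.
    accuracy = (margins > 0).float().mean()
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy
```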

## Model description

More information needed

## Intended uses & limitations

More information needed
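
No usage guidance is provided, but the checkpoint should load like any other `transformers` causal language model. A minimal sketch, assuming the weights live at the repository id below (not confirmed by the card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RikkiXu/zephyr-7b-dpo-full"  # assumed repo id, not confirmed in the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map="auto" needs accelerate
)

prompt = "Explain what DPO fine-tuning does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```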

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative `TrainingArguments` sketch follows the list):

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
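
For context, a rough sketch of how these settings map onto a `transformers` `TrainingArguments` object; the actual training script, dataset, and DPO β are not given in this card, so the output path and the commented trainer call are illustrative only:

```python
from transformers import TrainingArguments

# Per-device batch size 8 x 8 GPUs x 2 accumulation steps = 128 effective train batch;
# 8 x 8 = 64 effective eval batch, matching the totals listed above.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",  # illustrative output path
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# The trl/dpo tags suggest trl's DPOTrainer was used; a call would look roughly like:
# trainer = DPOTrainer(model, ref_model, args=training_args,
#                      train_dataset=..., tokenizer=...)  # beta and datasets unspecified
# trainer.train()
```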

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5905 | 0.07 | 100 | 0.6429 | -0.1380 | -0.3441 | 0.6719 | 0.2061 | -351.9318 | -325.5744 | -1.7244 | -1.7878 |
| 0.4495 | 0.15 | 200 | 0.5600 | -0.4940 | -1.0973 | 0.7461 | 0.6032 | -427.2510 | -361.1815 | -1.3665 | -1.4371 |
| 0.3963 | 0.22 | 300 | 0.5291 | -1.1123 | -2.0359 | 0.7422 | 0.9236 | -521.1155 | -423.0034 | -1.2770 | -1.4609 |
| 0.4012 | 0.3 | 400 | 0.5315 | -1.0588 | -1.9923 | 0.7734 | 0.9334 | -516.7505 | -417.6586 | -1.1223 | -1.3373 |
| 0.3559 | 0.37 | 500 | 0.5276 | -1.4423 | -2.5146 | 0.7578 | 1.0723 | -568.9822 | -456.0086 | -0.6834 | -1.0067 |
| 0.3291 | 0.45 | 600 | 0.5103 | -1.6617 | -2.7811 | 0.7695 | 1.1194 | -595.6332 | -477.9445 | 0.1886 | -0.2334 |
| 0.2735 | 0.52 | 700 | 0.5289 | -2.2950 | -3.7006 | 0.7617 | 1.4056 | -687.5872 | -541.2795 | 0.6722 | 0.1870 |
| 0.2752 | 0.59 | 800 | 0.5229 | -2.2134 | -3.5070 | 0.7656 | 1.2935 | -668.2236 | -533.1202 | 0.2752 | -0.1628 |
| 0.2492 | 0.67 | 900 | 0.5152 | -2.0646 | -3.3529 | 0.7734 | 1.2882 | -652.8116 | -518.2382 | 1.0726 | 0.5184 |
| 0.262 | 0.74 | 1000 | 0.5241 | -2.4505 | -3.8564 | 0.7617 | 1.4059 | -703.1603 | -556.8265 | 1.3124 | 0.6805 |
| 0.2299 | 0.82 | 1100 | 0.5313 | -2.7647 | -4.2433 | 0.7578 | 1.4786 | -741.8574 | -588.2495 | 1.4834 | 0.8391 |
| 0.1974 | 0.89 | 1200 | 0.5367 | -2.9484 | -4.4713 | 0.7617 | 1.5229 | -764.6512 | -606.6174 | 1.5458 | 0.8964 |
| 0.1842 | 0.97 | 1300 | 0.5366 | -2.9738 | -4.4991 | 0.7617 | 1.5252 | -767.4317 | -609.1594 | 1.6095 | 0.9559 |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.1.2+cu118
- Datasets 2.16.1
- Tokenizers 0.15.2