---
license: mit
base_model: HuggingFaceH4/mistral-7b-sft-beta
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta), trained with DPO on an unspecified preference dataset. It achieves the following results on the evaluation set (see the note after the list for how the reward metrics are defined):

- Loss: 0.0108
- Rewards/chosen: -5.9141
- Rewards/rejected: -7.7338
- Rewards/accuracies: 0.7266
- Rewards/margins: 1.8197
- Logps/rejected: -1030.7371
- Logps/chosen: -848.4521
- Logits/rejected: -1.6334
- Logits/chosen: -1.6493
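
The reward columns follow the DPO formulation: as logged by trl's `DPOTrainer`, the implicit reward of a response $y$ to a prompt $x$ is the $\beta$-scaled log-probability ratio between the policy and the frozen reference model,

$$
\hat{r}_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}.
$$

`Rewards/margins` is the mean of $\hat{r}_\theta(x, y_{\text{chosen}}) - \hat{r}_\theta(x, y_{\text{rejected}})$ over evaluation pairs, and `Rewards/accuracies` is the fraction of pairs where the chosen response receives the higher reward.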

## Model description

More information needed

## Intended uses & limitations

More information needed
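
The model is nonetheless a causal chat LM and loads with standard `transformers` tooling. A minimal inference sketch, assuming the weights live in the Hub repo `wzhouad/zephyr-7b-dpo-full` and that the tokenizer ships a Zephyr-style chat template (both assumptions, not stated in this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wzhouad/zephyr-7b-dpo-full"  # assumption: actual Hub repo id may differ
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps a 7B model in roughly 14 GB
    device_map="auto",
)

# Build a chat prompt via the tokenizer's chat template (assumed present).
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```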

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a `DPOTrainer` call follows the list):

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
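
A minimal sketch of how these settings map onto a trl `DPOTrainer` call (trl 0.7.x-era API, matching the Transformers 4.35 pin below). The preference dataset, `beta`, and mixed precision are assumptions; none of them are recorded in this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "HuggingFaceH4/mistral-7b-sft-beta"
model = AutoModelForCausalLM.from_pretrained(base)      # policy being trained
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # train_batch_size above
    per_device_eval_batch_size=8,   # eval_batch_size above
    gradient_accumulation_steps=2,  # 8 GPUs x 8 per device x 2 = 128 effective
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                      # assumption: precision not recorded in the card
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,                                # assumption: trl's default beta
    train_dataset=preference_data["train"],  # hypothetical: dataset unspecified
    eval_dataset=preference_data["test"],    # hypothetical: dataset unspecified
    tokenizer=tokenizer,
)
trainer.train()
```

Launched across 8 GPUs (e.g. with `accelerate launch`), the per-device batch size of 8 with 2 accumulation steps reproduces the total train batch size of 128 listed above.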

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2786 | 0.21 | 100 | 0.2781 | -0.0080 | -0.0710 | 0.6719 | 0.0631 | -264.4583 | -257.8367 | -2.7623 | -2.7774 |
| 0.1377 | 0.42 | 200 | 0.1450 | -0.5817 | -1.0385 | 0.6992 | 0.4567 | -361.2018 | -315.2145 | -2.7365 | -2.7512 |
| 0.1162 | 0.63 | 300 | 0.1186 | -1.0407 | -1.6725 | 0.7266 | 0.6318 | -424.5983 | -361.1053 | -2.4888 | -2.5058 |
| 0.1019 | 0.84 | 400 | 0.0997 | -1.6327 | -2.4828 | 0.7461 | 0.8501 | -505.6364 | -420.3094 | -2.2736 | -2.3013 |
| 0.0226 | 1.05 | 500 | 0.0406 | -2.9554 | -4.2565 | 0.7266 | 1.3012 | -683.0034 | -552.5746 | -2.1929 | -2.2303 |
| 0.0116 | 1.26 | 600 | 0.0298 | -3.0110 | -4.3717 | 0.7305 | 1.3607 | -694.5244 | -558.1376 | -2.1365 | -2.1643 |
| 0.0132 | 1.46 | 700 | 0.0320 | -2.8731 | -4.1217 | 0.7383 | 1.2486 | -669.5266 | -544.3542 | -2.1173 | -2.1453 |
| 0.0141 | 1.67 | 800 | 0.0285 | -2.8506 | -4.0446 | 0.7383 | 1.1939 | -661.8126 | -542.1040 | -2.0387 | -2.0557 |
| 0.008  | 1.88 | 900 | 0.0217 | -3.7087 | -4.9874 | 0.7148 | 1.2786 | -756.0888 | -627.9131 | -1.8927 | -1.9084 |
| 0.0015 | 2.09 | 1000 | 0.0135 | -4.8936 | -6.4137 | 0.7109 | 1.5202 | -898.7281 | -746.3977 | -1.7007 | -1.7103 |
| 0.0019 | 2.3  | 1100 | 0.0140 | -4.8675 | -6.4410 | 0.7188 | 1.5735 | -901.4539 | -743.7909 | -1.7341 | -1.7490 |
| 0.0014 | 2.51 | 1200 | 0.0128 | -5.1432 | -6.7584 | 0.7188 | 1.6152 | -933.1906 | -771.3603 | -1.7194 | -1.7313 |
| 0.0012 | 2.72 | 1300 | 0.0126 | -5.2094 | -6.8051 | 0.7227 | 1.5957 | -937.8638 | -777.9802 | -1.7283 | -1.7387 |
| 0.0012 | 2.93 | 1400 | 0.0126 | -5.3124 | -6.9529 | 0.7148 | 1.6405 | -952.6434 | -788.2790 | -1.7056 | -1.7185 |
| 0.0009 | 3.14 | 1500 | 0.0113 | -5.6394 | -7.3683 | 0.7188 | 1.7289 | -994.1813 | -820.9806 | -1.6707 | -1.6834 |
| 0.0007 | 3.35 | 1600 | 0.0115 | -5.6409 | -7.3656 | 0.7227 | 1.7247 | -993.9130 | -821.1270 | -1.6691 | -1.6823 |
| 0.0011 | 3.56 | 1700 | 0.0114 | -5.6893 | -7.4555 | 0.7227 | 1.7662 | -1002.9027 | -825.9682 | -1.6580 | -1.6727 |
| 0.0007 | 3.77 | 1800 | 0.0113 | -5.7534 | -7.5287 | 0.7227 | 1.7753 | -1010.2194 | -832.3766 | -1.6467 | -1.6620 |
| 0.0009 | 3.97 | 1900 | 0.0113 | -5.7308 | -7.5090 | 0.7227 | 1.7782 | -1008.2513 | -830.1171 | -1.6581 | -1.6731 |
| 0.0006 | 4.18 | 2000 | 0.0109 | -5.8887 | -7.6915 | 0.7266 | 1.8028 | -1026.5013 | -845.9089 | -1.6381 | -1.6538 |
| 0.0006 | 4.39 | 2100 | 0.0109 | -5.9096 | -7.7239 | 0.7266 | 1.8144 | -1029.7469 | -847.9958 | -1.6345 | -1.6501 |
| 0.0006 | 4.6  | 2200 | 0.0109 | -5.8953 | -7.7105 | 0.7266 | 1.8152 | -1028.4065 | -846.5691 | -1.6360 | -1.6516 |
| 0.0007 | 4.81 | 2300 | 0.0108 | -5.9141 | -7.7338 | 0.7266 | 1.8197 | -1030.7371 | -848.4521 | -1.6334 | -1.6493 |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1