zephyr-7b / README.md (commit bdfa771 by jikaixuan: "End of training")
metadata
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: zephyr-7b
    results: []

zephyr-7b

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora, trained with DPO (Direct Preference Optimization) using TRL on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6906
  • Rewards/chosen: -0.3413
  • Rewards/rejected: -0.5652
  • Rewards/accuracies: 0.3631
  • Rewards/margins: 0.2239
  • Logps/rejected: -131.9189
  • Logps/chosen: -103.0295
  • Logits/rejected: -0.1381
  • Logits/chosen: -0.2453
  • Use Label: 15879.8574
  • Pred Label: 4192.1431
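The sketch below shows how the Rewards/* and Loss metrics above are usually computed, assuming the standard TRL DPOTrainer definitions with the sigmoid loss. Each response's implicit reward is β times the gap between the policy's and the reference model's log-probability. The margin is the chosen reward minus the rejected reward. Accuracy is the fraction of pairs whose chosen reward is strictly higher, and the per-pair loss is -log σ(margin). β = 0.1 is an assumption (TRL's default), because the card does not report the value used. The function names are illustrative, not part of any library.

```python
import math

# Assumption: TRL's default beta; the card does not state the value used for this run.
BETA = 0.1


def implicit_reward(policy_logp: float, ref_logp: float, beta: float = BETA) -> float:
    """DPO implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(pairs, beta: float = BETA) -> dict:
    """pairs: (policy_chosen, ref_chosen, policy_rejected, ref_rejected) sequence log-probs."""
    chosen = [implicit_reward(pc, rc, beta) for pc, rc, _, _ in pairs]
    rejected = [implicit_reward(pr, rr, beta) for _, _, pr, rr in pairs]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(pairs)
    return {
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margins": sum(margins) / n,
        # A pair counts as correct only if the chosen reward is strictly higher.
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
        # Sigmoid DPO loss per pair: -log(sigmoid(margin)) == softplus(-margin).
        "loss": sum(softplus(-m) for m in margins) / n,
    }
```

The margin is linear in the rewards, so its mean equals mean chosen reward minus mean rejected reward. The reported numbers agree with this: -0.3413 - (-0.5652) = 0.2239. The loss is not linear in the margin, so the mean loss cannot be recovered from the mean margin alone.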

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
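The totals above follow from the per-device settings: 4 × 4 devices × 4 accumulation steps = 64 for training, and 8 × 4 devices = 32 for evaluation, which does not accumulate gradients. The sketch below reproduces that arithmetic. It also sketches a cosine learning-rate schedule with linear warmup, modeled on the shape of Transformers' cosine-with-warmup scheduler (warmup steps = ceil(warmup_ratio × total steps)). The card does not report the total number of optimizer steps, so `total_steps` is a hypothetical input.

```python
import math

# Values taken from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 4
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 4
PEAK_LR = 5e-6
WARMUP_RATIO = 0.1


def effective_batch_sizes() -> tuple:
    """(total_train_batch_size, total_eval_batch_size) across all devices."""
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES  # no accumulation at eval time
    return train, evaluation


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then a half-cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

With 1,000 hypothetical steps, warmup ends at step 100 at the peak rate of 5e-06. The rate then falls to half the peak at step 550 and to zero at step 1,000.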

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Use Label | Pred Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6818 | 0.1 | 100 | 0.6814 | -0.0056 | -0.0496 | 0.3393 | 0.0440 | -80.3582 | -69.4632 | -2.0664 | -2.0975 | 1833.4603 | 22.5397 |
| 0.6818 | 0.21 | 200 | 0.6861 | -0.1358 | -0.2381 | 0.3373 | 0.1023 | -99.2068 | -82.4782 | -1.9938 | -2.0215 | 3701.2063 | 258.7936 |
| 0.6848 | 0.31 | 300 | 0.6877 | -0.2068 | -0.3388 | 0.3413 | 0.1320 | -109.2766 | -89.5763 | -1.8828 | -1.9157 | 5437.8730 | 626.1270 |
| 0.6857 | 0.42 | 400 | 0.6885 | -0.1802 | -0.3299 | 0.3532 | 0.1497 | -108.3913 | -86.9237 | -1.4031 | -1.4529 | 7112.4443 | 1055.5555 |
| 0.6894 | 0.52 | 500 | 0.6892 | -0.2862 | -0.4559 | 0.3552 | 0.1697 | -120.9922 | -97.5203 | -0.5997 | -0.6889 | 8741.4287 | 1530.5714 |
| 0.6881 | 0.63 | 600 | 0.6918 | -0.3826 | -0.6059 | 0.3532 | 0.2233 | -135.9845 | -107.1618 | -0.2548 | -0.3579 | 10293.6826 | 2082.3174 |
| 0.6913 | 0.73 | 700 | 0.6899 | -0.3542 | -0.5787 | 0.3671 | 0.2244 | -133.2637 | -104.3247 | -0.2462 | -0.3470 | 11806.4766 | 2673.5239 |
| 0.6893 | 0.84 | 800 | 0.6904 | -0.3443 | -0.5684 | 0.3631 | 0.2241 | -132.2416 | -103.3355 | -0.1293 | -0.2367 | 13331.9043 | 3252.0952 |
| 0.689 | 0.94 | 900 | 0.6907 | -0.3413 | -0.5651 | 0.3631 | 0.2238 | -131.9111 | -103.0301 | -0.1367 | -0.2437 | 14866.4766 | 3821.5239 |
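The logged rows can be sanity-checked directly. The sketch below copies the step, validation loss, and reward columns from the table above. It checks that each logged Rewards/margins matches Rewards/chosen minus Rewards/rejected to within the 4-decimal rounding of the log, and it returns the step with the lowest validation loss. The helper names and the 2e-4 tolerance are illustrative choices, not taken from the training code.

```python
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins),
# copied from the training results table above.
LOG = [
    (100, 0.6814, -0.0056, -0.0496, 0.0440),
    (200, 0.6861, -0.1358, -0.2381, 0.1023),
    (300, 0.6877, -0.2068, -0.3388, 0.1320),
    (400, 0.6885, -0.1802, -0.3299, 0.1497),
    (500, 0.6892, -0.2862, -0.4559, 0.1697),
    (600, 0.6918, -0.3826, -0.6059, 0.2233),
    (700, 0.6899, -0.3542, -0.5787, 0.2244),
    (800, 0.6904, -0.3443, -0.5684, 0.2241),
    (900, 0.6907, -0.3413, -0.5651, 0.2238),
]


def margin_mismatches(log, tol: float = 2e-4) -> list:
    """Steps where the logged margin differs from chosen - rejected by more than tol.

    Values are rounded to 4 decimals, so differences of about 1e-4 are expected.
    """
    return [step for step, _, chosen, rejected, margin in log
            if abs((chosen - rejected) - margin) > tol]


def best_validation_step(log) -> int:
    """Step with the lowest logged validation loss."""
    return min(log, key=lambda row: row[1])[0]
```

For this log, `margin_mismatches(LOG)` returns an empty list, and `best_validation_step(LOG)` returns 100, where the validation loss is 0.6814.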

Framework versions

  • PEFT 0.7.1
  • Transformers 4.38.2
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
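To reproduce the run, one option is to compare the local environment against the versions listed above. The sketch below does this with the standard library's `importlib.metadata`. It deliberately ignores local build suffixes such as `+cu121`, so a PyTorch 2.1.1 build for a different CUDA version still counts as a match. That is a design assumption; drop `public_version` if exact builds matter. The `lookup` parameter exists only so the check can be tested without the packages installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; keys are PyPI distribution names.
PINNED = {
    "peft": "0.7.1",
    "transformers": "4.38.2",
    "torch": "2.1.1+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.15.2",
}


def public_version(v: str) -> str:
    """Drop a local build suffix such as '+cu121'."""
    return v.split("+", 1)[0]


def check_versions(pinned=PINNED, lookup=version) -> dict:
    """Map each package to (installed_version or None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        ok = installed is not None and public_version(installed) == public_version(wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "OK" if ok else "DIFF"
        print(f"{name:12} pinned={PINNED[name]:12} installed={installed or 'missing':12} {status}")
```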