
zephyr-7b

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 0.6918
  • Rewards/chosen: -0.0862
  • Rewards/rejected: -0.1980
  • Rewards/accuracies: 0.3591
  • Rewards/margins: 0.1117
  • Logps/rejected: -95.1937
  • Logps/chosen: -77.5232
  • Logits/rejected: -1.9123
  • Logits/chosen: -1.9402
  • Use Label: 15333.4131
  • Pred Label: 4738.5874
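
For reference, here is a minimal usage sketch. It assumes the adapter is published under the repo id jikaixuan/zephyr-7b (as listed on this card) and that the adapter repo ships its own tokenizer; the prompt is purely illustrative.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "jikaixuan/zephyr-7b"  # adapter repo id from this card

# Loads the base model recorded in the adapter config and attaches the
# LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,  # assumption; pick a dtype your hardware supports
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Illustrative prompt; zephyr-style chat models expect a chat template.
messages = [{"role": "user", "content": "What is preference tuning?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```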

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
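
The reward and log-probability metrics logged above are the kind produced by DPO-style preference trainers, though the card does not name the trainer. Below is a hedged sketch of how the listed values map onto Hugging Face TrainingArguments; output_dir is hypothetical and the bf16 setting is an assumption.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b",         # hypothetical output path
    learning_rate=5e-06,
    per_device_train_batch_size=4,  # train_batch_size
    per_device_eval_batch_size=8,   # eval_batch_size
    gradient_accumulation_steps=4,
    # Effective batch sizes with num_devices=4:
    #   train: 4 devices * 4 per device * 4 accumulation steps = 64
    #   eval:  4 devices * 8 per device = 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                      # assumption; precision is not stated in the card
)
# Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults,
# so no explicit override is needed.
```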

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Use Label | Pred Label |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:----------:|:----------:|
| 0.6876 | 0.1 | 100 | 0.6896 | -0.0555 | -0.0989 | 0.3353 | 0.0434 | -85.2883 | -74.4495 | -2.0761 | -2.1076 | 1766.8572 | 89.1429 |
| 0.6892 | 0.21 | 200 | 0.6894 | -0.0049 | -0.0560 | 0.3492 | 0.0511 | -80.9954 | -69.3876 | -2.0287 | -2.0520 | 3500.8889 | 459.1111 |
| 0.6904 | 0.31 | 300 | 0.6909 | -0.0625 | -0.1410 | 0.3532 | 0.0785 | -89.5016 | -75.1524 | -1.9943 | -2.0164 | 5140.6826 | 923.3174 |
| 0.6906 | 0.42 | 400 | 0.6921 | -0.0637 | -0.1541 | 0.3512 | 0.0904 | -90.8064 | -75.2687 | -2.0248 | -2.0481 | 6695.4287 | 1472.5714 |
| 0.6903 | 0.52 | 500 | 0.6914 | -0.0747 | -0.1726 | 0.3492 | 0.0979 | -92.6561 | -76.3697 | -1.9801 | -2.0071 | 8246.2061 | 2025.7937 |
| 0.6903 | 0.63 | 600 | 0.6917 | -0.1005 | -0.2047 | 0.3552 | 0.1042 | -95.8670 | -78.9543 | -1.9601 | -1.9870 | 9772.0635 | 2603.9365 |
| 0.6917 | 0.73 | 700 | 0.6917 | -0.1117 | -0.2224 | 0.3512 | 0.1108 | -97.6411 | -80.0681 | -1.9401 | -1.9659 | 11284.7773 | 3195.2222 |
| 0.6912 | 0.84 | 800 | 0.6917 | -0.0869 | -0.1981 | 0.3631 | 0.1112 | -95.2089 | -77.5874 | -1.9144 | -1.9422 | 12826.8252 | 3757.1746 |
| 0.6914 | 0.94 | 900 | 0.6918 | -0.0863 | -0.1983 | 0.3571 | 0.1120 | -95.2291 | -77.5275 | -1.9113 | -1.9391 | 14335.7139 | 4352.2856 |
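
Throughout these results, the Rewards/margins column is consistent with Rewards/chosen minus Rewards/rejected; for example, at step 900: -0.0863 - (-0.1983) = 0.1120.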

Framework versions

  • PEFT 0.7.1
  • Transformers 4.38.2
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2