
zephyr-7b-gpo-u3-i1

This model is a fine-tuned version of DUAL-GPO/zephyr-7b-gpo-update3-i0 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0976
  • Rewards/chosen: -0.2046
  • Rewards/rejected: -0.1684
  • Rewards/accuracies: 0.3440
  • Rewards/margins: -0.0362
  • Logps/rejected: -271.7846
  • Logps/chosen: -287.1580
  • Logits/rejected: -1.8253
  • Logits/chosen: -1.9851
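
The reward figures above appear to follow the usual DPO-style convention, in which the margin is the chosen reward minus the rejected reward. The sketch below is a quick consistency check on the reported numbers only. The margin definition is an assumption, since the card does not state it.

```python
# Values copied from the evaluation results listed above.
rewards_chosen = -0.2046
rewards_rejected = -0.1684
reported_margin = -0.0362

# Assumption: margin = chosen reward - rejected reward (DPO-style convention).
margin = rewards_chosen - rewards_rejected

print(f"computed margin: {margin:.4f}")
print(f"matches reported: {abs(margin - reported_margin) < 5e-5}")
```

Under that assumption, a negative margin together with an accuracy below 0.5 would mean the implicit reward ranks the rejected response above the chosen one more often than not on the evaluation set.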

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
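
To illustrate how the cosine scheduler and warmup ratio interact, here is a minimal sketch of a linear-warmup, cosine-decay learning rate curve. It mirrors the common shape of such schedules rather than the exact training code. `TOTAL_STEPS` is an assumption, derived from the training results reporting step 100 at epoch 0.4 (so roughly 250 steps per epoch over 5 epochs). `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 5e-06        # learning_rate from the list above
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 1250     # assumption: about 250 steps per epoch x 5 epochs


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (hypothetical helper)."""
    warmup_steps = round(TOTAL_STEPS * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at the start, end of warmup, mid-run, and final step.
for step in (0, 125, 500, 1250):
    print(step, f"{lr_at(step):.3e}")
```

With these assumed values, warmup would end at step 125, where the learning rate peaks at 5e-06 before decaying toward zero.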

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3803 | 0.4 | 100 | 0.0537 | 0.0 | 0.0 | 0.0 | 0.0 | -254.9398 | -266.6976 | -1.8067 | -1.9618 |
| 0.2732 | 0.8 | 200 | 0.0585 | -0.0406 | -0.0433 | 0.4405 | 0.0028 | -259.2744 | -270.7553 | -1.8367 | -1.9952 |
| 0.3013 | 1.2 | 300 | 0.0800 | -0.3312 | -0.3632 | 0.4645 | 0.0319 | -291.2575 | -299.8226 | -1.8131 | -1.9752 |
| 0.3433 | 1.6 | 400 | 0.0812 | -0.3364 | -0.3695 | 0.4675 | 0.0331 | -291.8892 | -300.3361 | -1.8102 | -1.9721 |
| 0.3606 | 2.0 | 500 | 0.1100 | -0.3181 | -0.2920 | 0.3735 | -0.0262 | -284.1371 | -298.5123 | -1.8348 | -1.9970 |
| 0.3038 | 2.4 | 600 | 0.1092 | -0.3233 | -0.2979 | 0.3770 | -0.0254 | -284.7261 | -299.0256 | -1.8317 | -1.9936 |
| 0.3161 | 2.8 | 700 | 0.1069 | -0.3172 | -0.2929 | 0.3800 | -0.0243 | -284.2322 | -298.4158 | -1.8345 | -1.9966 |
| 0.3852 | 3.2 | 800 | 0.0918 | -0.2304 | -0.2057 | 0.3685 | -0.0247 | -275.5103 | -289.7388 | -1.8409 | -2.0019 |
| 0.3359 | 3.6 | 900 | 0.0983 | -0.2063 | -0.1696 | 0.3430 | -0.0368 | -271.8958 | -287.3323 | -1.8240 | -1.9838 |
| 0.3701 | 4.0 | 1000 | 0.0982 | -0.2062 | -0.1693 | 0.3455 | -0.0368 | -271.8734 | -287.3159 | -1.8241 | -1.9838 |
| 0.4025 | 4.4 | 1100 | 0.0975 | -0.2047 | -0.1687 | 0.3455 | -0.0359 | -271.8127 | -287.1649 | -1.8260 | -1.9858 |
| 0.3754 | 4.8 | 1200 | 0.0974 | -0.2044 | -0.1685 | 0.3440 | -0.0359 | -271.7890 | -287.1331 | -1.8256 | -1.9853 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • PyTorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
