
# zephyr-7b-gpo-log-i1

This model is a fine-tuned version of DUAL-GPO/zephyr-7b-gpo-log-i0 on the HuggingFaceH4/ultrafeedback_binarized dataset (a loading sketch follows the metric list below). It achieves the following results on the evaluation set:

- Loss: 0.7084
- Rewards/chosen: -0.3387
- Rewards/rejected: -0.3762
- Rewards/accuracies: 0.4641
- Rewards/margins: 0.0375
- Logps/rejected: -284.1953
- Logps/chosen: -296.7821
- Logits/rejected: -1.6524
- Logits/chosen: -1.8037
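
Since this repository ships a PEFT adapter (see the framework versions below), a minimal loading sketch is shown here; it assumes the adapter config points at the base model recorded in the repo, and that the generation prompt is purely illustrative:

```python
# Minimal loading sketch (not part of the original card). Assumes the
# adapter's adapter_config.json records its base model, which
# AutoPeftModelForCausalLM resolves automatically.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "DUAL-GPO/zephyr-7b-gpo-log-i1"

# Loads the base model and attaches the LoRA adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; pick a dtype your hardware supports
    device_map="auto",
)
# If the adapter repo does not include tokenizer files, load the
# tokenizer from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain preference optimization in one paragraph."  # illustrative
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```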

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 2
- total_train_batch_size: 12
- total_eval_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
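
For orientation, a hedged sketch of how these hyperparameters map onto `transformers.TrainingArguments`; the actual training script for this run is not published in this card, and `output_dir` and `bf16` below are assumptions:

```python
# Hedged sketch only: mirrors the hyperparameters listed above.
# Adam betas=(0.9, 0.999) and epsilon=1e-08 are the TrainingArguments
# defaults, so they need no explicit arguments here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-gpo-log-i1",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # with 3 GPUs: total train batch size = 12
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumption; not stated in the card
)
```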

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6748 | 0.04 | 200 | 0.7007 | -0.3675 | -0.3814 | 0.4446 | 0.0139 | -284.7155 | -299.6654 | -1.8001 | -1.9625 |
| 0.6724 | 0.08 | 400 | 0.7027 | -0.3184 | -0.3527 | 0.4940 | 0.0344 | -281.8482 | -294.7475 | -1.7890 | -1.9524 |
| 0.6749 | 0.12 | 600 | 0.7100 | -0.3255 | -0.3594 | 0.4760 | 0.0339 | -282.5139 | -295.4615 | -1.6820 | -1.8358 |
| 0.6719 | 0.16 | 800 | 0.7050 | -0.3022 | -0.3372 | 0.4775 | 0.0350 | -280.2988 | -293.1357 | -1.7259 | -1.8834 |
| 0.6777 | 0.2 | 1000 | 0.7025 | -0.2948 | -0.3142 | 0.4461 | 0.0194 | -277.9926 | -292.3886 | -1.7123 | -1.8681 |
| 0.6724 | 0.24 | 1200 | 0.7089 | -0.4249 | -0.4720 | 0.4865 | 0.0471 | -293.7763 | -305.4027 | -1.7346 | -1.8939 |
| 0.6763 | 0.28 | 1400 | 0.7065 | -0.3751 | -0.4179 | 0.4746 | 0.0428 | -288.3666 | -300.4254 | -1.6995 | -1.8560 |
| 0.6729 | 0.32 | 1600 | 0.7084 | -0.3379 | -0.3600 | 0.4641 | 0.0221 | -282.5755 | -296.7008 | -1.7340 | -1.8920 |
| 0.6734 | 0.36 | 1800 | 0.7037 | -0.3077 | -0.3258 | 0.4521 | 0.0182 | -279.1587 | -293.6775 | -1.7089 | -1.8649 |
| 0.6754 | 0.4 | 2000 | 0.7073 | -0.4076 | -0.4418 | 0.4671 | 0.0342 | -290.7584 | -303.6719 | -1.7361 | -1.8949 |
| 0.679 | 0.44 | 2200 | 0.7075 | -0.4434 | -0.4787 | 0.4611 | 0.0353 | -294.4463 | -307.2497 | -1.6814 | -1.8362 |
| 0.6692 | 0.48 | 2400 | 0.7067 | -0.3067 | -0.3478 | 0.4716 | 0.0411 | -281.3559 | -293.5765 | -1.6761 | -1.8305 |
| 0.6778 | 0.52 | 2600 | 0.7036 | -0.2610 | -0.2905 | 0.4626 | 0.0294 | -275.6222 | -289.0128 | -1.7120 | -1.8687 |
| 0.6687 | 0.56 | 2800 | 0.7113 | -0.4071 | -0.4423 | 0.4626 | 0.0353 | -290.8080 | -303.6171 | -1.6930 | -1.8484 |
| 0.6741 | 0.6 | 3000 | 0.7067 | -0.3261 | -0.3614 | 0.4671 | 0.0354 | -282.7206 | -295.5167 | -1.6692 | -1.8222 |
| 0.674 | 0.64 | 3200 | 0.7085 | -0.3171 | -0.3556 | 0.4716 | 0.0384 | -282.1313 | -294.6258 | -1.6840 | -1.8385 |
| 0.6712 | 0.68 | 3400 | 0.7083 | -0.3545 | -0.3873 | 0.4626 | 0.0329 | -285.3080 | -298.3568 | -1.6600 | -1.8125 |
| 0.6738 | 0.72 | 3600 | 0.7078 | -0.4016 | -0.4475 | 0.4805 | 0.0458 | -291.3219 | -303.0744 | -1.6368 | -1.7870 |
| 0.6748 | 0.76 | 3800 | 0.7085 | -0.3558 | -0.4037 | 0.4746 | 0.0478 | -286.9418 | -298.4960 | -1.6370 | -1.7875 |
| 0.6746 | 0.8 | 4000 | 0.7097 | -0.3549 | -0.3943 | 0.4641 | 0.0394 | -286.0046 | -298.4026 | -1.6465 | -1.7977 |
| 0.6772 | 0.84 | 4200 | 0.7088 | -0.3280 | -0.3650 | 0.4611 | 0.0369 | -283.0742 | -295.7155 | -1.6640 | -1.8161 |
| 0.6718 | 0.88 | 4400 | 0.7082 | -0.3267 | -0.3617 | 0.4566 | 0.0349 | -282.7410 | -295.5824 | -1.6550 | -1.8062 |
| 0.6737 | 0.92 | 4600 | 0.7085 | -0.3416 | -0.3797 | 0.4656 | 0.0381 | -284.5475 | -297.0699 | -1.6499 | -1.8009 |
| 0.6742 | 0.96 | 4800 | 0.7085 | -0.3387 | -0.3765 | 0.4716 | 0.0378 | -284.2217 | -296.7780 | -1.6508 | -1.8018 |
| 0.6708 | 1.0 | 5000 | 0.7084 | -0.3387 | -0.3762 | 0.4641 | 0.0375 | -284.1953 | -296.7821 | -1.6524 | -1.8037 |
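
The reward columns follow the convention used by DPO-style preference trainers: per-pair rewards are scaled log-probability differences between the policy and a reference model, margins are chosen minus rejected, and accuracies are the fraction of pairs where the chosen reward exceeds the rejected one. A hedged sketch of these relationships is below; the exact GPO loss used for this run is not documented in this card, and `beta` is an assumed coefficient:

```python
# Hedged sketch of how the logged reward metrics relate to each other,
# following DPO-style conventions; not the published training code.
import torch

def reward_metrics(policy_chosen_logps, policy_rejected_logps,
                   ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # beta is an assumed KL-penalty coefficient; not stated in the card.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected            # Rewards/margins
    accuracies = (rewards_chosen > rewards_rejected).float()  # Rewards/accuracies
    return (rewards_chosen.mean(), rewards_rejected.mean(),
            margins.mean(), accuracies.mean())
```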

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2