zephyr-7b-dpo-full-ultrabin-reward-scale-05

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5419
  • Rewards/chosen: -2.0652
  • Rewards/rejected: -3.5533
  • Rewards/accuracies: 0.7812
  • Rewards/margins: 1.4880
  • Logps/rejected: -617.9895
  • Logps/chosen: -469.1545
  • Logits/rejected: 3.0313
  • Logits/chosen: 2.1931
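
The reward figures above are related by simple arithmetic: the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs where the chosen reward is higher. The sketch below checks this against the reported means and shows how such statistics could be computed per pair. It assumes the metrics follow the usual DPO-style definitions with a sigmoid preference loss, which the card does not state explicitly; the per-pair reward values and the `summarize` helper are illustrative, not taken from the evaluation set.

```python
import math

# Means reported on the evaluation set in this card.
REWARD_CHOSEN = -2.0652
REWARD_REJECTED = -3.5533
REPORTED_MARGIN = 1.4880


def summarize(chosen, rejected):
    """Return (mean margin, accuracy, mean sigmoid preference loss) for paired rewards.

    Assumes DPO-style definitions; this is a sketch, not the training code.
    """
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(margins)
    mean_margin = sum(margins) / n
    # A pair counts as "correct" when the chosen response has the higher reward.
    accuracy = sum(m > 0 for m in margins) / n
    # -log(sigmoid(m)) written as log(1 + exp(-m)).
    loss = sum(math.log1p(math.exp(-m)) for m in margins) / n
    return mean_margin, accuracy, loss


# The difference of the reported means agrees with the reported margin
# up to rounding in the fourth decimal place.
margin_from_means = REWARD_CHOSEN - REWARD_REJECTED

# Hypothetical per-pair rewards, for illustration only.
toy_chosen = [0.5, -1.0, 0.2, -0.3]
toy_rejected = [-0.5, -0.5, -1.0, -2.0]
toy_margin, toy_accuracy, toy_loss = summarize(toy_chosen, toy_rejected)
```

Because the loss is averaged per pair, it generally differs from the loss evaluated at the mean margin.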

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 55
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
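
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and adds a common form of linear warmup followed by cosine decay. The `lr_at` function and its rounding of the warmup length are assumptions for illustration; the exact schedule used by the training framework may differ in detail.

```python
import math

# Hyperparameters copied from the list above.
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-07
WARMUP_RATIO = 0.1

# Examples consumed per optimizer step and per evaluation step.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at an optimizer step (hypothetical helper).

    Linear warmup from 0 to peak_lr, then cosine decay to 0.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with an illustrative `total_steps` of 100, `lr_at(10, 100)` is the peak rate and `lr_at(100, 100)` has decayed to approximately zero.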

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6746 | 0.1046 | 50 | 0.6514 | 0.0215 | -0.0844 | 0.6953 | 0.1059 | -271.1068 | -260.4849 | -2.5751 | -2.6121 |
| 0.5801 | 0.2092 | 100 | 0.5963 | -1.2413 | -2.0024 | 0.6914 | 0.7611 | -462.9021 | -386.7607 | 0.8478 | 0.5614 |
| 0.561 | 0.3138 | 150 | 0.5612 | -1.3516 | -2.3053 | 0.7422 | 0.9537 | -493.1910 | -397.7852 | 2.1227 | 1.6750 |
| 0.552 | 0.4184 | 200 | 0.5634 | -1.7910 | -3.0147 | 0.7539 | 1.2237 | -564.1274 | -441.7259 | 2.6771 | 2.0183 |
| 0.5367 | 0.5230 | 250 | 0.5404 | -1.6069 | -2.8715 | 0.7656 | 1.2646 | -549.8127 | -423.3247 | 2.8098 | 2.1736 |
| 0.5231 | 0.6276 | 300 | 0.5511 | -1.8243 | -3.2523 | 0.7656 | 1.4280 | -587.8877 | -445.0558 | 2.9864 | 2.2075 |
| 0.5092 | 0.7322 | 350 | 0.5402 | -1.9840 | -3.4024 | 0.7734 | 1.4184 | -602.9061 | -461.0307 | 2.8834 | 2.0946 |
| 0.5231 | 0.8368 | 400 | 0.5417 | -2.0950 | -3.5645 | 0.7812 | 1.4695 | -619.1116 | -472.1271 | 3.0542 | 2.2365 |
| 0.5232 | 0.9414 | 450 | 0.5419 | -2.0657 | -3.5528 | 0.7812 | 1.4871 | -617.9430 | -469.2008 | 3.0322 | 2.1926 |
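
The Epoch and Step columns together imply the length of one epoch in optimizer steps, and the Validation Loss column identifies the logged step with the lowest evaluation loss. The sketch below works through both using values copied from the training results; the variable names are illustrative, and the example count is an estimate derived from the total train batch size of 128, not a figure stated in the card.

```python
# (step, epoch, validation loss) copied from the training results.
LOG = [
    (50, 0.1046, 0.6514),
    (100, 0.2092, 0.5963),
    (150, 0.3138, 0.5612),
    (200, 0.4184, 0.5634),
    (250, 0.5230, 0.5404),
    (300, 0.6276, 0.5511),
    (350, 0.7322, 0.5402),
    (400, 0.8368, 0.5417),
    (450, 0.9414, 0.5419),
]
TOTAL_TRAIN_BATCH_SIZE = 128

# Each row gives an estimate of optimizer steps per epoch: step / epoch.
estimates = [step / epoch for step, epoch, _ in LOG]
approx_steps_per_epoch = round(sum(estimates) / len(estimates))

# Rough number of training examples seen in one epoch.
approx_examples_per_epoch = approx_steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Logged step with the lowest validation loss.
best_step, _, best_val_loss = min(LOG, key=lambda row: row[2])

print(approx_steps_per_epoch, approx_examples_per_epoch, best_step, best_val_loss)
```

Under these assumptions, one epoch is about 478 optimizer steps (roughly 61k examples), and the lowest logged validation loss is 0.5402 at step 350, slightly below the final reported value of 0.5419.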

Framework versions

  • Transformers 4.44.0.dev0
  • PyTorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1
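
A minimal loading sketch follows, assuming the checkpoint can be loaded with the standard Transformers `Auto` classes under the repository id `sfulay/zephyr-7b-dpo-full-ultrabin-reward-scale-05` and that BF16 weights are appropriate for the target hardware. The card does not document a prompt format, so `generate` passes the prompt through unchanged; both helper functions are hypothetical conveniences, not part of the model's release.

```python
MODEL_ID = "sfulay/zephyr-7b-dpo-full-ultrabin-reward-scale-05"


def load_model(model_id=MODEL_ID):
    """Load the tokenizer and model in bfloat16 (downloads the weights on first use)."""
    # Imported lazily so this module can be inspected without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


def generate(prompt, max_new_tokens=64):
    """Generate a continuation for a plain-text prompt (no chat template assumed)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Explain gradient accumulation in one sentence.")` would download the full checkpoint, so it is best run on a machine with enough memory for a 7B-parameter model.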

Model size: 7.24B params (Safetensors, tensor type BF16)