
pythia-1.4b-dpo-full

This model is a fine-tuned version of nnheui/pythia-1.4b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5991
  • Rewards/chosen: -1.875
  • Rewards/rejected: -2.6406
  • Rewards/accuracies: 0.7164
  • Rewards/margins: 0.7734
  • Logps/rejected: -604.0
  • Logps/chosen: -580.0
  • Logits/rejected: -1.4297
  • Logits/chosen: -1.4062
  • Logps/chosen Top Tokens: -0.0009
  • Logps/rejected Top Tokens: -0.0009
  • Logps/chosen Bottom Tokens: -13.9375
  • Logps/rejected Bottom Tokens: -13.8125
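For reference, a minimal inference sketch is shown below. It assumes only standard Hugging Face Transformers `AutoModelForCausalLM` usage for this Pythia-based checkpoint; the prompt and generation settings are illustrative and are not taken from this card.

```python
# Minimal inference sketch (assumption: standard AutoModelForCausalLM usage;
# the prompt and generation settings are illustrative, not from this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nnheui/pythia-1.4b-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain the difference between supervised fine-tuning and DPO in two sentences."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```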

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
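The card names HuggingFaceH4/ultrafeedback_binarized as the training and evaluation dataset but gives no preprocessing details. The snippet below is a hedged sketch of inspecting that dataset with the datasets library; the split names and field layout used here are assumptions based on how this dataset is commonly distributed, not statements from this card.

```python
# Hedged sketch: inspect the preference dataset named in this card.
# Split names ("train_prefs"/"test_prefs") and field layout are assumed, not stated here.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
print(ds)  # shows the available splits and their sizes

example = ds["train_prefs"][0]
print(example["prompt"])
print(example["chosen"][-1]["content"])    # assumed chat-format message list
print(example["rejected"][-1]["content"])
```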

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 5
  • eval_batch_size: 5
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 120
  • total_eval_batch_size: 30
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
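
As a rough guide only, the listed values map onto transformers.TrainingArguments roughly as follows. The actual training script (for example, a TRL DPOTrainer recipe and its beta value) is not included in this card, so anything beyond the listed hyperparameters is an assumption.

```python
# Hedged sketch: mapping the listed hyperparameters onto transformers.TrainingArguments.
# Everything beyond the listed values (output_dir, bf16, the DPO trainer itself) is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pythia-1.4b-dpo-full",  # assumed output path
    learning_rate=5e-7,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    gradient_accumulation_steps=4,       # 5 per device x 6 GPUs x 4 steps = 120 effective batch
    seed=42,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                           # assumption, consistent with the BF16 tensor type below
)
```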

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Logps/chosen Top Tokens | Logps/rejected Top Tokens | Logps/chosen Bottom Tokens | Logps/rejected Bottom Tokens |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.678 | 0.1963 | 100 | 0.6789 | -0.0275 | -0.0608 | 0.5881 | 0.0332 | -344.0 | -396.0 | -1.1562 | -1.0938 | -0.0009 | -0.0009 | -14.0625 | -14.0 |
| 0.645 | 0.3925 | 200 | 0.6489 | -0.2871 | -0.4238 | 0.6448 | 0.1367 | -380.0 | -422.0 | -1.2031 | -1.1562 | -0.0009 | -0.0009 | -14.375 | -14.3125 |
| 0.6396 | 0.5888 | 300 | 0.6304 | -0.4512 | -0.6797 | 0.6627 | 0.2275 | -406.0 | -438.0 | -1.2344 | -1.1875 | -0.0007 | -0.0008 | -14.375 | -14.3125 |
| 0.6102 | 0.7851 | 400 | 0.6268 | -0.5039 | -0.7617 | 0.6567 | 0.2578 | -414.0 | -444.0 | -1.2344 | -1.1875 | -0.0007 | -0.0007 | -14.3125 | -14.25 |
| 0.6084 | 0.9814 | 500 | 0.6259 | -0.5234 | -0.7852 | 0.6567 | 0.2617 | -416.0 | -446.0 | -1.2422 | -1.1953 | -0.0007 | -0.0007 | -14.375 | -14.3125 |
| 0.6115 | 1.1776 | 600 | 0.6121 | -0.5547 | -0.8789 | 0.6806 | 0.3242 | -426.0 | -450.0 | -1.2578 | -1.2109 | -0.0006 | -0.0006 | -14.25 | -14.125 |
| 0.607 | 1.3739 | 700 | 0.6068 | -0.6641 | -1.0078 | 0.6985 | 0.3418 | -438.0 | -460.0 | -1.2812 | -1.2344 | -0.0006 | -0.0006 | -14.1875 | -14.125 |
| 0.5764 | 1.5702 | 800 | 0.5996 | -0.75 | -1.1406 | 0.6866 | 0.3887 | -452.0 | -468.0 | -1.3125 | -1.2656 | -0.0007 | -0.0007 | -14.25 | -14.125 |
| 0.5903 | 1.7664 | 900 | 0.5984 | -0.5898 | -0.9648 | 0.7045 | 0.3770 | -434.0 | -452.0 | -1.3125 | -1.2656 | -0.0006 | -0.0006 | -14.25 | -14.125 |
| 0.5697 | 1.9627 | 1000 | 0.5922 | -0.7383 | -1.1562 | 0.6866 | 0.4160 | -454.0 | -468.0 | -1.3125 | -1.2734 | -0.0007 | -0.0006 | -14.0625 | -14.0 |
| 0.5573 | 2.1590 | 1100 | 0.5854 | -0.8203 | -1.2812 | 0.6985 | 0.4570 | -466.0 | -476.0 | -1.3281 | -1.2891 | -0.0006 | -0.0006 | -14.125 | -14.0 |
| 0.5439 | 2.3553 | 1200 | 0.5845 | -1.1016 | -1.6172 | 0.6866 | 0.5078 | -498.0 | -504.0 | -1.3672 | -1.3281 | -0.0007 | -0.0006 | -14.0625 | -13.9375 |
| 0.5487 | 2.5515 | 1300 | 0.5801 | -0.8906 | -1.3828 | 0.6925 | 0.4980 | -476.0 | -482.0 | -1.3828 | -1.3438 | -0.0007 | -0.0006 | -14.0625 | -14.0 |
| 0.543 | 2.7478 | 1400 | 0.5785 | -0.8672 | -1.3516 | 0.7134 | 0.4863 | -474.0 | -480.0 | -1.375 | -1.3359 | -0.0007 | -0.0006 | -14.0625 | -13.9375 |
| 0.5382 | 2.9441 | 1500 | 0.5711 | -1.1172 | -1.6641 | 0.6955 | 0.5508 | -504.0 | -506.0 | -1.3906 | -1.3516 | -0.0007 | -0.0006 | -14.125 | -14.0 |
| 0.5117 | 3.1403 | 1600 | 0.5712 | -1.25 | -1.8281 | 0.7045 | 0.5742 | -520.0 | -520.0 | -1.3984 | -1.3594 | -0.0007 | -0.0006 | -14.125 | -14.0 |
| 0.4983 | 3.3366 | 1700 | 0.5703 | -1.1641 | -1.75 | 0.7015 | 0.5859 | -512.0 | -510.0 | -1.4062 | -1.3672 | -0.0007 | -0.0007 | -14.125 | -14.0 |
| 0.4976 | 3.5329 | 1800 | 0.5709 | -1.2656 | -1.8828 | 0.7254 | 0.6133 | -524.0 | -520.0 | -1.4141 | -1.375 | -0.0007 | -0.0007 | -14.125 | -14.0625 |
| 0.4956 | 3.7291 | 1900 | 0.5754 | -1.2266 | -1.8047 | 0.7164 | 0.5781 | -516.0 | -516.0 | -1.4062 | -1.3672 | -0.0008 | -0.0008 | -14.0625 | -13.9375 |
| 0.4996 | 3.9254 | 2000 | 0.5722 | -1.2578 | -1.8516 | 0.7045 | 0.6016 | -524.0 | -520.0 | -1.4062 | -1.375 | -0.0008 | -0.0008 | -14.0625 | -13.9375 |
| 0.4588 | 4.1217 | 2100 | 0.5748 | -1.4141 | -2.0312 | 0.7343 | 0.6211 | -540.0 | -536.0 | -1.4062 | -1.375 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4555 | 4.3180 | 2200 | 0.5743 | -1.2969 | -1.9141 | 0.7164 | 0.6172 | -528.0 | -524.0 | -1.4219 | -1.3906 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.4625 | 4.5142 | 2300 | 0.5735 | -1.3047 | -1.9297 | 0.7134 | 0.625 | -532.0 | -524.0 | -1.4141 | -1.3828 | -0.0008 | -0.0008 | -14.0 | -13.875 |
| 0.469 | 4.7105 | 2400 | 0.5743 | -1.4766 | -2.1406 | 0.7194 | 0.6562 | -552.0 | -540.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4796 | 4.9068 | 2500 | 0.5750 | -1.3281 | -1.9766 | 0.7134 | 0.6484 | -536.0 | -528.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4082 | 5.1030 | 2600 | 0.5818 | -1.6016 | -2.2656 | 0.7194 | 0.6602 | -564.0 | -552.0 | -1.4453 | -1.4062 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4193 | 5.2993 | 2700 | 0.5803 | -1.4922 | -2.1406 | 0.7194 | 0.6523 | -552.0 | -544.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.8125 |
| 0.419 | 5.4956 | 2800 | 0.5795 | -1.625 | -2.3281 | 0.7194 | 0.7031 | -572.0 | -556.0 | -1.4375 | -1.3984 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4267 | 5.6919 | 2900 | 0.5780 | -1.6875 | -2.375 | 0.7134 | 0.6836 | -576.0 | -564.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.402 | 5.8881 | 3000 | 0.5828 | -1.6484 | -2.3594 | 0.7254 | 0.7109 | -572.0 | -560.0 | -1.4453 | -1.4062 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.3656 | 6.0844 | 3100 | 0.5844 | -1.6875 | -2.4062 | 0.7015 | 0.7227 | -580.0 | -564.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.3971 | 6.2807 | 3200 | 0.5873 | -1.6094 | -2.3281 | 0.7075 | 0.7148 | -572.0 | -556.0 | -1.4453 | -1.4141 | -0.0009 | -0.0009 | -14.0 | -13.8125 |
| 0.3923 | 6.4769 | 3300 | 0.5906 | -1.6875 | -2.4062 | 0.7075 | 0.7188 | -580.0 | -564.0 | -1.4453 | -1.4141 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.4011 | 6.6732 | 3400 | 0.5848 | -1.7109 | -2.4375 | 0.7254 | 0.7344 | -584.0 | -564.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -14.0 | -13.875 |
| 0.3838 | 6.8695 | 3500 | 0.5897 | -1.75 | -2.4844 | 0.7164 | 0.7305 | -584.0 | -568.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3762 | 7.0658 | 3600 | 0.5910 | -1.7812 | -2.5312 | 0.7134 | 0.7422 | -592.0 | -572.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3591 | 7.2620 | 3700 | 0.5895 | -1.7812 | -2.5312 | 0.7075 | 0.7578 | -592.0 | -572.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -14.0 | -13.875 |
| 0.3713 | 7.4583 | 3800 | 0.5956 | -1.7734 | -2.5312 | 0.7164 | 0.75 | -592.0 | -572.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.381 | 7.6546 | 3900 | 0.5948 | -1.8672 | -2.625 | 0.7164 | 0.7695 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3639 | 7.8508 | 4000 | 0.5950 | -1.8672 | -2.625 | 0.7194 | 0.7578 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.3563 | 8.0471 | 4100 | 0.5939 | -1.8281 | -2.5781 | 0.7075 | 0.7539 | -596.0 | -576.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.3484 | 8.2434 | 4200 | 0.5969 | -1.875 | -2.6406 | 0.7045 | 0.7656 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -14.0 | -13.875 |
| 0.3359 | 8.4396 | 4300 | 0.5966 | -1.8828 | -2.6562 | 0.7045 | 0.7734 | -604.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.3639 | 8.6359 | 4400 | 0.5979 | -1.8516 | -2.5938 | 0.7075 | 0.7461 | -596.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.3563 | 8.8322 | 4500 | 0.5979 | -1.8594 | -2.625 | 0.7075 | 0.7617 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0009 | -13.9375 | -13.8125 |
| 0.353 | 9.0285 | 4600 | 0.5981 | -1.8672 | -2.625 | 0.6985 | 0.7617 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3514 | 9.2247 | 4700 | 0.5979 | -1.8594 | -2.625 | 0.6985 | 0.7656 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3434 | 9.4210 | 4800 | 0.5973 | -1.8672 | -2.6406 | 0.7015 | 0.7656 | -600.0 | -580.0 | -1.4297 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3492 | 9.6173 | 4900 | 0.5981 | -1.875 | -2.6406 | 0.7045 | 0.7578 | -600.0 | -580.0 | -1.4297 | -1.3984 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
| 0.3487 | 9.8135 | 5000 | 0.5967 | -1.8672 | -2.6406 | 0.7134 | 0.7734 | -600.0 | -580.0 | -1.4375 | -1.4062 | -0.0009 | -0.0008 | -13.9375 | -13.8125 |
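
For readers decoding the column names: assuming the standard DPO formulation (Rafailov et al., 2023), the Rewards/* columns are implicit rewards derived from policy and reference-model log-probabilities, and Rewards/margins is the mean chosen-minus-rejected gap. The definitions below are the standard ones, not something stated explicitly in this card.

```latex
% Standard DPO definitions (assumed; beta and reference model are not stated in this card).
% Implicit reward of a completion y for prompt x:
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]

% "Rewards/margins" is the mean gap between chosen and rejected implicit rewards;
% "Rewards/accuracies" is the fraction of pairs where the chosen reward is larger:
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})

% DPO training loss over preference pairs:
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}\left[ \log \sigma\big( r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}) \big) \right]
```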

Framework versions

  • Transformers 4.40.0
  • PyTorch 2.2.2+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1
Model size: 1.41B params (Safetensors)
Tensor type: BF16
