tinyllama_moe_dpo_ultrachat_v2_epochs5

This model is a fine-tuned version of ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs5 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5739
  • Rewards/chosen: -1.1929
  • Rewards/rejected: -1.7842
  • Rewards/accuracies: 0.7163
  • Rewards/margins: 0.5913
  • Logps/rejected: -486.3180
  • Logps/chosen: -468.6473
  • Logits/rejected: -1.7313
  • Logits/chosen: -1.8442
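
The card does not say how these metrics are computed. Assuming they follow the usual DPO trainer conventions (an assumption, not something stated here), Rewards/margins is the mean of the chosen reward minus the rejected reward, and the loss for a single preference pair is the negative log-sigmoid of that pair's margin. A minimal sketch that checks the reported margin against the two reward values; the function name `dpo_sigmoid_loss` is hypothetical:

```python
import math

# Values copied from the evaluation summary above.
rewards_chosen = -1.1929
rewards_rejected = -1.7842
reported_margin = 0.5913

# By linearity, the mean margin equals the difference of the mean rewards.
margin = rewards_chosen - rewards_rejected
print(f"{margin:.4f}")  # 0.5913


def dpo_sigmoid_loss(pair_margin: float) -> float:
    """Loss for ONE preference pair: -log(sigmoid(pair_margin)).

    Hypothetical helper; assumes the standard sigmoid DPO objective.
    Written in a numerically stable form for both signs of the margin.
    """
    if pair_margin >= 0:
        return math.log1p(math.exp(-pair_margin))
    return -pair_margin + math.log1p(math.exp(pair_margin))


# An untrained policy has margin 0, giving log(2), about 0.6931.
print(f"{dpo_sigmoid_loss(0.0):.4f}")  # 0.6931
```

Because the loss is averaged per pair, applying `dpo_sigmoid_loss` to the mean margin does not reproduce the reported mean loss; it only gives a lower bound on it.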

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 96
  • num_epochs: 5
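
The totals in the list above follow from the per-device values: the per-device batch size times the number of devices, times the gradient accumulation steps for training. The sketch below reproduces that arithmetic and adds a plain-Python version of a linear-warmup, cosine-decay schedule. The total step count is not given in this card, so `TOTAL_STEPS` is a hypothetical value chosen only for illustration, and `lr_at` is an illustrative helper rather than the trainer's own code.

```python
import math

# Per-device values copied from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time
print(total_train_batch_size, total_eval_batch_size)  # 64 32

BASE_LR = 5e-07
WARMUP_STEPS = 96
TOTAL_STEPS = 4775  # ASSUMPTION: not stated in the card; illustrative only


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 48, 96, 2400, TOTAL_STEPS):
    print(step, lr_at(step))
```

Under this sketch the rate peaks at 5e-07 at step 96 and decays to zero at the final step.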

Training results

Training Loss Epoch Step Logits/chosen Logits/rejected Logps/chosen Logps/rejected Validation Loss Rewards/accuracies Rewards/chosen Rewards/margins Rewards/rejected
0.6913 0.1 100 -2.7889 -2.7179 -348.8463 -307.7887 0.6915 0.6012 0.0051 0.0041 0.0011
0.6848 0.21 200 -2.7786 -2.7064 -347.1148 -307.7814 0.6844 0.6548 0.0224 0.0213 0.0011
0.6719 0.31 300 -2.7564 -2.6828 -347.1926 -310.3274 0.6745 0.6567 0.0217 0.0460 -0.0243
0.6593 0.42 400 -2.7168 -2.6417 -351.2079 -317.7508 0.6626 0.6627 -0.0185 0.0801 -0.0985
0.6489 0.52 500 -2.6766 -2.5996 -359.7169 -330.5644 0.6503 0.6667 -0.1036 0.1231 -0.2267
0.6442 0.63 600 -2.6209 -2.5415 -364.4345 -339.3099 0.6407 0.6806 -0.1507 0.1634 -0.3141
0.6271 0.73 700 -2.5658 -2.4836 -373.3324 -352.5069 0.6321 0.6766 -0.2397 0.2064 -0.4461
0.607 0.84 800 -2.5051 -2.4199 -379.1497 -361.6935 0.6261 0.6845 -0.2979 0.2401 -0.5380
0.6322 0.94 900 -2.4508 -2.3644 -397.4641 -382.2142 0.6199 0.6905 -0.4810 0.2621 -0.7432
0.605 1.05 1000 -2.3964 -2.3068 -404.5890 -394.0288 0.6115 0.6885 -0.5523 0.3090 -0.8613
0.601 1.15 1100 -2.3602 -2.2683 -418.7677 -411.0065 0.6068 0.6964 -0.6941 0.3370 -1.0311
0.5676 1.26 1200 -2.3216 -2.2290 -417.0859 -411.9764 0.6020 0.7123 -0.6773 0.3635 -1.0408
0.5909 1.36 1300 -2.2912 -2.1982 -412.9470 -408.3128 0.5999 0.7123 -0.6359 0.3683 -1.0042
0.5711 1.47 1400 -2.2460 -2.1507 -420.5697 -419.0722 0.5967 0.7183 -0.7121 0.3997 -1.1118
0.5655 1.57 1500 -2.2212 -2.1253 -412.4961 -410.0143 0.5957 0.7222 -0.6314 0.3898 -1.0212
0.5655 1.67 1600 -2.1858 -2.0877 -414.4090 -414.7852 0.5925 0.7242 -0.6505 0.4184 -1.0689
0.5364 1.78 1700 -2.1499 -2.0500 -425.4825 -428.4342 0.5873 0.7262 -0.7612 0.4442 -1.2054
0.5702 1.88 1800 -2.1546 -2.0539 -424.3879 -429.0814 0.5843 0.7361 -0.7503 0.4616 -1.2119
0.5505 1.99 1900 -2.1340 -2.0328 -413.9261 -417.8120 0.5852 0.7321 -0.6457 0.4535 -1.0992
0.5389 2.09 2000 -2.0806 -1.9769 -422.3402 -427.3939 0.5828 0.7262 -0.7298 0.4652 -1.1950
0.531 2.2 2100 -2.0565 -1.9511 -437.7683 -446.1322 0.5805 0.7341 -0.8841 0.4983 -1.3824
0.5162 2.3 2200 -2.0180 -1.9112 -435.0022 -443.4644 0.5830 0.7341 -0.8564 0.4993 -1.3557
0.5297 2.41 2300 -1.9911 -1.8838 -448.7519 -459.4124 0.5795 0.7183 -0.9939 0.5212 -1.5152
0.5143 2.51 2400 -1.9853 -1.8784 -436.2057 -445.7617 0.5806 0.7321 -0.8685 0.5102 -1.3787
0.5377 2.62 2500 -1.9648 -1.8572 -443.1574 -454.7680 0.5786 0.7282 -0.9380 0.5307 -1.4687
0.4868 2.72 2600 -1.9504 -1.8416 -439.4379 -450.5156 0.5797 0.7302 -0.9008 0.5254 -1.4262
0.5275 2.83 2700 -1.9219 -1.8117 -447.6714 -460.6927 0.5754 0.7282 -0.9831 0.5448 -1.5280
0.5042 2.93 2800 -1.9484 -1.8401 -447.7928 -460.8577 0.5743 0.7321 -0.9843 0.5453 -1.5296
0.4862 3.04 2900 -1.9315 -1.8216 -452.8863 -467.0351 0.5756 0.7202 -1.0353 0.5561 -1.5914
0.4817 3.14 3000 -1.8836 -1.7716 -453.8664 -469.6034 0.5786 0.7282 -1.0451 0.5720 -1.6171
0.4767 3.24 3100 -1.8663 -1.7538 -457.4258 -472.9984 0.5770 0.7262 -1.0807 0.5704 -1.6510
0.4794 3.35 3200 -1.8515 -1.7384 -460.2550 -476.8743 0.5789 0.7262 -1.1090 0.5808 -1.6898
0.4784 3.46 3300 -1.8442 -1.7313 -468.6473 -486.3180 0.5739 0.7163 -1.1929 0.5913 -1.7842
0.4797 3.56 3400 -1.8464 -1.7340 -464.2336 -480.9566 0.5754 0.7202 -1.1487 0.5819 -1.7306
0.4967 3.66 3500 -1.8458 -1.7331 -462.4030 -478.6690 0.5763 0.7282 -1.1304 0.5773 -1.7077
0.4747 3.77 3600 -1.8402 -1.7268 -462.3710 -479.5741 0.5767 0.7262 -1.1301 0.5867 -1.7168
0.4895 3.87 3700 -1.8430 -1.7302 -463.2915 -479.6691 0.5747 0.7202 -1.1393 0.5784 -1.7177
0.5118 3.98 3800 -1.8417 -1.7282 -464.1390 -481.3118 0.5743 0.7262 -1.1478 0.5864 -1.7342
0.5007 4.08 3900 -1.8403 -1.7269 -462.8507 -480.0436 0.5753 0.7282 -1.1349 0.5866 -1.7215
0.461 4.19 4000 -1.8327 -1.7189 -466.1142 -483.5273 0.5745 0.7222 -1.1675 0.5888 -1.7563
0.4881 4.29 4100 -1.8260 -1.7124 -464.1829 -481.8481 0.5762 0.7282 -1.1482 0.5913 -1.7395
0.4449 4.4 4200 -1.8251 -1.7116 -466.1421 -484.0506 0.5765 0.7202 -1.1678 0.5937 -1.7615
0.4692 4.5 4300 -1.8279 -1.7143 -466.4624 -484.0968 0.5759 0.7242 -1.1710 0.5910 -1.7620
0.4654 4.61 4400 -1.8290 -1.7154 -466.3009 -484.2224 0.5760 0.7262 -1.1694 0.5939 -1.7633
0.4608 4.71 4500 -1.8304 -1.7171 -467.0131 -484.8123 0.5754 0.7202 -1.1765 0.5926 -1.7692
0.4661 4.82 4600 -1.8255 -1.7120 -467.5481 -485.3937 0.5754 0.7282 -1.1819 0.5931 -1.7750
0.4859 4.92 4700 -1.8237 -1.7101 -467.6952 -485.5031 0.5756 0.7202 -1.1834 0.5927 -1.7761
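
The card does not state the DPO beta. It can be estimated from the logged metrics under three assumptions that are labeled here because the card confirms none of them: each reward is beta times the difference between the policy and reference log-probabilities, the reference log-probabilities are fixed across checkpoints, and every evaluation uses the same set. Under those assumptions the change in mean reward between two checkpoints equals beta times the change in mean log-probability. A rough sketch using the step 100 row and the step 3300 values (the same values listed in the evaluation summary); `implied_beta` is a hypothetical helper:

```python
# Mean evaluation metrics at two checkpoints, copied from this card.
early = {  # step 100
    "rewards_chosen": 0.0051, "logps_chosen": -348.8463,
    "rewards_rejected": 0.0011, "logps_rejected": -307.7887,
}
late = {  # step 3300
    "rewards_chosen": -1.1929, "logps_chosen": -468.6473,
    "rewards_rejected": -1.7842, "logps_rejected": -486.3180,
}


def implied_beta(kind: str) -> float:
    """Ratio of reward change to log-prob change for 'chosen' or 'rejected'.

    Equals beta only if reward = beta * (policy_logp - reference_logp)
    and the reference log-probs are identical at both checkpoints.
    """
    d_reward = late[f"rewards_{kind}"] - early[f"rewards_{kind}"]
    d_logp = late[f"logps_{kind}"] - early[f"logps_{kind}"]
    return d_reward / d_logp


beta_chosen = implied_beta("chosen")
beta_rejected = implied_beta("rejected")
print(round(beta_chosen, 4), round(beta_rejected, 4))  # both round to 0.01
```

Both estimates agree, which suggests a beta near 0.01, but this is an inference from the logged numbers rather than a documented setting.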

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu118
  • Datasets 2.14.6
  • Tokenizers 0.15.0

Safetensors

  • Model size: 6.43B params
  • Tensor type: BF16