---
license: apache-2.0
base_model: ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs5
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: tinyllama_moe_dpo_ultrachat_v2_epochs5
  results: []
---

# tinyllama_moe_dpo_ultrachat_v2_epochs5

This model is a fine-tuned version of [ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs5](https://huggingface.co/ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs5) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5739
- Rewards/chosen: -1.1929
- Rewards/rejected: -1.7842
- Rewards/accuracies: 0.7163
- Rewards/margins: 0.5913
- Logps/rejected: -486.3180
- Logps/chosen: -468.6473
- Logits/rejected: -1.7313
- Logits/chosen: -1.8442

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 96
- num_epochs: 5

### Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:-----:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.6913 | 0.1 | 100 | -2.7889 | -2.7179 | -348.8463 | -307.7887 | 0.6915 | 0.6012 | 0.0051 | 0.0041 | 0.0011 |
| 0.6848 | 0.21 | 200 | -2.7786 | -2.7064 | -347.1148 | -307.7814 | 0.6844 | 0.6548 | 0.0224 | 0.0213 | 0.0011 |
| 0.6719 | 0.31 | 300 | -2.7564 | -2.6828 | -347.1926 | -310.3274 | 0.6745 | 0.6567 | 0.0217 | 0.0460 | -0.0243 |
| 0.6593 | 0.42 | 400 | -2.7168 | -2.6417 | -351.2079 | -317.7508 | 0.6626 | 0.6627 | -0.0185 | 0.0801 | -0.0985 |
| 0.6489 | 0.52 | 500 | -2.6766 | -2.5996 | -359.7169 | -330.5644 | 0.6503 | 0.6667 | -0.1036 | 0.1231 | -0.2267 |
| 0.6442 | 0.63 | 600 | -2.6209 | -2.5415 | -364.4345 | -339.3099 | 0.6407 | 0.6806 | -0.1507 | 0.1634 | -0.3141 |
| 0.6271 | 0.73 | 700 | -2.5658 | -2.4836 | -373.3324 | -352.5069 | 0.6321 | 0.6766 | -0.2397 | 0.2064 | -0.4461 |
| 0.607 | 0.84 | 800 | -2.5051 | -2.4199 | -379.1497 | -361.6935 | 0.6261 | 0.6845 | -0.2979 | 0.2401 | -0.5380 |
| 0.6322 | 0.94 | 900 | -2.4508 | -2.3644 | -397.4641 | -382.2142 | 0.6199 | 0.6905 | -0.4810 | 0.2621 | -0.7432 |
| 0.605 | 1.05 | 1000 | -2.3964 | -2.3068 | -404.5890 | -394.0288 | 0.6115 | 0.6885 | -0.5523 | 0.3090 | -0.8613 |
| 0.601 | 1.15 | 1100 | -2.3602 | -2.2683 | -418.7677 | -411.0065 | 0.6068 | 0.6964 | -0.6941 | 0.3370 | -1.0311 |
| 0.5676 | 1.26 | 1200 | -2.3216 | -2.2290 | -417.0859 | -411.9764 | 0.6020 | 0.7123 | -0.6773 | 0.3635 | -1.0408 |
| 0.5909 | 1.36 | 1300 | -2.2912 | -2.1982 | -412.9470 | -408.3128 | 0.5999 | 0.7123 | -0.6359 | 0.3683 | -1.0042 |
| 0.5711 | 1.47 | 1400 | -2.2460 | -2.1507 | -420.5697 | -419.0722 | 0.5967 | 0.7183 | -0.7121 | 0.3997 | -1.1118 |
| 0.5655 | 1.57 | 1500 | -2.2212 | -2.1253 | -412.4961 | -410.0143 | 0.5957 | 0.7222 | -0.6314 | 0.3898 | -1.0212 |
| 0.5655 | 1.67 | 1600 | -2.1858 | -2.0877 | -414.4090 | -414.7852 | 0.5925 | 0.7242 | -0.6505 | 0.4184 | -1.0689 |
| 0.5364 | 1.78 | 1700 | -2.1499 | -2.0500 | -425.4825 | -428.4342 | 0.5873 | 0.7262 | -0.7612 | 0.4442 | -1.2054 |
| 0.5702 | 1.88 | 1800 | -2.1546 | -2.0539 | -424.3879 | -429.0814 | 0.5843 | 0.7361 | -0.7503 | 0.4616 | -1.2119 |
| 0.5505 | 1.99 | 1900 | -2.1340 | -2.0328 | -413.9261 | -417.8120 | 0.5852 | 0.7321 | -0.6457 | 0.4535 | -1.0992 |
| 0.5389 | 2.09 | 2000 | -2.0806 | -1.9769 | -422.3402 | -427.3939 | 0.5828 | 0.7262 | -0.7298 | 0.4652 | -1.1950 |
| 0.531 | 2.2 | 2100 | -2.0565 | -1.9511 | -437.7683 | -446.1322 | 0.5805 | 0.7341 | -0.8841 | 0.4983 | -1.3824 |
| 0.5162 | 2.3 | 2200 | -2.0180 | -1.9112 | -435.0022 | -443.4644 | 0.5830 | 0.7341 | -0.8564 | 0.4993 | -1.3557 |
| 0.5297 | 2.41 | 2300 | -1.9911 | -1.8838 | -448.7519 | -459.4124 | 0.5795 | 0.7183 | -0.9939 | 0.5212 | -1.5152 |
| 0.5143 | 2.51 | 2400 | -1.9853 | -1.8784 | -436.2057 | -445.7617 | 0.5806 | 0.7321 | -0.8685 | 0.5102 | -1.3787 |
| 0.5377 | 2.62 | 2500 | -1.9648 | -1.8572 | -443.1574 | -454.7680 | 0.5786 | 0.7282 | -0.9380 | 0.5307 | -1.4687 |
| 0.4868 | 2.72 | 2600 | -1.9504 | -1.8416 | -439.4379 | -450.5156 | 0.5797 | 0.7302 | -0.9008 | 0.5254 | -1.4262 |
| 0.5275 | 2.83 | 2700 | -1.9219 | -1.8117 | -447.6714 | -460.6927 | 0.5754 | 0.7282 | -0.9831 | 0.5448 | -1.5280 |
| 0.5042 | 2.93 | 2800 | -1.9484 | -1.8401 | -447.7928 | -460.8577 | 0.5743 | 0.7321 | -0.9843 | 0.5453 | -1.5296 |
| 0.4862 | 3.04 | 2900 | -1.9315 | -1.8216 | -452.8863 | -467.0351 | 0.5756 | 0.7202 | -1.0353 | 0.5561 | -1.5914 |
| 0.4817 | 3.14 | 3000 | -1.8836 | -1.7716 | -453.8664 | -469.6034 | 0.5786 | 0.7282 | -1.0451 | 0.5720 | -1.6171 |
| 0.4767 | 3.24 | 3100 | -1.8663 | -1.7538 | -457.4258 | -472.9984 | 0.5770 | 0.7262 | -1.0807 | 0.5704 | -1.6510 |
| 0.4794 | 3.35 | 3200 | -1.8515 | -1.7384 | -460.2550 | -476.8743 | 0.5789 | 0.7262 | -1.1090 | 0.5808 | -1.6898 |
| 0.4784 | 3.46 | 3300 | -1.8442 | -1.7313 | -468.6473 | -486.3180 | 0.5739 | 0.7163 | -1.1929 | 0.5913 | -1.7842 |
| 0.4797 | 3.56 | 3400 | -1.8464 | -1.7340 | -464.2336 | -480.9566 | 0.5754 | 0.7202 | -1.1487 | 0.5819 | -1.7306 |
| 0.4967 | 3.66 | 3500 | -1.8458 | -1.7331 | -462.4030 | -478.6690 | 0.5763 | 0.7282 | -1.1304 | 0.5773 | -1.7077 |
| 0.4747 | 3.77 | 3600 | -1.8402 | -1.7268 | -462.3710 | -479.5741 | 0.5767 | 0.7262 | -1.1301 | 0.5867 | -1.7168 |
| 0.4895 | 3.87 | 3700 | -1.8430 | -1.7302 | -463.2915 | -479.6691 | 0.5747 | 0.7202 | -1.1393 | 0.5784 | -1.7177 |
| 0.5118 | 3.98 | 3800 | -1.8417 | -1.7282 | -464.1390 | -481.3118 | 0.5743 | 0.7262 | -1.1478 | 0.5864 | -1.7342 |
| 0.5007 | 4.08 | 3900 | -1.8403 | -1.7269 | -462.8507 | -480.0436 | 0.5753 | 0.7282 | -1.1349 | 0.5866 | -1.7215 |
| 0.461 | 4.19 | 4000 | -1.8327 | -1.7189 | -466.1142 | -483.5273 | 0.5745 | 0.7222 | -1.1675 | 0.5888 | -1.7563 |
| 0.4881 | 4.29 | 4100 | -1.8260 | -1.7124 | -464.1829 | -481.8481 | 0.5762 | 0.7282 | -1.1482 | 0.5913 | -1.7395 |
| 0.4449 | 4.4 | 4200 | -1.8251 | -1.7116 | -466.1421 | -484.0506 | 0.5765 | 0.7202 | -1.1678 | 0.5937 | -1.7615 |
| 0.4692 | 4.5 | 4300 | -1.8279 | -1.7143 | -466.4624 | -484.0968 | 0.5759 | 0.7242 | -1.1710 | 0.5910 | -1.7620 |
| 0.4654 | 4.61 | 4400 | -1.8290 | -1.7154 | -466.3009 | -484.2224 | 0.5760 | 0.7262 | -1.1694 | 0.5939 | -1.7633 |
| 0.4608 | 4.71 | 4500 | -1.8304 | -1.7171 | -467.0131 | -484.8123 | 0.5754 | 0.7202 | -1.1765 | 0.5926 | -1.7692 |
| 0.4661 | 4.82 | 4600 | -1.8255 | -1.7120 | -467.5481 | -485.3937 | 0.5754 | 0.7282 | -1.1819 | 0.5931 | -1.7750 |
| 0.4859 | 4.92 | 4700 | -1.8237 | -1.7101 | -467.6952 | -485.5031 | 0.5756 | 0.7202 | -1.1834 | 0.5927 | -1.7761 |

### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2+cu118
- Datasets 2.14.6
- Tokenizers 0.15.0
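
Some of the numbers in this card are derived from others, and a few lines of arithmetic can confirm they are consistent. The sketch below is illustrative only and is not part of the training code. It assumes the total batch sizes are the per-device size multiplied by the device count (and, for training, by the gradient accumulation steps), and that Rewards/margins is Rewards/chosen minus Rewards/rejected. All values are copied from the hyperparameter list and the evaluation results above; the variable names are chosen here for readability.

```python
# Values copied from the hyperparameter list in this card.
train_batch_size = 8             # per-device training batch size
eval_batch_size = 8              # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 2

# Assumption: effective batch size = per-device size x devices,
# additionally multiplied by accumulation steps for training.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Values copied from the reported evaluation results.
rewards_chosen = -1.1929
rewards_rejected = -1.7842

# Assumption: the margin is the chosen reward minus the rejected reward.
rewards_margin = rewards_chosen - rewards_rejected

print(total_train_batch_size)     # 64
print(total_eval_batch_size)      # 32
print(round(rewards_margin, 4))   # 0.5913
```

The three printed values match total_train_batch_size, total_eval_batch_size, and Rewards/margins as reported. The reported evaluation results also coincide with the step 3300 row of the training results, which has the lowest validation loss in the table (0.5739).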