tinyllama_moe_dpo_ultrachat_v2_epochs3

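This model is a fine-tuned version of ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs3 on the HuggingFaceH4/ultrafeedback_binarized preference dataset. The model name and the reward metrics below indicate preference tuning with Direct Preference Optimization (DPO). It achieves the following results on the evaluation set, which match the step-2700 row of the training results (the lowest validation loss):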

  • Loss: 0.5855
  • Rewards/chosen: -0.9040
  • Rewards/rejected: -1.3959
  • Rewards/accuracies: 0.7262
  • Rewards/margins: 0.4918
  • Logps/rejected: -442.2930
  • Logps/chosen: -435.4489
  • Logits/rejected: -2.3585
  • Logits/chosen: -2.4345
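The reward metrics above are the quantities commonly logged by DPO trainers (for example TRL's `DPOTrainer`). Each completion's implicit reward is β times the difference between the policy's and the frozen reference model's log-probability of that completion. The margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs whose chosen reward exceeds the rejected one. The sketch below reproduces those definitions for a batch of summed log-probabilities. It illustrates the standard definitions and is not the training code used for this model. The helper names are hypothetical. This card does not state β, so the default of 0.1 is an assumption, and the loss shown is the sigmoid form of DPO, −log σ(margin).

```python
import math
from typing import Dict, Sequence

# Hypothetical helper names; not part of any library API.


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_batch_metrics(
    policy_chosen_logps: Sequence[float],
    policy_rejected_logps: Sequence[float],
    ref_chosen_logps: Sequence[float],
    ref_rejected_logps: Sequence[float],
    beta: float = 0.1,  # assumption: beta is not reported in this model card
) -> Dict[str, float]:
    """Batch-averaged DPO metrics from per-sequence summed log-probabilities."""
    n = len(policy_chosen_logps)
    # Implicit reward of a completion: beta * (log pi_theta(y|x) - log pi_ref(y|x))
    chosen = [beta * (p - r) for p, r in zip(policy_chosen_logps, ref_chosen_logps)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected_logps, ref_rejected_logps)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    # Sigmoid DPO loss per pair: -log(sigmoid(margin)) == softplus(-margin)
    losses = [softplus(-m) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        # Fraction of pairs where the chosen completion earns the higher reward
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
        "rewards/margins": sum(margins) / n,
    }


if __name__ == "__main__":
    # Toy log-probabilities for two preference pairs (made-up numbers, not from this model)
    metrics = dpo_batch_metrics(
        policy_chosen_logps=[-10.0, -20.0],
        policy_rejected_logps=[-15.0, -18.0],
        ref_chosen_logps=[-12.0, -19.0],
        ref_rejected_logps=[-13.0, -19.0],
    )
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

The reported loss averages the per-pair losses, so it need not equal −log σ of the average margin. Assuming the default sigmoid loss, the final average margin of 0.4918 alone would give about 0.48, below the reported 0.5855. This is what the convexity of −log σ predicts.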

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 96
  • num_epochs: 3
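Two of the values in this list are derived from the others. The total train batch size is the per-device batch size times the number of devices times the gradient-accumulation steps (8 × 4 × 2 = 64). Evaluation uses no accumulation (8 × 4 = 32). With the `cosine` scheduler, the learning rate ramps linearly from 0 to 5e-07 over the 96 warmup steps and then decays along a half cosine toward 0. The sketch below assumes the warmup-then-cosine form of the `cosine` scheduler in Hugging Face Transformers, with its default half cycle. The function names are hypothetical. The card does not state the total number of optimizer steps, so the usage example estimates it roughly from the training results (step 2800 at epoch 2.93).

```python
import math

# Hypothetical helper names; not part of any library API.


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def cosine_with_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then half-cosine decay to 0 (assumed Transformers-style 'cosine')."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch_size(8, 4, 2))  # 64 (total_train_batch_size)
    print(effective_batch_size(8, 4))     # 32 (total_eval_batch_size)
    # Rough estimate only: the card does not report the total step count.
    approx_total_steps = round(2800 / 2.93 * 3)
    for step in (0, 48, 96, 1000, 2000, approx_total_steps):
        print(step, f"{cosine_with_warmup_lr(step, 5e-7, 96, approx_total_steps):.3e}")
```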

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6914 | 0.1 | 100 | 0.6913 | 0.0043 | -0.0005 | 0.6349 | 0.0048 | -302.7554 | -344.6115 | -2.9876 | -3.0405 |
| 0.6836 | 0.21 | 200 | 0.6830 | 0.0149 | -0.0095 | 0.6448 | 0.0244 | -303.6508 | -343.5497 | -2.9700 | -3.0243 |
| 0.6662 | 0.31 | 300 | 0.6712 | -0.0134 | -0.0687 | 0.6746 | 0.0553 | -309.5701 | -346.3836 | -2.9423 | -2.9976 |
| 0.6538 | 0.42 | 400 | 0.6571 | -0.0814 | -0.1804 | 0.6766 | 0.0990 | -320.7438 | -353.1802 | -2.8979 | -2.9548 |
| 0.6405 | 0.52 | 500 | 0.6448 | -0.1949 | -0.3451 | 0.6726 | 0.1502 | -337.2181 | -364.5344 | -2.8541 | -2.9120 |
| 0.6394 | 0.63 | 600 | 0.6372 | -0.2303 | -0.4148 | 0.6825 | 0.1845 | -344.1863 | -368.0754 | -2.8147 | -2.8733 |
| 0.6218 | 0.73 | 700 | 0.6313 | -0.2894 | -0.5107 | 0.6825 | 0.2213 | -353.7792 | -373.9845 | -2.7666 | -2.8269 |
| 0.6035 | 0.84 | 800 | 0.6249 | -0.3614 | -0.6145 | 0.6845 | 0.2531 | -364.1536 | -381.1849 | -2.7056 | -2.7681 |
| 0.6326 | 0.94 | 900 | 0.6204 | -0.5259 | -0.8008 | 0.6845 | 0.2749 | -382.7857 | -397.6345 | -2.6568 | -2.7207 |
| 0.6103 | 1.05 | 1000 | 0.6145 | -0.5164 | -0.8178 | 0.6944 | 0.3014 | -384.4856 | -396.6823 | -2.6322 | -2.6969 |
| 0.6002 | 1.15 | 1100 | 0.6116 | -0.5179 | -0.8325 | 0.6925 | 0.3146 | -385.9578 | -396.8333 | -2.6024 | -2.6688 |
| 0.5729 | 1.26 | 1200 | 0.6083 | -0.5838 | -0.9200 | 0.7044 | 0.3362 | -394.7073 | -403.4271 | -2.5708 | -2.6376 |
| 0.599 | 1.36 | 1300 | 0.6077 | -0.5206 | -0.8453 | 0.7103 | 0.3247 | -387.2310 | -397.1021 | -2.5454 | -2.6134 |
| 0.5821 | 1.47 | 1400 | 0.6025 | -0.5941 | -0.9561 | 0.7063 | 0.3620 | -398.3106 | -404.4496 | -2.5211 | -2.5900 |
| 0.574 | 1.57 | 1500 | 0.5977 | -0.6617 | -1.0471 | 0.7143 | 0.3854 | -407.4162 | -411.2178 | -2.4887 | -2.5593 |
| 0.5716 | 1.67 | 1600 | 0.5955 | -0.6765 | -1.0870 | 0.7282 | 0.4105 | -411.4020 | -412.6956 | -2.4651 | -2.5369 |
| 0.5477 | 1.78 | 1700 | 0.5904 | -0.8020 | -1.2430 | 0.7321 | 0.4410 | -427.0003 | -425.2423 | -2.4342 | -2.5079 |
| 0.5718 | 1.88 | 1800 | 0.5898 | -0.7932 | -1.2439 | 0.7321 | 0.4507 | -427.0937 | -424.3631 | -2.4186 | -2.4928 |
| 0.563 | 1.99 | 1900 | 0.5904 | -0.6874 | -1.1313 | 0.7202 | 0.4439 | -415.8328 | -413.7807 | -2.4223 | -2.4961 |
| 0.5633 | 2.09 | 2000 | 0.5884 | -0.7564 | -1.2105 | 0.7262 | 0.4541 | -423.7504 | -420.6851 | -2.4073 | -2.4819 |
| 0.5564 | 2.2 | 2100 | 0.5878 | -0.8150 | -1.2802 | 0.7262 | 0.4652 | -430.7243 | -426.5488 | -2.3948 | -2.4696 |
| 0.5373 | 2.3 | 2200 | 0.5865 | -0.8791 | -1.3602 | 0.7341 | 0.4812 | -438.7289 | -432.9532 | -2.3795 | -2.4548 |
| 0.5559 | 2.41 | 2300 | 0.5872 | -0.8476 | -1.3260 | 0.7242 | 0.4784 | -435.3001 | -429.7996 | -2.3743 | -2.4496 |
| 0.5467 | 2.51 | 2400 | 0.5868 | -0.8483 | -1.3274 | 0.7222 | 0.4790 | -435.4401 | -429.8786 | -2.3697 | -2.4452 |
| 0.5666 | 2.62 | 2500 | 0.5858 | -0.8754 | -1.3626 | 0.7242 | 0.4872 | -438.9631 | -432.5811 | -2.3641 | -2.4399 |
| 0.5113 | 2.72 | 2600 | 0.5856 | -0.8942 | -1.3842 | 0.7242 | 0.4900 | -441.1211 | -434.4620 | -2.3604 | -2.4361 |
| 0.5601 | 2.83 | 2700 | 0.5855 | -0.9040 | -1.3959 | 0.7262 | 0.4918 | -442.2930 | -435.4489 | -2.3585 | -2.4345 |
| 0.5303 | 2.93 | 2800 | 0.5857 | -0.9003 | -1.3898 | 0.7242 | 0.4894 | -441.6805 | -435.0786 | -2.3581 | -2.4342 |
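Two quick consistency checks can be run on the training results. First, Rewards/margins should equal Rewards/chosen minus Rewards/rejected, up to rounding in the fourth decimal. Second, the lowest validation loss should be 0.5855 at step 2700, the row whose values appear as the evaluation results at the top of this card. The sketch below copies those columns from the table and verifies both. The variable and function names are hypothetical and do not correspond to any training-log format.

```python
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins),
# copied from the training results table; names below are hypothetical.
ROWS = [
    (100, 0.6913, 0.0043, -0.0005, 0.0048),
    (200, 0.6830, 0.0149, -0.0095, 0.0244),
    (300, 0.6712, -0.0134, -0.0687, 0.0553),
    (400, 0.6571, -0.0814, -0.1804, 0.0990),
    (500, 0.6448, -0.1949, -0.3451, 0.1502),
    (600, 0.6372, -0.2303, -0.4148, 0.1845),
    (700, 0.6313, -0.2894, -0.5107, 0.2213),
    (800, 0.6249, -0.3614, -0.6145, 0.2531),
    (900, 0.6204, -0.5259, -0.8008, 0.2749),
    (1000, 0.6145, -0.5164, -0.8178, 0.3014),
    (1100, 0.6116, -0.5179, -0.8325, 0.3146),
    (1200, 0.6083, -0.5838, -0.9200, 0.3362),
    (1300, 0.6077, -0.5206, -0.8453, 0.3247),
    (1400, 0.6025, -0.5941, -0.9561, 0.3620),
    (1500, 0.5977, -0.6617, -1.0471, 0.3854),
    (1600, 0.5955, -0.6765, -1.0870, 0.4105),
    (1700, 0.5904, -0.8020, -1.2430, 0.4410),
    (1800, 0.5898, -0.7932, -1.2439, 0.4507),
    (1900, 0.5904, -0.6874, -1.1313, 0.4439),
    (2000, 0.5884, -0.7564, -1.2105, 0.4541),
    (2100, 0.5878, -0.8150, -1.2802, 0.4652),
    (2200, 0.5865, -0.8791, -1.3602, 0.4812),
    (2300, 0.5872, -0.8476, -1.3260, 0.4784),
    (2400, 0.5868, -0.8483, -1.3274, 0.4790),
    (2500, 0.5858, -0.8754, -1.3626, 0.4872),
    (2600, 0.5856, -0.8942, -1.3842, 0.4900),
    (2700, 0.5855, -0.9040, -1.3959, 0.4918),
    (2800, 0.5857, -0.9003, -1.3898, 0.4894),
]


def best_row(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


def margins_consistent(rows, tol=5e-4):
    """True if every logged margin equals chosen - rejected within rounding."""
    return all(abs((chosen - rejected) - margin) <= tol for _, _, chosen, rejected, margin in rows)


if __name__ == "__main__":
    step, val_loss, *_ = best_row(ROWS)
    print(step, val_loss)  # 2700 0.5855
    print(margins_consistent(ROWS))  # True
```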

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu118
  • Datasets 2.14.6
  • Tokenizers 0.15.0
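To reproduce training or load the model with a matching stack, it can help to compare the installed packages against these versions. The sketch below does that with the standard-library `importlib.metadata` module. The `PINNED` mapping and `check_versions` name are hypothetical. PyTorch is distributed on PyPI as `torch`, and the comparison ignores the `+cu118` local build suffix, which only identifies the CUDA build.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Hypothetical mapping of the "Framework versions" above (PyTorch's distribution name is "torch").
PINNED = {
    "transformers": "4.36.2",
    "torch": "2.1.2",
    "datasets": "2.14.6",
    "tokenizers": "0.15.0",
}


def check_versions(pinned: Dict[str, str] = PINNED) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package to (pinned version, installed version or None, matches?)."""
    report = {}
    for pkg, wanted in pinned.items():
        try:
            installed: Optional[str] = version(pkg)
        except PackageNotFoundError:
            installed = None
        # Drop a local build tag such as "+cu118" before comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[pkg] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for pkg, (wanted, installed, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{pkg}: pinned {wanted}, installed {installed or 'not found'}, {status}")
```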

Model files

  • Format: Safetensors
  • Model size: 6.43B params
  • Tensor type: BF16
  • Repository: ondevicellm/tinyllama_moe_dpo_ultrachat_v2_epochs3