---
license: apache-2.0
base_model: ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs3
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: tinyllama_moe_dpo_ultrachat_v2_epochs3
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# tinyllama_moe_dpo_ultrachat_v2_epochs3

This model is a fine-tuned version of [ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs3](https://huggingface.co/ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs3), trained with DPO on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set. These values match the step-2700 row of the training-results table, which has the lowest validation loss:
- Loss: 0.5855
- Rewards/chosen: -0.9040
- Rewards/rejected: -1.3959
- Rewards/accuracies: 0.7262
- Rewards/margins: 0.4918
- Logps/rejected: -442.2930
- Logps/chosen: -435.4489
- Logits/rejected: -2.3585
- Logits/chosen: -2.4345
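The reward metrics above are linked to each other. The sketch below assumes TRL's default "sigmoid" DPO loss with no label smoothing. Under that loss, each pair's implicit reward is β times the policy-to-reference log-probability ratio, and the per-pair loss is −log σ(r_chosen − r_rejected). The sketch checks two things. First, Rewards/margins should equal Rewards/chosen minus Rewards/rejected, up to rounding. Second, the mean loss should be at least −log σ(mean margin): −log σ is convex, so Jensen's inequality applies. The loss form is an assumption about how these metrics were logged. Only the numeric values come from this card.

```python
import math

# Final evaluation metrics copied from this card (step 2700).
rewards_chosen = -0.9040
rewards_rejected = -1.3959
reported_margin = 0.4918
reported_loss = 0.5855


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-pair DPO loss -log(sigmoid(margin)), computed as log1p(exp(-margin)).

    Assumption: TRL's default "sigmoid" loss with no label smoothing, where
    margin = r_chosen - r_rejected and each r = beta * (log pi - log pi_ref).
    """
    return math.log1p(math.exp(-margin))


# The margin is a difference of means, so it should match up to rounding.
margin = rewards_chosen - rewards_rejected
print(f"chosen - rejected = {margin:.4f} (reported {reported_margin})")

# -log(sigmoid(x)) is convex, so (Jensen) the mean per-pair loss is at least
# the loss evaluated at the mean margin.
lower_bound = dpo_sigmoid_loss(reported_margin)
print(f"loss lower bound = {lower_bound:.4f} (reported loss {reported_loss})")
```

The gap between the bound and the reported loss indicates how much the per-pair margins vary around their mean.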

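## Model description

A DPO-aligned version of a TinyLlama-based mixture-of-experts SFT model (judging by the base model name), trained with TRL and the alignment-handbook (per the model tags). More information needed.

## Intended uses & limitations

More information needed

## Training and evaluation data

Preference pairs from HuggingFaceH4/ultrafeedback_binarized. The exact train and evaluation splits are not recorded in this card. More information needed.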

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8 (per device)
- eval_batch_size: 8 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 96
- num_epochs: 3
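The totals follow from the per-device sizes. Training uses 8 per device × 4 GPUs × 2 accumulation steps = 64, and evaluation uses 8 × 4 = 32. The sketch below reproduces that arithmetic and the learning-rate curve. It assumes the scheduler matches the linear-warmup-then-cosine formula of `transformers.get_cosine_schedule_with_warmup`, with the default half cycle decaying to zero. The card does not state the total number of optimizer steps. `TOTAL_STEPS = 2868` is a hypothetical value: the last logged row of the training-results table (step 2800, epoch 2.93) implies roughly 2,870 steps over 3 epochs.

```python
import math

# Hyperparameters copied from the card.
LEARNING_RATE = 5e-7
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 96
# Hypothetical: the card does not state the total number of optimizer steps.
TOTAL_STEPS = 2868

total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES  # no accumulation at eval


def lr_at(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then half-cosine decay to 0 (assumed)."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("total_train_batch_size:", total_train_batch)
print("total_eval_batch_size:", total_eval_batch)
for step in (0, 48, 96, 1000, 2000, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

In a real training script, the library's own `get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=96, num_training_steps=...)` would play this role. The hand-written version only makes the shape of the curve explicit.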

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6914 | 0.1 | 100 | 0.6913 | 0.0043 | -0.0005 | 0.6349 | 0.0048 | -302.7554 | -344.6115 | -2.9876 | -3.0405 |
| 0.6836 | 0.21 | 200 | 0.6830 | 0.0149 | -0.0095 | 0.6448 | 0.0244 | -303.6508 | -343.5497 | -2.9700 | -3.0243 |
| 0.6662 | 0.31 | 300 | 0.6712 | -0.0134 | -0.0687 | 0.6746 | 0.0553 | -309.5701 | -346.3836 | -2.9423 | -2.9976 |
| 0.6538 | 0.42 | 400 | 0.6571 | -0.0814 | -0.1804 | 0.6766 | 0.0990 | -320.7438 | -353.1802 | -2.8979 | -2.9548 |
| 0.6405 | 0.52 | 500 | 0.6448 | -0.1949 | -0.3451 | 0.6726 | 0.1502 | -337.2181 | -364.5344 | -2.8541 | -2.9120 |
| 0.6394 | 0.63 | 600 | 0.6372 | -0.2303 | -0.4148 | 0.6825 | 0.1845 | -344.1863 | -368.0754 | -2.8147 | -2.8733 |
| 0.6218 | 0.73 | 700 | 0.6313 | -0.2894 | -0.5107 | 0.6825 | 0.2213 | -353.7792 | -373.9845 | -2.7666 | -2.8269 |
| 0.6035 | 0.84 | 800 | 0.6249 | -0.3614 | -0.6145 | 0.6845 | 0.2531 | -364.1536 | -381.1849 | -2.7056 | -2.7681 |
| 0.6326 | 0.94 | 900 | 0.6204 | -0.5259 | -0.8008 | 0.6845 | 0.2749 | -382.7857 | -397.6345 | -2.6568 | -2.7207 |
| 0.6103 | 1.05 | 1000 | 0.6145 | -0.5164 | -0.8178 | 0.6944 | 0.3014 | -384.4856 | -396.6823 | -2.6322 | -2.6969 |
| 0.6002 | 1.15 | 1100 | 0.6116 | -0.5179 | -0.8325 | 0.6925 | 0.3146 | -385.9578 | -396.8333 | -2.6024 | -2.6688 |
| 0.5729 | 1.26 | 1200 | 0.6083 | -0.5838 | -0.9200 | 0.7044 | 0.3362 | -394.7073 | -403.4271 | -2.5708 | -2.6376 |
| 0.599 | 1.36 | 1300 | 0.6077 | -0.5206 | -0.8453 | 0.7103 | 0.3247 | -387.2310 | -397.1021 | -2.5454 | -2.6134 |
| 0.5821 | 1.47 | 1400 | 0.6025 | -0.5941 | -0.9561 | 0.7063 | 0.3620 | -398.3106 | -404.4496 | -2.5211 | -2.5900 |
| 0.574 | 1.57 | 1500 | 0.5977 | -0.6617 | -1.0471 | 0.7143 | 0.3854 | -407.4162 | -411.2178 | -2.4887 | -2.5593 |
| 0.5716 | 1.67 | 1600 | 0.5955 | -0.6765 | -1.0870 | 0.7282 | 0.4105 | -411.4020 | -412.6956 | -2.4651 | -2.5369 |
| 0.5477 | 1.78 | 1700 | 0.5904 | -0.8020 | -1.2430 | 0.7321 | 0.4410 | -427.0003 | -425.2423 | -2.4342 | -2.5079 |
| 0.5718 | 1.88 | 1800 | 0.5898 | -0.7932 | -1.2439 | 0.7321 | 0.4507 | -427.0937 | -424.3631 | -2.4186 | -2.4928 |
| 0.563 | 1.99 | 1900 | 0.5904 | -0.6874 | -1.1313 | 0.7202 | 0.4439 | -415.8328 | -413.7807 | -2.4223 | -2.4961 |
| 0.5633 | 2.09 | 2000 | 0.5884 | -0.7564 | -1.2105 | 0.7262 | 0.4541 | -423.7504 | -420.6851 | -2.4073 | -2.4819 |
| 0.5564 | 2.2 | 2100 | 0.5878 | -0.8150 | -1.2802 | 0.7262 | 0.4652 | -430.7243 | -426.5488 | -2.3948 | -2.4696 |
| 0.5373 | 2.3 | 2200 | 0.5865 | -0.8791 | -1.3602 | 0.7341 | 0.4812 | -438.7289 | -432.9532 | -2.3795 | -2.4548 |
| 0.5559 | 2.41 | 2300 | 0.5872 | -0.8476 | -1.3260 | 0.7242 | 0.4784 | -435.3001 | -429.7996 | -2.3743 | -2.4496 |
| 0.5467 | 2.51 | 2400 | 0.5868 | -0.8483 | -1.3274 | 0.7222 | 0.4790 | -435.4401 | -429.8786 | -2.3697 | -2.4452 |
| 0.5666 | 2.62 | 2500 | 0.5858 | -0.8754 | -1.3626 | 0.7242 | 0.4872 | -438.9631 | -432.5811 | -2.3641 | -2.4399 |
| 0.5113 | 2.72 | 2600 | 0.5856 | -0.8942 | -1.3842 | 0.7242 | 0.4900 | -441.1211 | -434.4620 | -2.3604 | -2.4361 |
| 0.5601 | 2.83 | 2700 | 0.5855 | -0.9040 | -1.3959 | 0.7262 | 0.4918 | -442.2930 | -435.4489 | -2.3585 | -2.4345 |
| 0.5303 | 2.93 | 2800 | 0.5857 | -0.9003 | -1.3898 | 0.7242 | 0.4894 | -441.6805 | -435.0786 | -2.3581 | -2.4342 |
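The evaluation results listed at the top of this card match the step-2700 row. That row has the lowest validation loss in the table, and the last logged row (step 2800) is slightly higher. The sketch below checks this using the Step and Validation Loss columns copied from the table. It also prints the change between consecutive evaluations, which shows where training levels off.

```python
# (step, validation loss) pairs copied from the training-results table.
VAL_LOSS_BY_STEP = [
    (100, 0.6913), (200, 0.6830), (300, 0.6712), (400, 0.6571),
    (500, 0.6448), (600, 0.6372), (700, 0.6313), (800, 0.6249),
    (900, 0.6204), (1000, 0.6145), (1100, 0.6116), (1200, 0.6083),
    (1300, 0.6077), (1400, 0.6025), (1500, 0.5977), (1600, 0.5955),
    (1700, 0.5904), (1800, 0.5898), (1900, 0.5904), (2000, 0.5884),
    (2100, 0.5878), (2200, 0.5865), (2300, 0.5872), (2400, 0.5868),
    (2500, 0.5858), (2600, 0.5856), (2700, 0.5855), (2800, 0.5857),
]

# Row with the smallest validation loss.
best_step, best_loss = min(VAL_LOSS_BY_STEP, key=lambda row: row[1])
print(f"lowest validation loss {best_loss} at step {best_step}")

# Positive values mean the validation loss went down since the previous eval.
for (prev_step, prev_loss), (step, loss) in zip(VAL_LOSS_BY_STEP, VAL_LOSS_BY_STEP[1:]):
    print(f"{prev_step:>4} -> {step:>4}: {prev_loss - loss:+.4f}")
```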


### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2+cu118
- Datasets 2.14.6
- Tokenizers 0.15.0
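Matching these versions is the simplest way to reproduce the training environment. The sketch below compares them with what is installed, using only the standard library's `importlib.metadata`. It assumes the pip distribution names are `transformers`, `torch`, `datasets` and `tokenizers`. The `+cu118` suffix is a local version label from the CUDA 11.8 PyTorch wheel, so a CPU or different-CUDA build will report a mismatch even when the base version matches.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions listed in this card, keyed by (assumed) pip distribution name.
PINNED = {
    "transformers": "4.36.2",
    "torch": "2.1.2+cu118",
    "datasets": "2.14.6",
    "tokenizers": "0.15.0",
}


def compare_versions(pinned: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (pinned version, installed version or None)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(PINNED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: card {wanted}, installed {installed or 'not installed'} ({status})")
```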