tinyllama_moe_dpo_ultrachat_v2_epochs3

This model is a fine-tuned version of ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs3 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (the values match the step 2700 row of the training results table):

Loss: 0.5855
Rewards/chosen: -0.9040
Rewards/rejected: -1.3959
Rewards/accuracies: 0.7262
Rewards/margins: 0.4918
Logps/rejected: -442.2930
Logps/chosen: -435.4489
Logits/rejected: -2.3585
Logits/chosen: -2.4345
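
The reward metrics above are tied together by simple arithmetic: the margin is the chosen reward minus the rejected reward. The sketch below checks that relationship against the reported numbers. It also evaluates a sigmoid-style preference loss at the mean margin. That loss form is an assumption based on the DPO naming of the model and metrics; the card itself does not state which loss variant was used, and the function name `sigmoid_preference_loss` is purely illustrative.

```python
import math

# Evaluation metrics copied from the list above.
rewards_chosen = -0.9040
rewards_rejected = -1.3959
reported_margin = 0.4918
reported_loss = 0.5855

# The margin is the gap between the chosen and rejected rewards.
# Computed from the rounded means, it can differ from the reported
# value in the last digit.
margin_from_means = rewards_chosen - rewards_rejected


def sigmoid_preference_loss(margin: float) -> float:
    """Return -log(sigmoid(margin)) for a single reward margin.

    Assumed loss form; not confirmed by the model card.
    """
    return math.log1p(math.exp(-margin))


# Loss evaluated at the *mean* margin, not averaged over examples.
loss_at_mean_margin = sigmoid_preference_loss(reported_margin)

print(f"margin from means:   {margin_from_means:.4f}")
print(f"loss at mean margin: {loss_at_mean_margin:.4f}")
print(f"reported eval loss:  {reported_loss:.4f}")
```

Under that assumption, the loss at the mean margin comes out lower than the reported evaluation loss. This is expected rather than contradictory: -log(sigmoid(x)) is convex, so the average of per-example losses is at least the loss at the average margin.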

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 96
num_epochs: 3
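
The total batch sizes listed above follow from the per-device sizes, the device count, and the gradient accumulation steps. The sketch below reproduces that arithmetic and adds a rough picture of a cosine schedule with linear warmup. The schedule function `cosine_lr` is a hypothetical helper, not the training code, and `total_steps` is an assumption: the card does not state the total number of optimizer steps, so the last logged step (2800) is used only as a stand-in.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8            # per device
eval_batch_size = 8             # per device
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 5e-07
warmup_steps = 96

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = learning_rate,
              warmup: int = warmup_steps) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (illustrative)."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: 2800 is the last logged step, used here as a placeholder
# for the unknown total number of optimizer steps.
assumed_total_steps = 2800

print(total_train_batch_size, total_eval_batch_size)
for step in (0, 48, 96, 1448, 2800):
    print(step, cosine_lr(step, assumed_total_steps))
```

The learning rate rises linearly over the first 96 steps, peaks at 5e-07, and then decays along a half cosine. The exact decay curve depends on the real total step count, which the card does not report.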

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6914	0.1	100	0.6913	0.0043	-0.0005	0.6349	0.0048	-302.7554	-344.6115	-2.9876	-3.0405
0.6836	0.21	200	0.6830	0.0149	-0.0095	0.6448	0.0244	-303.6508	-343.5497	-2.9700	-3.0243
0.6662	0.31	300	0.6712	-0.0134	-0.0687	0.6746	0.0553	-309.5701	-346.3836	-2.9423	-2.9976
0.6538	0.42	400	0.6571	-0.0814	-0.1804	0.6766	0.0990	-320.7438	-353.1802	-2.8979	-2.9548
0.6405	0.52	500	0.6448	-0.1949	-0.3451	0.6726	0.1502	-337.2181	-364.5344	-2.8541	-2.9120
0.6394	0.63	600	0.6372	-0.2303	-0.4148	0.6825	0.1845	-344.1863	-368.0754	-2.8147	-2.8733
0.6218	0.73	700	0.6313	-0.2894	-0.5107	0.6825	0.2213	-353.7792	-373.9845	-2.7666	-2.8269
0.6035	0.84	800	0.6249	-0.3614	-0.6145	0.6845	0.2531	-364.1536	-381.1849	-2.7056	-2.7681
0.6326	0.94	900	0.6204	-0.5259	-0.8008	0.6845	0.2749	-382.7857	-397.6345	-2.6568	-2.7207
0.6103	1.05	1000	0.6145	-0.5164	-0.8178	0.6944	0.3014	-384.4856	-396.6823	-2.6322	-2.6969
0.6002	1.15	1100	0.6116	-0.5179	-0.8325	0.6925	0.3146	-385.9578	-396.8333	-2.6024	-2.6688
0.5729	1.26	1200	0.6083	-0.5838	-0.9200	0.7044	0.3362	-394.7073	-403.4271	-2.5708	-2.6376
0.599	1.36	1300	0.6077	-0.5206	-0.8453	0.7103	0.3247	-387.2310	-397.1021	-2.5454	-2.6134
0.5821	1.47	1400	0.6025	-0.5941	-0.9561	0.7063	0.3620	-398.3106	-404.4496	-2.5211	-2.5900
0.574	1.57	1500	0.5977	-0.6617	-1.0471	0.7143	0.3854	-407.4162	-411.2178	-2.4887	-2.5593
0.5716	1.67	1600	0.5955	-0.6765	-1.0870	0.7282	0.4105	-411.4020	-412.6956	-2.4651	-2.5369
0.5477	1.78	1700	0.5904	-0.8020	-1.2430	0.7321	0.4410	-427.0003	-425.2423	-2.4342	-2.5079
0.5718	1.88	1800	0.5898	-0.7932	-1.2439	0.7321	0.4507	-427.0937	-424.3631	-2.4186	-2.4928
0.563	1.99	1900	0.5904	-0.6874	-1.1313	0.7202	0.4439	-415.8328	-413.7807	-2.4223	-2.4961
0.5633	2.09	2000	0.5884	-0.7564	-1.2105	0.7262	0.4541	-423.7504	-420.6851	-2.4073	-2.4819
0.5564	2.2	2100	0.5878	-0.8150	-1.2802	0.7262	0.4652	-430.7243	-426.5488	-2.3948	-2.4696
0.5373	2.3	2200	0.5865	-0.8791	-1.3602	0.7341	0.4812	-438.7289	-432.9532	-2.3795	-2.4548
0.5559	2.41	2300	0.5872	-0.8476	-1.3260	0.7242	0.4784	-435.3001	-429.7996	-2.3743	-2.4496
0.5467	2.51	2400	0.5868	-0.8483	-1.3274	0.7222	0.4790	-435.4401	-429.8786	-2.3697	-2.4452
0.5666	2.62	2500	0.5858	-0.8754	-1.3626	0.7242	0.4872	-438.9631	-432.5811	-2.3641	-2.4399
0.5113	2.72	2600	0.5856	-0.8942	-1.3842	0.7242	0.4900	-441.1211	-434.4620	-2.3604	-2.4361
0.5601	2.83	2700	0.5855	-0.9040	-1.3959	0.7262	0.4918	-442.2930	-435.4489	-2.3585	-2.4345
0.5303	2.93	2800	0.5857	-0.9003	-1.3898	0.7242	0.4894	-441.6805	-435.0786	-2.3581	-2.4342
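
To compare checkpoints, the table can be scanned for the lowest validation loss or the highest reward accuracy. The sketch below does this for the last seven rows, copied from the table above; the tuple layout and variable names are illustrative choices, not part of the training setup.

```python
# (step, validation_loss, rewards_accuracy) for the last seven rows
# of the training results table.
eval_log = [
    (2200, 0.5865, 0.7341),
    (2300, 0.5872, 0.7242),
    (2400, 0.5868, 0.7222),
    (2500, 0.5858, 0.7242),
    (2600, 0.5856, 0.7242),
    (2700, 0.5855, 0.7262),
    (2800, 0.5857, 0.7242),
]

# Row with the lowest validation loss.
best_by_loss = min(eval_log, key=lambda row: row[1])

# Row with the highest reward accuracy.
best_by_accuracy = max(eval_log, key=lambda row: row[2])

print("lowest validation loss at step", best_by_loss[0])
print("highest reward accuracy at step", best_by_accuracy[0])
```

The two criteria do not agree here: validation loss is lowest at step 2700, while reward accuracy is highest at step 2200. The differences between the late rows are small in both columns.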

Framework versions

Transformers 4.36.2
Pytorch 2.1.2+cu118
Datasets 2.14.6
Tokenizers 0.15.0
