tinyllama_moe_dpo_ultrachat_v2_epochs5

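This model is a fine-tuned version of ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs5 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set. The values match the step-3300 row (epoch 3.46) of the training results table, which has the lowest validation loss of the run: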

Loss: 0.5739
Rewards/chosen: -1.1929
Rewards/rejected: -1.7842
Rewards/accuracies: 0.7163
Rewards/margins: 0.5913
Logps/rejected: -486.3180
Logps/chosen: -468.6473
Logits/rejected: -1.7313
Logits/chosen: -1.8442
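
The reward metrics above are related by simple arithmetic. The following is a minimal sketch, assuming the usual DPO logging convention in which the margin is the chosen reward minus the rejected reward and the accuracy is the fraction of pairs whose chosen reward is higher. The helper name `summarize_rewards` and the toy per-pair rewards are hypothetical and are not taken from this run.

```python
from statistics import mean

def summarize_rewards(chosen, rejected):
    """Aggregate per-pair rewards into the logged reward metrics."""
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        "rewards/margins": mean(margins),
        # A pair counts as correct when the chosen reward is the higher one.
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
    }

# Consistency check on the reported means: chosen minus rejected is the margin.
reported_margin = round(-1.1929 - (-1.7842), 4)
print(reported_margin)  # 0.5913

# Toy per-pair rewards (hypothetical values, for illustration only).
toy = summarize_rewards([-1.0, -0.5, -2.0, -1.5], [-1.5, -1.0, -1.8, -2.5])
print(toy["rewards/accuracies"])  # 0.75
```

Both mean rewards are negative here, yet the margin is positive: what the accuracy tracks is the gap between the two responses, not the sign of either reward.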

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

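Training and evaluation both use HuggingFaceH4/ultrafeedback_binarized, which pairs each prompt with a chosen and a rejected response. More information needed on the exact splits and any preprocessing.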

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 96
num_epochs: 5
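
The two derived batch sizes follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below recomputes the totals and illustrates such a schedule. It assumes linear warmup from zero followed by cosine decay to zero, which is a common shape for a `cosine` scheduler with warmup steps but is not confirmed by the card. The total step count is not stated in the card, so `HYPOTHETICAL_TOTAL_STEPS` is a placeholder.

```python
import math

# Values from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-7
WARMUP_STEPS = 96

# Gradient accumulation multiplies the training batch only, not the eval batch.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
print(total_train_batch, total_eval_batch)  # 64 32

def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Placeholder only: the card does not state the total number of steps.
HYPOTHETICAL_TOTAL_STEPS = 4800
for step in (0, 48, 96, 2448, 4800):
    print(step, lr_at(step, HYPOTHETICAL_TOTAL_STEPS))
```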

Training results

Training Loss	Epoch	Step	Logits/chosen	Logits/rejected	Logps/chosen	Logps/rejected	Validation Loss	Rewards/accuracies	Rewards/chosen	Rewards/margins	Rewards/rejected
0.6913	0.1	100	-2.7889	-2.7179	-348.8463	-307.7887	0.6915	0.6012	0.0051	0.0041	0.0011
0.6848	0.21	200	-2.7786	-2.7064	-347.1148	-307.7814	0.6844	0.6548	0.0224	0.0213	0.0011
0.6719	0.31	300	-2.7564	-2.6828	-347.1926	-310.3274	0.6745	0.6567	0.0217	0.0460	-0.0243
0.6593	0.42	400	-2.7168	-2.6417	-351.2079	-317.7508	0.6626	0.6627	-0.0185	0.0801	-0.0985
0.6489	0.52	500	-2.6766	-2.5996	-359.7169	-330.5644	0.6503	0.6667	-0.1036	0.1231	-0.2267
0.6442	0.63	600	-2.6209	-2.5415	-364.4345	-339.3099	0.6407	0.6806	-0.1507	0.1634	-0.3141
0.6271	0.73	700	-2.5658	-2.4836	-373.3324	-352.5069	0.6321	0.6766	-0.2397	0.2064	-0.4461
0.607	0.84	800	-2.5051	-2.4199	-379.1497	-361.6935	0.6261	0.6845	-0.2979	0.2401	-0.5380
0.6322	0.94	900	-2.4508	-2.3644	-397.4641	-382.2142	0.6199	0.6905	-0.4810	0.2621	-0.7432
0.605	1.05	1000	-2.3964	-2.3068	-404.5890	-394.0288	0.6115	0.6885	-0.5523	0.3090	-0.8613
0.601	1.15	1100	-2.3602	-2.2683	-418.7677	-411.0065	0.6068	0.6964	-0.6941	0.3370	-1.0311
0.5676	1.26	1200	-2.3216	-2.2290	-417.0859	-411.9764	0.6020	0.7123	-0.6773	0.3635	-1.0408
0.5909	1.36	1300	-2.2912	-2.1982	-412.9470	-408.3128	0.5999	0.7123	-0.6359	0.3683	-1.0042
0.5711	1.47	1400	-2.2460	-2.1507	-420.5697	-419.0722	0.5967	0.7183	-0.7121	0.3997	-1.1118
0.5655	1.57	1500	-2.2212	-2.1253	-412.4961	-410.0143	0.5957	0.7222	-0.6314	0.3898	-1.0212
0.5655	1.67	1600	-2.1858	-2.0877	-414.4090	-414.7852	0.5925	0.7242	-0.6505	0.4184	-1.0689
0.5364	1.78	1700	-2.1499	-2.0500	-425.4825	-428.4342	0.5873	0.7262	-0.7612	0.4442	-1.2054
0.5702	1.88	1800	-2.1546	-2.0539	-424.3879	-429.0814	0.5843	0.7361	-0.7503	0.4616	-1.2119
0.5505	1.99	1900	-2.1340	-2.0328	-413.9261	-417.8120	0.5852	0.7321	-0.6457	0.4535	-1.0992
0.5389	2.09	2000	-2.0806	-1.9769	-422.3402	-427.3939	0.5828	0.7262	-0.7298	0.4652	-1.1950
0.531	2.2	2100	-2.0565	-1.9511	-437.7683	-446.1322	0.5805	0.7341	-0.8841	0.4983	-1.3824
0.5162	2.3	2200	-2.0180	-1.9112	-435.0022	-443.4644	0.5830	0.7341	-0.8564	0.4993	-1.3557
0.5297	2.41	2300	-1.9911	-1.8838	-448.7519	-459.4124	0.5795	0.7183	-0.9939	0.5212	-1.5152
0.5143	2.51	2400	-1.9853	-1.8784	-436.2057	-445.7617	0.5806	0.7321	-0.8685	0.5102	-1.3787
0.5377	2.62	2500	-1.9648	-1.8572	-443.1574	-454.7680	0.5786	0.7282	-0.9380	0.5307	-1.4687
0.4868	2.72	2600	-1.9504	-1.8416	-439.4379	-450.5156	0.5797	0.7302	-0.9008	0.5254	-1.4262
0.5275	2.83	2700	-1.9219	-1.8117	-447.6714	-460.6927	0.5754	0.7282	-0.9831	0.5448	-1.5280
0.5042	2.93	2800	-1.9484	-1.8401	-447.7928	-460.8577	0.5743	0.7321	-0.9843	0.5453	-1.5296
0.4862	3.04	2900	-1.9315	-1.8216	-452.8863	-467.0351	0.5756	0.7202	-1.0353	0.5561	-1.5914
0.4817	3.14	3000	-1.8836	-1.7716	-453.8664	-469.6034	0.5786	0.7282	-1.0451	0.5720	-1.6171
0.4767	3.24	3100	-1.8663	-1.7538	-457.4258	-472.9984	0.5770	0.7262	-1.0807	0.5704	-1.6510
0.4794	3.35	3200	-1.8515	-1.7384	-460.2550	-476.8743	0.5789	0.7262	-1.1090	0.5808	-1.6898
0.4784	3.46	3300	-1.8442	-1.7313	-468.6473	-486.3180	0.5739	0.7163	-1.1929	0.5913	-1.7842
0.4797	3.56	3400	-1.8464	-1.7340	-464.2336	-480.9566	0.5754	0.7202	-1.1487	0.5819	-1.7306
0.4967	3.66	3500	-1.8458	-1.7331	-462.4030	-478.6690	0.5763	0.7282	-1.1304	0.5773	-1.7077
0.4747	3.77	3600	-1.8402	-1.7268	-462.3710	-479.5741	0.5767	0.7262	-1.1301	0.5867	-1.7168
0.4895	3.87	3700	-1.8430	-1.7302	-463.2915	-479.6691	0.5747	0.7202	-1.1393	0.5784	-1.7177
0.5118	3.98	3800	-1.8417	-1.7282	-464.1390	-481.3118	0.5743	0.7262	-1.1478	0.5864	-1.7342
0.5007	4.08	3900	-1.8403	-1.7269	-462.8507	-480.0436	0.5753	0.7282	-1.1349	0.5866	-1.7215
0.461	4.19	4000	-1.8327	-1.7189	-466.1142	-483.5273	0.5745	0.7222	-1.1675	0.5888	-1.7563
0.4881	4.29	4100	-1.8260	-1.7124	-464.1829	-481.8481	0.5762	0.7282	-1.1482	0.5913	-1.7395
0.4449	4.4	4200	-1.8251	-1.7116	-466.1421	-484.0506	0.5765	0.7202	-1.1678	0.5937	-1.7615
0.4692	4.5	4300	-1.8279	-1.7143	-466.4624	-484.0968	0.5759	0.7242	-1.1710	0.5910	-1.7620
0.4654	4.61	4400	-1.8290	-1.7154	-466.3009	-484.2224	0.5760	0.7262	-1.1694	0.5939	-1.7633
0.4608	4.71	4500	-1.8304	-1.7171	-467.0131	-484.8123	0.5754	0.7202	-1.1765	0.5926	-1.7692
0.4661	4.82	4600	-1.8255	-1.7120	-467.5481	-485.3937	0.5754	0.7282	-1.1819	0.5931	-1.7750
0.4859	4.92	4700	-1.8237	-1.7101	-467.6952	-485.5031	0.5756	0.7202	-1.1834	0.5927	-1.7761
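
A table like this can be read programmatically to pick a checkpoint. The sketch below is one possible way to do it, not part of the original training code. It parses three rows (steps 100, 2800 and 3300, with values arranged in the header's column order), checks that chosen minus rejected reward reproduces the margin, and selects the row with the lowest validation loss. The function name `parse_rows` and the rounding tolerance are assumptions made for illustration.

```python
import csv
import io

HEADER = ["Training Loss", "Epoch", "Step", "Logits/chosen", "Logits/rejected",
          "Logps/chosen", "Logps/rejected", "Validation Loss",
          "Rewards/accuracies", "Rewards/chosen", "Rewards/margins",
          "Rewards/rejected"]

# Three rows from the table, tab-separated, in the header's column order.
ROWS = [
    "0.6913\t0.1\t100\t-2.7889\t-2.7179\t-348.8463\t-307.7887\t0.6915\t0.6012\t0.0051\t0.0041\t0.0011",
    "0.5042\t2.93\t2800\t-1.9484\t-1.8401\t-447.7928\t-460.8577\t0.5743\t0.7321\t-0.9843\t0.5453\t-1.5296",
    "0.4784\t3.46\t3300\t-1.8442\t-1.7313\t-468.6473\t-486.3180\t0.5739\t0.7163\t-1.1929\t0.5913\t-1.7842",
]

def parse_rows(header, lines):
    """Turn tab-separated lines into dicts keyed by column name."""
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter="\t")
    return [dict(zip(header, map(float, fields))) for fields in reader]

records = parse_rows(HEADER, ROWS)

# Sanity check that columns are aligned: chosen - rejected should equal the
# margin up to the rounding of the four-decimal table entries.
for r in records:
    gap = r["Rewards/chosen"] - r["Rewards/rejected"]
    assert abs(gap - r["Rewards/margins"]) < 5e-4

best = min(records, key=lambda r: r["Validation Loss"])
print(int(best["Step"]), best["Validation Loss"])  # 3300 0.5739
```

The margin check is a cheap way to catch rows whose values have drifted out of column order.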

Framework versions

Transformers 4.36.2
Pytorch 2.1.2+cu118
Datasets 2.14.6
Tokenizers 0.15.0
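
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch using only the standard library; the helper `version_mismatches` is hypothetical, and the comparison ignores local build tags such as `+cu118`, which is an assumption about what counts as a match.

```python
from importlib import metadata

# Versions listed in the card, keyed by pip distribution name.
# The PyTorch build tag (+cu118) is left out of the comparison.
EXPECTED = {
    "transformers": "4.36.2",
    "torch": "2.1.2",
    "datasets": "2.14.6",
    "tokenizers": "0.15.0",
}

def version_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Strip any local build tag before comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches

for name, (wanted, installed) in version_mismatches(EXPECTED).items():
    print(f"{name}: card lists {wanted}, found {installed}")
```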
