phi_1_5_dpo_ep6

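This model is a fine-tuned version of phi-1_5, loaded from the local checkpoint path /home/work/saic-llm-2023/checkpoints/microsoft/phi-1_5, trained on the argilla/ultrafeedback-binarized-preferences-cleaned dataset. It achieves the following results on the evaluation set (the values match the step 5000 row, epoch 5.43, of the training results table):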

Loss: 0.4748
Rewards/chosen: -0.9135
Rewards/rejected: -1.9448
Rewards/accuracies: 0.7937
Rewards/margins: 1.0313
Logps/rejected: -618.5530
Logps/chosen: -634.6866
Logits/rejected: 3.4318
Logits/chosen: 3.4052
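
The reward figures above are related by simple arithmetic: the margin is the chosen reward minus the rejected reward. The sketch below checks that against the reported values and also shows the per-pair sigmoid loss usually associated with DPO. It assumes the model was trained with the standard sigmoid DPO objective, which the card does not state explicitly; `dpo_pair_loss` is a hypothetical helper name, not part of any library.

```python
import math

# Evaluation-set values reported above.
rewards_chosen = -0.9135
rewards_rejected = -1.9448

# Rewards/margins is the difference between the two rewards.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")


def dpo_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Sigmoid DPO loss for one preference pair: -log(sigmoid(margin)).

    Assumption: the rewards are the implicit DPO rewards, already scaled by beta.
    """
    m = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m))
    return math.log1p(math.exp(-m))


# With a zero margin the loss is ln(2), close to the loss logged at step 100.
print(f"loss at zero margin = {dpo_pair_loss(0.0, 0.0):.4f}")
```

The reported loss is an average over pairs, so it will generally not equal `dpo_pair_loss` applied to the average rewards.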

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 6
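
The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates a cosine schedule with linear warmup of the kind named above. The `cosine_lr` function and its `total_steps` argument are illustrative assumptions: the card does not state the exact total number of optimizer steps, and the actual scheduler implementation used in training may differ in detail.

```python
import math

# Values taken from the hyperparameter list above.
train_batch_size = 8          # per device
eval_batch_size = 8           # per device
num_devices = 4
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 5e-07, warmup_steps: int = 100) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero.

    `total_steps` is a hypothetical parameter; supply the real step count.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For example, `cosine_lr(100, total_steps)` returns the peak rate of 5e-07 for any `total_steps` above 100, and the rate decays toward zero as `step` approaches `total_steps`.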

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6881	0.11	100	0.6856	0.0468	0.0298	0.7024	0.0170	-421.0949	-538.6564	4.8883	4.6646
0.6692	0.22	200	0.6642	0.1742	0.0988	0.7123	0.0754	-414.1955	-525.9189	4.8718	4.6370
0.6368	0.33	300	0.6442	0.2557	0.1261	0.7083	0.1296	-411.4657	-517.7680	4.8407	4.5968
0.6283	0.43	400	0.6283	0.2608	0.0812	0.7083	0.1795	-415.9522	-517.2609	4.7629	4.5156
0.6052	0.54	500	0.6132	0.1429	-0.0998	0.7103	0.2427	-434.0545	-529.0491	4.5516	4.3153
0.5923	0.65	600	0.6008	0.1425	-0.1628	0.7123	0.3053	-440.3539	-529.0887	4.4588	4.2289
0.5899	0.76	700	0.5880	0.0755	-0.2915	0.7083	0.3670	-453.2271	-535.7857	4.3444	4.1349
0.558	0.87	800	0.5715	-0.0965	-0.5304	0.7262	0.4339	-477.1144	-552.9822	4.2704	4.0642
0.5495	0.98	900	0.5552	-0.2658	-0.7677	0.7341	0.5019	-500.8484	-569.9210	4.1976	4.0015
0.5124	1.09	1000	0.5473	-0.3871	-0.9394	0.7321	0.5523	-518.0129	-582.0427	4.0959	3.9125
0.5322	1.19	1100	0.5400	-0.3641	-0.9463	0.7579	0.5821	-518.7011	-579.7518	4.0436	3.8715
0.5281	1.3	1200	0.5344	-0.5340	-1.1498	0.7460	0.6158	-539.0579	-596.7365	3.9368	3.7842
0.5063	1.41	1300	0.5297	-0.3754	-0.9975	0.7579	0.6221	-523.8221	-580.8731	4.0135	3.8499
0.5073	1.52	1400	0.5216	-0.3819	-1.0300	0.7758	0.6481	-527.0738	-581.5236	3.9401	3.7846
0.5156	1.63	1500	0.5177	-0.5748	-1.2824	0.7560	0.7077	-552.3166	-600.8123	3.7868	3.6678
0.5072	1.74	1600	0.5138	-0.4973	-1.2122	0.7798	0.7149	-545.2914	-593.0637	3.7791	3.6614
0.4908	1.85	1700	0.5077	-0.5479	-1.2972	0.7798	0.7493	-553.7918	-598.1292	3.7893	3.6696
0.5109	1.95	1800	0.5068	-0.6157	-1.3930	0.7758	0.7773	-563.3733	-604.9089	3.7679	3.6556
0.4779	2.06	1900	0.5005	-0.6247	-1.4169	0.7738	0.7922	-565.7673	-605.8088	3.7118	3.6062
0.4833	2.17	2000	0.4992	-0.6841	-1.5026	0.7698	0.8185	-574.3334	-611.7432	3.6739	3.5849
0.4879	2.28	2100	0.4967	-0.8128	-1.6654	0.7698	0.8526	-590.6146	-624.6127	3.5692	3.5030
0.4645	2.39	2200	0.4927	-0.6969	-1.5365	0.7857	0.8396	-577.7230	-613.0289	3.6647	3.5772
0.4587	2.5	2300	0.4936	-0.6024	-1.4533	0.7778	0.8509	-569.4068	-603.5743	3.6615	3.5790
0.437	2.61	2400	0.4921	-0.8826	-1.7724	0.7738	0.8897	-601.3099	-631.5984	3.4903	3.4343
0.4204	2.71	2500	0.4890	-0.8338	-1.7338	0.7758	0.8999	-597.4498	-626.7175	3.5447	3.4804
0.467	2.82	2600	0.4865	-0.5910	-1.4516	0.7877	0.8606	-569.2333	-602.4326	3.5690	3.5000
0.458	2.93	2700	0.4861	-0.7666	-1.6726	0.7837	0.9059	-591.3298	-620.0014	3.5208	3.4579
0.462	3.04	2800	0.4844	-0.7109	-1.6145	0.7917	0.9037	-585.5269	-614.4227	3.5553	3.4954
0.4258	3.15	2900	0.4888	-0.9814	-1.9414	0.7817	0.9600	-618.2142	-641.4772	3.4761	3.4227
0.4219	3.26	3000	0.4856	-0.8858	-1.8323	0.7937	0.9465	-607.3071	-631.9181	3.4895	3.4362
0.4295	3.37	3100	0.4823	-0.8140	-1.7651	0.7976	0.9511	-600.5797	-624.7327	3.4880	3.4357
0.4268	3.47	3200	0.4800	-0.8592	-1.8282	0.7976	0.9690	-606.8929	-629.2567	3.4536	3.4126
0.4338	3.58	3300	0.4785	-0.8784	-1.8458	0.7956	0.9674	-608.6551	-631.1731	3.4471	3.4096
0.4297	3.69	3400	0.4774	-0.9026	-1.8929	0.7956	0.9903	-613.3634	-633.5962	3.4710	3.4326
0.4133	3.8	3500	0.4785	-0.9173	-1.9072	0.7937	0.9899	-614.7964	-635.0674	3.4610	3.4232
0.4275	3.91	3600	0.4794	-1.0209	-2.0380	0.7837	1.0171	-627.8748	-645.4227	3.4635	3.4227
0.4224	4.02	3700	0.4784	-0.9130	-1.9086	0.7937	0.9955	-614.9320	-634.6396	3.4812	3.4400
0.4101	4.13	3800	0.4773	-0.9474	-1.9571	0.7877	1.0097	-619.7819	-638.0772	3.4569	3.4225
0.4295	4.23	3900	0.4790	-0.9893	-2.0096	0.7956	1.0203	-625.0361	-642.2666	3.4290	3.3998
0.4162	4.34	4000	0.4769	-0.9682	-1.9897	0.7956	1.0215	-623.0465	-640.1562	3.4342	3.4040
0.425	4.45	4100	0.4759	-0.9553	-1.9788	0.7917	1.0236	-621.9555	-638.8621	3.4580	3.4237
0.4155	4.56	4200	0.4778	-1.0183	-2.0573	0.7917	1.0390	-629.8077	-645.1696	3.4277	3.3981
0.4311	4.67	4300	0.4765	-0.9712	-2.0065	0.7897	1.0353	-624.7266	-640.4598	3.4413	3.4107
0.41	4.78	4400	0.4768	-0.9764	-2.0101	0.7917	1.0337	-625.0818	-640.9733	3.4387	3.4081
0.4127	4.89	4500	0.4749	-0.9599	-1.9994	0.7937	1.0395	-624.0168	-639.3277	3.4453	3.4160
0.453	4.99	4600	0.4748	-0.9231	-1.9528	0.7917	1.0297	-619.3519	-635.6462	3.4444	3.4142
0.4035	5.1	4700	0.4754	-0.9561	-1.9965	0.7897	1.0403	-623.7211	-638.9504	3.4293	3.4019
0.4225	5.21	4800	0.4753	-0.9471	-1.9855	0.7877	1.0384	-622.6226	-638.0461	3.4359	3.4077
0.3941	5.32	4900	0.4754	-0.9579	-1.9978	0.7897	1.0400	-623.8593	-639.1230	3.4282	3.4012
0.4093	5.43	5000	0.4748	-0.9135	-1.9448	0.7937	1.0313	-618.5530	-634.6866	3.4318	3.4052
0.3902	5.54	5100	0.4754	-0.9457	-1.9815	0.7956	1.0358	-622.2274	-637.9056	3.4281	3.4014
0.3795	5.65	5200	0.4753	-0.9484	-1.9852	0.7897	1.0368	-622.5895	-638.1724	3.4253	3.3988
0.3915	5.75	5300	0.4754	-0.9571	-1.9957	0.7956	1.0386	-623.6450	-639.0427	3.4242	3.3979
0.4075	5.86	5400	0.4756	-0.9566	-1.9949	0.7877	1.0383	-623.5674	-638.9974	3.4221	3.3962
0.4293	5.97	5500	0.4756	-0.9571	-1.9948	0.7897	1.0377	-623.5548	-639.0446	3.4230	3.3964
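
To locate the checkpoint with the lowest validation loss, the log can be parsed as tab-separated text. The sketch below is a minimal example using four rows copied from the table above, with columns trimmed for brevity; the variable names are illustrative only.

```python
import csv
import io

# Four rows copied from the training results table (tab-separated, columns trimmed).
LOG = (
    "Step\tValidation Loss\tRewards/margins\n"
    "4500\t0.4749\t1.0395\n"
    "4600\t0.4748\t1.0297\n"
    "5000\t0.4748\t1.0313\n"
    "5500\t0.4756\t1.0377\n"
)

rows = list(csv.DictReader(io.StringIO(LOG), delimiter="\t"))

# Lowest validation loss, and every step that reaches it (ties are kept).
best_loss = min(float(r["Validation Loss"]) for r in rows)
best_steps = [int(r["Step"]) for r in rows
              if float(r["Validation Loss"]) == best_loss]
print(best_loss, best_steps)
```

Steps 4600 and 5000 tie at a validation loss of 0.4748; the evaluation results reported at the top of this card match the step 5000 row.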

Framework versions

Transformers 4.38.0
PyTorch 2.1.2+cu118
Datasets 2.17.1
Tokenizers 0.15.0
