MPT_1000_STEPS_1e5_rate_01_beta_DPO

This model is a fine-tuned version of mosaicml/mpt-7b-instruct on an unknown dataset. It achieves the following results on the evaluation set at the final checkpoint (step 1000):

Loss: 0.8946
Rewards/chosen: -4.4962
Rewards/rejected: -4.4462
Rewards/accuracies: 0.4901
Rewards/margins: -0.0501
Logps/rejected: -66.0193
Logps/chosen: -65.7547
Logits/rejected: 8.4623
Logits/chosen: 8.4615
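
The reward figures above can be cross-checked against each other. The following is a minimal sketch, assuming the run used the standard sigmoid DPO objective without label smoothing; the card does not state this, and the model name only suggests DPO with beta 0.1. Under that assumption the margin is the chosen reward minus the rejected reward, and the loss for one preference pair is -log(sigmoid(margin)). The helper name `dpo_pair_loss` is illustrative and not taken from any library.

```python
import math

# Final evaluation values reported above.
rewards_chosen = -4.4962
rewards_rejected = -4.4462


def dpo_pair_loss(margin: float) -> float:
    """Sigmoid DPO loss for one preference pair: -log(sigmoid(margin)).

    Assumption: standard sigmoid DPO objective, no label smoothing.
    """
    # Equivalent to softplus(-margin), split by sign for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Margin recomputed from the (rounded) reported rewards.
margin = rewards_chosen - rewards_rejected

# Loss of a model with no preference between chosen and rejected: log(2).
chance_loss = dpo_pair_loss(0.0)

print(f"margin from rounded rewards: {margin:.4f}")
print(f"loss at zero margin:         {chance_loss:.4f}")
print(f"loss at the mean margin:     {dpo_pair_loss(margin):.4f}")
```

The recomputed margin agrees with the reported Rewards/margins value up to rounding. If the assumed objective is the one actually used, a negative margin, an accuracy just under 0.5, and a loss above log(2) ≈ 0.693 all point the same way: on this evaluation set the model does not rank chosen responses above rejected ones more often than chance. The reported loss is higher than the loss at the mean margin because the loss is convex, so the average of per-pair losses is at least the loss of the average margin.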

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 1000
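
Two of these values follow from the others, which the sketch below illustrates in plain Python. It assumes a single training device (the card does not say how many were used, but one device is consistent with a total batch size of 4) and a linear warmup followed by a half-cosine decay to zero, which is the usual shape of a cosine schedule with warmup. The function names are hypothetical and chosen only for this example.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# Assumption: one device. Not stated in the card.
NUM_DEVICES = 1


def total_train_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to LEARNING_RATE over WARMUP_STEPS, then a
    half-cosine decay to 0 at TRAINING_STEPS (assumed schedule shape).
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size())
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at step 100 and is already at half its peak by step 550, so the later rows of the training results are produced with progressively smaller updates.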

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.7056	0.05	50	0.9054	-1.8795	-1.8769	0.4857	-0.0027	-40.3261	-39.5876	13.2447	13.2474
1.3284	0.1	100	1.3365	-5.2198	-5.1996	0.4835	-0.0202	-73.5531	-72.9898	40.0297	40.0297
4.0395	0.15	150	1.2940	-5.6920	-5.6131	0.4637	-0.0789	-77.6884	-77.7120	34.5576	34.5577
1.1998	0.2	200	1.1437	-4.4153	-4.3103	0.4747	-0.1050	-64.6601	-64.9452	14.5309	14.5309
1.0001	0.24	250	1.3580	-5.0983	-5.0232	0.5033	-0.0751	-71.7890	-71.7751	24.0739	24.0735
1.1726	0.29	300	1.0394	-4.1980	-4.0831	0.4879	-0.1149	-62.3888	-62.7721	16.4743	16.4742
1.0955	0.34	350	1.0584	-4.9210	-4.7783	0.4747	-0.1427	-69.3404	-70.0020	20.7178	20.7172
1.2598	0.39	400	1.0408	-3.8776	-3.8210	0.4945	-0.0566	-59.7678	-59.5681	17.0600	17.0587
1.2403	0.44	450	0.9855	-4.8112	-4.6991	0.4747	-0.1121	-68.5488	-68.9046	10.9237	10.9226
1.2967	0.49	500	0.9814	-4.7410	-4.6563	0.4769	-0.0846	-68.1207	-68.2017	15.1832	15.1825
1.152	0.54	550	0.9258	-4.6800	-4.6273	0.4989	-0.0527	-67.8303	-67.5925	9.7415	9.7409
0.9473	0.59	600	0.9416	-3.6301	-3.6600	0.5341	0.0299	-58.1573	-57.0931	10.5794	10.5787
0.9534	0.64	650	0.9361	-4.7539	-4.6806	0.4681	-0.0733	-68.3630	-68.3308	11.2450	11.2442
0.985	0.68	700	0.9194	-4.5437	-4.5232	0.5011	-0.0205	-66.7896	-66.2292	9.1942	9.1934
0.97	0.73	750	0.9090	-4.6508	-4.5989	0.4835	-0.0520	-67.5462	-67.3006	8.0813	8.0806
0.8148	0.78	800	0.8992	-4.5695	-4.5180	0.4923	-0.0515	-66.7373	-66.4875	8.3458	8.3450
0.9668	0.83	850	0.8976	-4.5172	-4.4650	0.4901	-0.0521	-66.2078	-65.9638	8.2885	8.2877
0.9438	0.88	900	0.8952	-4.4950	-4.4441	0.4923	-0.0509	-65.9988	-65.7424	8.4833	8.4825
1.0069	0.93	950	0.8954	-4.4971	-4.4461	0.4901	-0.0510	-66.0188	-65.7634	8.4615	8.4607
0.7377	0.98	1000	0.8946	-4.4962	-4.4462	0.4901	-0.0501	-66.0193	-65.7547	8.4623	8.4615
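
As a small illustration of how the table can be read, the sketch below copies four columns from it (step, validation loss, reward accuracy, reward margin) and picks out the checkpoints that stand out. The variable names are illustrative; the numbers are taken from the rows above without modification.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin), copied from the table.
rows = [
    (50, 0.9054, 0.4857, -0.0027),
    (100, 1.3365, 0.4835, -0.0202),
    (150, 1.2940, 0.4637, -0.0789),
    (200, 1.1437, 0.4747, -0.1050),
    (250, 1.3580, 0.5033, -0.0751),
    (300, 1.0394, 0.4879, -0.1149),
    (350, 1.0584, 0.4747, -0.1427),
    (400, 1.0408, 0.4945, -0.0566),
    (450, 0.9855, 0.4747, -0.1121),
    (500, 0.9814, 0.4769, -0.0846),
    (550, 0.9258, 0.4989, -0.0527),
    (600, 0.9416, 0.5341, 0.0299),
    (650, 0.9361, 0.4681, -0.0733),
    (700, 0.9194, 0.5011, -0.0205),
    (750, 0.9090, 0.4835, -0.0520),
    (800, 0.8992, 0.4923, -0.0515),
    (850, 0.8976, 0.4901, -0.0521),
    (900, 0.8952, 0.4923, -0.0509),
    (950, 0.8954, 0.4901, -0.0510),
    (1000, 0.8946, 0.4901, -0.0501),
]

# Checkpoint with the lowest validation loss.
best_loss = min(rows, key=lambda r: r[1])

# Checkpoint with the highest reward accuracy.
best_accuracy = max(rows, key=lambda r: r[2])

# Steps where chosen responses received a higher mean reward than rejected ones.
positive_margin_steps = [step for step, _, _, margin in rows if margin > 0]

print("lowest validation loss:", best_loss)
print("highest reward accuracy:", best_accuracy)
print("steps with positive margin:", positive_margin_steps)
```

On these numbers the lowest validation loss is reached at step 1000, while the highest reward accuracy (0.5341) and the only positive reward margin both occur at step 600.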

Framework versions

Transformers 4.39.1
PyTorch 2.0.0+cu117
Datasets 2.18.0
Tokenizers 0.15.2

Model repository: tsavage68/MPT_1000_STEPS_1e5_rate_01_beta_DPO
