PE-13b-lora

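This model is a fine-tuned version of stabilityai/StableBeluga-13B on an unknown dataset. At the last logged evaluation (step 1400, epoch 0.96), it reaches a validation loss of 0.5702, a reward accuracy of 0.9417, and a reward margin of 0.2644 (Rewards/chosen 0.1579, Rewards/rejected -0.1066).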

Model description

More information needed


The training hyperparameters are not recorded in this card. Training and evaluation metrics were logged every 100 steps:

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.693	0.07	100	0.6933	-0.0008	-0.0005	0.4889	-0.0003	-72.1053	-91.9932	-1.7861	-2.0525
0.69	0.14	200	0.6901	0.0031	-0.0015	0.5611	0.0046	-72.1153	-91.9544	-1.7859	-2.0524
0.6842	0.21	300	0.6832	0.0139	-0.0056	0.6917	0.0195	-72.1567	-91.8467	-1.7847	-2.0513
0.672	0.27	400	0.6718	0.0281	-0.0131	0.8250	0.0412	-72.2312	-91.7049	-1.7836	-2.0504
0.6563	0.34	500	0.6575	0.0498	-0.0211	0.8861	0.0709	-72.3116	-91.4876	-1.7821	-2.0494
0.6437	0.41	600	0.6416	0.0705	-0.0340	0.9111	0.1044	-72.4401	-91.2810	-1.7807	-2.0486
0.6261	0.48	700	0.6277	0.0885	-0.0435	0.9250	0.1320	-72.5355	-91.1010	-1.7796	-2.0478
0.6117	0.55	800	0.6127	0.1097	-0.0567	0.9222	0.1664	-72.6675	-90.8891	-1.7786	-2.0474
0.6002	0.62	900	0.6019	0.1226	-0.0683	0.9278	0.1909	-72.7836	-90.7598	-1.7777	-2.0468
0.5912	0.68	1000	0.5912	0.1344	-0.0805	0.9333	0.2148	-72.9053	-90.6422	-1.7770	-2.0466
0.5822	0.75	1100	0.5822	0.1441	-0.0909	0.9472	0.2350	-73.0092	-90.5447	-1.7763	-2.0462
0.5789	0.82	1200	0.5759	0.1517	-0.0992	0.9333	0.2509	-73.0923	-90.4690	-1.7763	-2.0465
0.5689	0.89	1300	0.5722	0.1555	-0.1033	0.9500	0.2588	-73.1332	-90.4305	-1.7762	-2.0465
0.5694	0.96	1400	0.5702	0.1579	-0.1066	0.9417	0.2644	-73.1662	-90.4070	-1.7761	-2.0465
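
The reward columns in the table above appear to be related by Rewards/margins = Rewards/chosen - Rewards/rejected. The column names resemble those logged by preference-tuning trainers such as TRL's DPOTrainer, but the card does not say which trainer was used, so treat that as an assumption. The following minimal sketch copies the three reward columns from the table and checks the relationship row by row; the helper name `margin_gaps` is hypothetical and exists only for this illustration.

```python
# Rows copied from the table above:
# (step, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (100, -0.0008, -0.0005, -0.0003),
    (200, 0.0031, -0.0015, 0.0046),
    (300, 0.0139, -0.0056, 0.0195),
    (400, 0.0281, -0.0131, 0.0412),
    (500, 0.0498, -0.0211, 0.0709),
    (600, 0.0705, -0.0340, 0.1044),
    (700, 0.0885, -0.0435, 0.1320),
    (800, 0.1097, -0.0567, 0.1664),
    (900, 0.1226, -0.0683, 0.1909),
    (1000, 0.1344, -0.0805, 0.2148),
    (1100, 0.1441, -0.0909, 0.2350),
    (1200, 0.1517, -0.0992, 0.2509),
    (1300, 0.1555, -0.1033, 0.2588),
    (1400, 0.1579, -0.1066, 0.2644),
]


def margin_gaps(rows):
    """Return |(chosen - rejected) - logged margin| for each step."""
    return {
        step: abs((chosen - rejected) - margin)
        for step, chosen, rejected, margin in rows
    }


gaps = margin_gaps(ROWS)
for step, gap in gaps.items():
    print(f"step {step:>4}: |chosen - rejected - margin| = {gap:.4f}")
```

Any gap that remains is on the order of the four-decimal rounding in the table, which is consistent with the margin being computed from unrounded values before logging.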