
WeniGPT-QA-Zephyr-7B-4.0.1-KTO

This model is a PEFT adapter fine-tuned from HuggingFaceH4/zephyr-7b-beta; the training dataset is not specified in this card. It achieves the following results on the evaluation set:

  • Loss: 0.0038
  • Rewards/chosen: 6.5225
  • Rewards/rejected: -33.7048
  • Rewards/margins: 40.2273
  • KL: 0.0
  • Logps/chosen: -111.4407
  • Logps/rejected: -539.2156
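
Note that, as in KTO-style logging, Rewards/margins is the gap between the chosen and rejected rewards: 6.5225 − (−33.7048) = 40.2273.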

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 786
  • mixed_precision_training: Native AMP
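
The metric names above (rewards/chosen, rewards/rejected, KL) match the logging of TRL's `KTOTrainer`, so the sketch below shows one plausible way these hyperparameters map onto a `KTOConfig`. This is a reconstruction, not the authors' training script: the use of TRL is an assumption, the LoRA settings and dataset rows are placeholders, and fp16 stands in for "Native AMP".

```python
# Hypothetical reconstruction of the training setup with TRL's KTOTrainer.
# Assumptions: TRL was the framework (inferred from the metric names),
# fp16 stands in for "Native AMP", dataset rows and LoRA config are placeholders.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

config = KTOConfig(
    output_dir="wenigpt-qa-zephyr-kto",  # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,       # 4 x 8 = 32 total train batch size
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    max_steps=786,
    seed=42,
    fp16=True,                           # "Native AMP" mixed precision
)

# KTO trains on unpaired examples labeled desirable (True) or undesirable (False).
train_dataset = Dataset.from_dict({
    "prompt": ["Example question?", "Example question?"],
    "completion": ["A good answer.", "A bad answer."],
    "label": [True, False],
})

base_id = "HuggingFaceH4/zephyr-7b-beta"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

trainer = KTOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # hypothetical adapter config
)
trainer.train()
```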

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/margins | KL  | Logps/chosen | Logps/rejected |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:---------------:|:---:|:------------:|:--------------:|
| 0.204         | 0.38  | 50   | 0.0275          | 5.6328         | -20.3465         | 25.9793         | 0.0 | -120.3375    | -405.6329      |
| 0.073         | 0.76  | 100  | 0.0143          | 5.7898         | -19.2664         | 25.0562         | 0.0 | -118.7677    | -394.8320      |
| 0.0553        | 1.13  | 150  | 0.0225          | 5.8121         | -29.9815         | 35.7935         | 0.0 | -118.5453    | -501.9826      |
| 0.0232        | 1.51  | 200  | 0.0048          | 6.4515         | -27.5911         | 34.0425         | 0.0 | -112.1512    | -478.0785      |
| 0.0519        | 1.89  | 250  | 0.0081          | 6.4814         | -30.4910         | 36.9724         | 0.0 | -111.8522    | -507.0782      |
| 0.0095        | 2.27  | 300  | 0.0154          | 6.4081         | -33.8838         | 40.2919         | 0.0 | -112.5852    | -541.0063      |
| 0.0098        | 2.65  | 350  | 0.0052          | 6.5962         | -41.2733         | 47.8696         | 0.0 | -110.7035    | -614.9014      |
| 0.0038        | 3.02  | 400  | 0.0038          | 6.5225         | -33.7048         | 40.2273         | 0.0 | -111.4407    | -539.2156      |
| 0.0068        | 3.4   | 450  | 0.0080          | 6.3449         | -43.3527         | 49.6976         | 0.0 | -113.2169    | -635.6954      |
| 0.0037        | 3.78  | 500  | 0.0071          | 6.5639         | -44.5033         | 51.0672         | 0.0 | -111.0268    | -647.2004      |
| 0.0032        | 4.16  | 550  | 0.0085          | 6.6333         | -29.5095         | 36.1428         | 0.0 | -110.3333    | -497.2631      |
| 0.0029        | 4.54  | 600  | 0.0048          | 6.5574         | -42.0858         | 48.6432         | 0.0 | -111.0921    | -623.0258      |
| 0.0028        | 4.91  | 650  | 0.0041          | 6.6663         | -41.3645         | 48.0309         | 0.0 | -110.0026    | -615.8130      |
| 0.0032        | 5.29  | 700  | 0.0040          | 6.6773         | -41.2318         | 47.9091         | 0.0 | -109.8931    | -614.4858      |
| 0.003         | 5.67  | 750  | 0.0040          | 6.6870         | -41.2272         | 47.9142         | 0.0 | -109.7961    | -614.4399      |
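
The evaluation results reported at the top of this card match the step-400 row exactly, so that checkpoint appears to be the one whose metrics are reported.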

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.1
  • Pytorch 2.1.0+cu118
  • Datasets 2.18.0
  • Tokenizers 0.15.2
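
Since this repository is a PEFT adapter over zephyr-7b-beta, inference works by attaching the adapter to the base model. A minimal sketch follows; the adapter repo id below is a placeholder, as the full Hub id is not given in this card.

```python
# Minimal sketch: attach this PEFT adapter to its base model for inference.
# The adapter id below is a placeholder; substitute the full Hub repo id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "HuggingFaceH4/zephyr-7b-beta"
adapter_id = "WeniGPT-QA-Zephyr-7B-4.0.1-KTO"  # placeholder Hub repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "What is KTO fine-tuning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```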