eurus-dpo-qlora-uf-5e-6

This model is a fine-tuned version of openbmb/Eurus-7b-sft on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5164
  • Rewards/chosen: -0.9790
  • Rewards/rejected: -1.9788
  • Rewards/accuracies: 0.7381
  • Rewards/margins: 0.9998
  • Rewards/margins Max: 3.4601
  • Rewards/margins Min: -0.9016
  • Rewards/margins Std: 1.4965
  • Logps/rejected: -460.7238
  • Logps/chosen: -373.6762
  • Logits/rejected: -1.9530
  • Logits/chosen: -2.0457
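
This repository contains a QLoRA adapter rather than full model weights. Below is a minimal loading sketch, assuming the adapter id just1nseo/eurus-dpo-qlora-uf-5e-6 and the openbmb/Eurus-7b-sft base model referenced on this card; the prompt and generation settings are purely illustrative.

```python
# Minimal sketch: load the DPO QLoRA adapter on top of the base SFT model with PEFT.
# Repository ids are taken from this card; generation settings are illustrative only.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "just1nseo/eurus-dpo-qlora-uf-5e-6"
base_id = "openbmb/Eurus-7b-sft"

# AutoPeftModelForCausalLM resolves the base model named in the adapter config
# and attaches the LoRA weights from this repository.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

prompt = "Explain the difference between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```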

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
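
Although no details are given here, the preference data named at the top of this card is HuggingFaceH4/ultrafeedback_binarized. The sketch below simply inspects that dataset with the datasets library listed under Framework versions; it is an illustration, not part of the original training setup.

```python
# Sketch: inspect the preference dataset named on this card with the datasets library.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
print(ds)  # shows the available splits and their columns
# Each preference example pairs a prompt with a "chosen" and a "rejected" response,
# which is the pairwise format that DPO training consumes.
```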

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
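
As a rough reconstruction (not the original training script), these settings map onto transformers.TrainingArguments as sketched below; DPO-specific options such as the beta coefficient and the LoRA/QLoRA configuration are not reported on this card and are therefore omitted.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
# Reconstructed from this card; the actual training script is not included here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eurus-dpo-qlora-uf-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,  # 4 GPUs -> total train batch size 16
    per_device_eval_batch_size=8,   # 4 GPUs -> total eval batch size 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the optimizer defaults.
)
```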

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6864 | 0.03 | 100 | 0.6881 | -0.0135 | -0.0276 | 0.6389 | 0.0140 | 0.0963 | -0.0519 | 0.0479 | -265.6017 | -277.1340 | -2.2289 | -2.3384 |
| 0.6727 | 0.05 | 200 | 0.6679 | -0.1594 | -0.2453 | 0.6548 | 0.0860 | 0.4969 | -0.2700 | 0.2509 | -287.3769 | -291.7154 | -2.2025 | -2.3104 |
| 0.6521 | 0.08 | 300 | 0.6335 | -0.2848 | -0.4863 | 0.6845 | 0.2015 | 0.8574 | -0.3927 | 0.4174 | -311.4767 | -304.2637 | -2.1870 | -2.2942 |
| 0.6166 | 0.1 | 400 | 0.6224 | -1.0777 | -1.6294 | 0.6706 | 0.5517 | 2.5154 | -1.0756 | 1.1911 | -425.7865 | -383.5505 | -2.0704 | -2.1724 |
| 0.6046 | 0.13 | 500 | 0.5995 | -0.5398 | -0.9206 | 0.7024 | 0.3807 | 1.5570 | -0.5438 | 0.6976 | -354.8985 | -329.7637 | -2.0362 | -2.1377 |
| 0.5729 | 0.16 | 600 | 0.5876 | -1.0546 | -1.7496 | 0.6944 | 0.6951 | 2.8409 | -0.8941 | 1.2371 | -437.8077 | -381.2366 | -1.9100 | -2.0107 |
| 0.6337 | 0.18 | 700 | 0.5726 | -1.0427 | -1.6902 | 0.7063 | 0.6475 | 2.6120 | -0.7762 | 1.1332 | -431.8674 | -380.0523 | -1.7956 | -1.8927 |
| 0.59 | 0.21 | 800 | 0.5679 | -0.6047 | -1.0831 | 0.7321 | 0.4784 | 1.7665 | -0.5214 | 0.7684 | -371.1527 | -336.2452 | -1.9223 | -2.0207 |
| 0.5405 | 0.24 | 900 | 0.5600 | -1.1375 | -1.9414 | 0.7222 | 0.8039 | 3.0800 | -0.8496 | 1.3199 | -456.9872 | -389.5308 | -2.0248 | -2.1234 |
| 0.6278 | 0.26 | 1000 | 0.5523 | -1.0923 | -1.9590 | 0.7044 | 0.8667 | 3.3940 | -0.8638 | 1.4208 | -458.7448 | -385.0119 | -1.9196 | -2.0220 |
| 0.5655 | 0.29 | 1100 | 0.5478 | -0.8868 | -1.7208 | 0.7421 | 0.8340 | 3.2954 | -0.7560 | 1.3494 | -434.9226 | -364.4635 | -1.9093 | -2.0104 |
| 0.5344 | 0.31 | 1200 | 0.5446 | -0.7887 | -1.4986 | 0.7341 | 0.7099 | 2.6064 | -0.6513 | 1.0880 | -412.6989 | -354.6506 | -1.9237 | -2.0213 |
| 0.5576 | 0.34 | 1300 | 0.5354 | -0.9605 | -1.7839 | 0.7460 | 0.8234 | 3.0657 | -0.7919 | 1.2796 | -441.2323 | -371.8330 | -1.7950 | -1.8904 |
| 0.5335 | 0.37 | 1400 | 0.5371 | -1.0326 | -1.8497 | 0.7361 | 0.8171 | 2.9854 | -0.8145 | 1.2547 | -447.8088 | -379.0401 | -1.8824 | -1.9808 |
| 0.5347 | 0.39 | 1500 | 0.5351 | -0.9420 | -1.7947 | 0.7520 | 0.8527 | 3.1090 | -0.8553 | 1.3042 | -442.3140 | -369.9821 | -1.8311 | -1.9294 |
| 0.5538 | 0.42 | 1600 | 0.5312 | -1.1441 | -2.1579 | 0.7440 | 1.0138 | 3.7623 | -0.9478 | 1.5661 | -478.6291 | -390.1890 | -1.8438 | -1.9418 |
| 0.5175 | 0.44 | 1700 | 0.5350 | -1.0343 | -1.9335 | 0.7321 | 0.8992 | 3.2678 | -0.9029 | 1.3854 | -456.1965 | -379.2123 | -1.8820 | -1.9785 |
| 0.5417 | 0.47 | 1800 | 0.5316 | -0.8672 | -1.8277 | 0.7560 | 0.9605 | 3.5835 | -0.8613 | 1.4946 | -445.6108 | -362.5007 | -1.8278 | -1.9306 |
| 0.4904 | 0.5 | 1900 | 0.5328 | -1.0787 | -2.0772 | 0.7421 | 0.9985 | 3.6452 | -0.9893 | 1.5556 | -470.5620 | -383.6512 | -1.8132 | -1.9118 |
| 0.5071 | 0.52 | 2000 | 0.5326 | -1.0668 | -2.0335 | 0.7361 | 0.9667 | 3.5683 | -1.0151 | 1.5323 | -466.1959 | -382.4640 | -1.8844 | -1.9823 |
| 0.5261 | 0.55 | 2100 | 0.5325 | -1.1071 | -2.0779 | 0.7282 | 0.9708 | 3.6057 | -1.0075 | 1.5567 | -470.6340 | -386.4928 | -1.9103 | -2.0059 |
| 0.4884 | 0.58 | 2200 | 0.5280 | -1.0512 | -2.0196 | 0.7222 | 0.9684 | 3.3924 | -0.9588 | 1.4867 | -464.8056 | -380.8995 | -1.8417 | -1.9363 |
| 0.5818 | 0.6 | 2300 | 0.5211 | -0.8015 | -1.7051 | 0.7341 | 0.9036 | 3.1585 | -0.8482 | 1.3568 | -433.3542 | -355.9271 | -1.9326 | -2.0312 |
| 0.5482 | 0.63 | 2400 | 0.5219 | -0.9343 | -1.9391 | 0.7480 | 1.0048 | 3.6277 | -0.9572 | 1.5466 | -456.7522 | -369.2106 | -1.8999 | -1.9991 |
| 0.5037 | 0.65 | 2500 | 0.5317 | -1.1525 | -2.3572 | 0.7421 | 1.2048 | 4.3551 | -1.0954 | 1.8593 | -498.5656 | -391.0249 | -1.8941 | -1.9920 |
| 0.5798 | 0.68 | 2600 | 0.5216 | -0.9988 | -1.9851 | 0.7421 | 0.9863 | 3.4321 | -0.9403 | 1.4911 | -461.3539 | -375.6569 | -1.8757 | -1.9715 |
| 0.5345 | 0.71 | 2700 | 0.5184 | -0.9615 | -1.9463 | 0.7460 | 0.9848 | 3.4272 | -0.8991 | 1.4738 | -457.4719 | -371.9321 | -1.9155 | -2.0104 |
| 0.5459 | 0.73 | 2800 | 0.5204 | -0.9480 | -1.9066 | 0.7302 | 0.9585 | 3.3614 | -0.9218 | 1.4681 | -453.5023 | -370.5847 | -1.8986 | -1.9935 |
| 0.5691 | 0.76 | 2900 | 0.5153 | -0.9262 | -1.8909 | 0.7460 | 0.9647 | 3.3023 | -0.8737 | 1.4285 | -451.9376 | -368.4024 | -1.9368 | -2.0317 |
| 0.4368 | 0.79 | 3000 | 0.5151 | -0.9833 | -1.9341 | 0.7421 | 0.9508 | 3.2231 | -0.8740 | 1.4069 | -456.2547 | -374.1131 | -1.9140 | -2.0063 |
| 0.5785 | 0.81 | 3100 | 0.5157 | -0.9492 | -1.9005 | 0.7440 | 0.9513 | 3.2197 | -0.8687 | 1.4068 | -452.8972 | -370.7017 | -1.9233 | -2.0167 |
| 0.4767 | 0.84 | 3200 | 0.5158 | -0.9477 | -1.9018 | 0.7421 | 0.9541 | 3.2459 | -0.8543 | 1.4107 | -453.0181 | -370.5468 | -1.9409 | -2.0342 |
| 0.5071 | 0.86 | 3300 | 0.5160 | -0.9553 | -1.9218 | 0.7460 | 0.9665 | 3.3145 | -0.8641 | 1.4367 | -455.0208 | -371.3060 | -1.9439 | -2.0364 |
| 0.4958 | 0.89 | 3400 | 0.5163 | -0.9540 | -1.9349 | 0.7381 | 0.9809 | 3.3829 | -0.8849 | 1.4645 | -456.3347 | -371.1840 | -1.9500 | -2.0430 |
| 0.5241 | 0.92 | 3500 | 0.5164 | -0.9755 | -1.9801 | 0.7401 | 1.0046 | 3.4804 | -0.9045 | 1.5041 | -460.8534 | -373.3299 | -1.9495 | -2.0428 |
| 0.5055 | 0.94 | 3600 | 0.5165 | -0.9793 | -1.9820 | 0.7401 | 1.0027 | 3.4710 | -0.9036 | 1.5012 | -461.0404 | -373.7104 | -1.9513 | -2.0443 |
| 0.5325 | 0.97 | 3700 | 0.5163 | -0.9770 | -1.9766 | 0.7381 | 0.9996 | 3.4555 | -0.9011 | 1.4955 | -460.5036 | -373.4828 | -1.9505 | -2.0437 |
| 0.5533 | 0.99 | 3800 | 0.5163 | -0.9794 | -1.9794 | 0.7401 | 1.0000 | 3.4591 | -0.9049 | 1.4974 | -460.7866 | -373.7226 | -1.9503 | -2.0433 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2