eurus-dpo-qlora-uffull-5e-6

This model is a fine-tuned version of openbmb/Eurus-7b-sft on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5127
  • Rewards/chosen: -0.9791
  • Rewards/rejected: -1.9966
  • Rewards/accuracies: 0.7540
  • Rewards/margins: 1.0174
  • Rewards/margins Max: 3.5694
  • Rewards/margins Min: -0.9504
  • Rewards/margins Std: 1.5237
  • Logps/rejected: -462.4769
  • Logps/chosen: -373.6858
  • Logits/rejected: -2.0066
  • Logits/chosen: -2.1034
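
The reward statistics above follow the usual DPO convention: the implicit reward of a response is the DPO beta times the log-probability ratio between the policy and the reference model, and the margin is the chosen reward minus the rejected reward. Purely as an illustration, a minimal sketch of how such metrics can be computed from per-example summed log-probabilities is shown below; the beta value of 0.1 is an assumption, since this card does not report the beta used.

```python
import torch

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO-style reward statistics from per-example summed log-probs.

    beta=0.1 is an assumed value; this card does not report the beta used.
    """
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/accuracies": (margins > 0).float().mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/margins_max": margins.max().item(),
        "rewards/margins_min": margins.min().item(),
        "rewards/margins_std": margins.std().item(),
    }

# Example with dummy log-probabilities for a batch of 3 preference pairs.
example = dpo_reward_metrics(
    policy_chosen_logps=torch.tensor([-120.0, -95.0, -150.0]),
    policy_rejected_logps=torch.tensor([-140.0, -110.0, -160.0]),
    ref_chosen_logps=torch.tensor([-118.0, -96.0, -148.0]),
    ref_rejected_logps=torch.tensor([-130.0, -100.0, -150.0]),
)
```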

Model description

More information needed

Intended uses & limitations

More information needed
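
Intended uses are not documented here; purely as an illustration, a LoRA/QLoRA adapter such as this one can be loaded on top of its base model roughly as follows. The model IDs come from this card, while the dtype, device placement, prompt, and generation settings are placeholder assumptions.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "openbmb/Eurus-7b-sft"
adapter_id = "just1nseo/eurus-dpo-qlora-uffull-5e-6"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO LoRA adapter

# Illustrative prompt; the base model's own chat/prompt format may differ.
prompt = "Explain the difference between DPO and RLHF in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```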

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
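
These settings map onto the usual Hugging Face TrainingArguments fields; a hedged sketch is below. The output directory and the bf16 flag are assumptions, and in an actual DPO run these arguments would typically be passed to a trainer such as trl's DPOTrainer together with the QLoRA adapter configuration, neither of which is documented in this card.

```python
from transformers import TrainingArguments

# Per-device batch sizes: with 4 GPUs these give the reported totals
# (train: 4 x 4 = 16, eval: 8 x 4 = 32).
training_args = TrainingArguments(
    output_dir="eurus-dpo-qlora-uffull-5e-6",  # assumed output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,  # assumption; the card does not state the precision used
)
```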

Training results

Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
0.6864 | 0.03 | 100 | 0.6880 | -0.0140 | -0.0283 | 0.6329 | 0.0143 | 0.0966 | -0.0527 | 0.0482 | -265.6463 | -277.1725 | -2.2230 | -2.3332
0.6729 | 0.05 | 200 | 0.6675 | -0.1633 | -0.2510 | 0.6627 | 0.0877 | 0.5034 | -0.2742 | 0.2543 | -287.9178 | -292.1004 | -2.1945 | -2.3031
0.6516 | 0.08 | 300 | 0.6332 | -0.2864 | -0.4906 | 0.6905 | 0.2042 | 0.8657 | -0.3947 | 0.4208 | -311.8771 | -304.4155 | -2.1827 | -2.2904
0.6259 | 0.1 | 400 | 0.6459 | -1.4444 | -2.0134 | 0.6488 | 0.5690 | 2.7419 | -1.2404 | 1.3151 | -464.1583 | -420.2169 | -2.0161 | -2.1158
0.5981 | 0.13 | 500 | 0.5951 | -0.4738 | -0.8890 | 0.7004 | 0.4151 | 1.7169 | -0.5423 | 0.7476 | -351.7183 | -323.1576 | -2.0982 | -2.2026
0.5825 | 0.16 | 600 | 0.6147 | -1.4298 | -2.1755 | 0.6766 | 0.7458 | 3.1883 | -1.2023 | 1.4469 | -480.3750 | -418.7514 | -1.9080 | -2.0118
0.6157 | 0.18 | 700 | 0.5762 | -1.0422 | -1.6487 | 0.7044 | 0.6066 | 2.5214 | -0.8306 | 1.1064 | -427.6948 | -379.9899 | -1.8007 | -1.8987
0.5937 | 0.21 | 800 | 0.5623 | -0.6723 | -1.2169 | 0.7242 | 0.5447 | 2.0184 | -0.5908 | 0.8750 | -384.5144 | -343.0002 | -1.9444 | -2.0444
0.5394 | 0.24 | 900 | 0.5627 | -1.0989 | -1.9261 | 0.7302 | 0.8273 | 3.2426 | -0.8732 | 1.3769 | -455.4331 | -385.6613 | -2.0832 | -2.1830
0.6262 | 0.26 | 1000 | 0.5604 | -1.1248 | -1.9857 | 0.7143 | 0.8609 | 3.4243 | -0.9201 | 1.4521 | -461.3933 | -388.2573 | -1.9102 | -2.0114
0.5723 | 0.29 | 1100 | 0.5496 | -0.7408 | -1.5482 | 0.7381 | 0.8074 | 3.2334 | -0.6981 | 1.3203 | -417.6383 | -349.8509 | -1.9847 | -2.0879
0.5501 | 0.31 | 1200 | 0.5542 | -0.6061 | -1.1959 | 0.7321 | 0.5899 | 2.1036 | -0.5358 | 0.8885 | -382.4131 | -336.3819 | -1.8930 | -1.9914
0.5382 | 0.34 | 1300 | 0.5417 | -1.1698 | -2.0706 | 0.7460 | 0.9008 | 3.3611 | -0.9081 | 1.4208 | -469.8816 | -392.7588 | -1.7319 | -1.8331
0.5759 | 0.37 | 1400 | 0.5406 | -0.9231 | -1.8635 | 0.7401 | 0.9404 | 3.5157 | -0.8329 | 1.4521 | -449.1679 | -368.0823 | -1.8351 | -1.9399
0.5367 | 0.39 | 1500 | 0.5376 | -0.8430 | -1.7065 | 0.7560 | 0.8635 | 3.1796 | -0.8328 | 1.3201 | -433.4751 | -360.0789 | -1.8587 | -1.9608
0.5345 | 0.42 | 1600 | 0.5269 | -0.8832 | -1.7856 | 0.7381 | 0.9024 | 3.3303 | -0.8483 | 1.3858 | -441.3758 | -364.0924 | -1.8133 | -1.9167
0.5132 | 0.44 | 1700 | 0.5339 | -1.0951 | -2.0179 | 0.7540 | 0.9228 | 3.2850 | -0.9130 | 1.4005 | -464.6132 | -385.2873 | -1.8670 | -1.9681
0.5451 | 0.47 | 1800 | 0.5310 | -0.7777 | -1.6911 | 0.7282 | 0.9135 | 3.4268 | -0.8127 | 1.4169 | -431.9351 | -353.5432 | -1.8431 | -1.9515
0.5126 | 0.5 | 1900 | 0.5315 | -1.0683 | -2.0616 | 0.7302 | 0.9933 | 3.6236 | -0.9938 | 1.5447 | -468.9817 | -382.6060 | -1.8568 | -1.9592
0.5173 | 0.52 | 2000 | 0.5273 | -0.9246 | -1.8103 | 0.7421 | 0.8857 | 3.2625 | -0.9327 | 1.3899 | -443.8511 | -368.2305 | -1.9264 | -2.0273
0.5241 | 0.55 | 2100 | 0.5267 | -1.0388 | -2.0045 | 0.7262 | 0.9657 | 3.5894 | -1.0169 | 1.5350 | -463.2707 | -379.6525 | -1.9509 | -2.0505
0.4912 | 0.58 | 2200 | 0.5236 | -1.0773 | -2.1473 | 0.7460 | 1.0699 | 3.9227 | -1.0592 | 1.6634 | -477.5478 | -383.5082 | -1.9172 | -2.0173
0.5792 | 0.6 | 2300 | 0.5177 | -0.8715 | -1.7418 | 0.7361 | 0.8703 | 3.0821 | -0.8725 | 1.3249 | -436.9993 | -362.9194 | -2.0500 | -2.1480
0.5628 | 0.63 | 2400 | 0.5218 | -0.9891 | -1.9917 | 0.7460 | 1.0026 | 3.6936 | -1.0654 | 1.5794 | -461.9902 | -374.6792 | -2.0218 | -2.1218
0.5217 | 0.65 | 2500 | 0.5324 | -1.2240 | -2.4529 | 0.7480 | 1.2290 | 4.5548 | -1.2387 | 1.9354 | -508.1148 | -398.1707 | -1.9639 | -2.0649
0.581 | 0.68 | 2600 | 0.5199 | -0.9497 | -1.9408 | 0.7381 | 0.9910 | 3.5052 | -0.9698 | 1.5040 | -456.8956 | -370.7460 | -1.9873 | -2.0864
0.518 | 0.71 | 2700 | 0.5212 | -1.0617 | -2.1128 | 0.7401 | 1.0511 | 3.7114 | -1.0556 | 1.6114 | -474.0986 | -381.9437 | -1.9898 | -2.0884
0.5646 | 0.73 | 2800 | 0.5173 | -0.9139 | -1.8873 | 0.7401 | 0.9734 | 3.4192 | -0.9267 | 1.4687 | -451.5462 | -367.1606 | -1.9649 | -2.0632
0.5608 | 0.76 | 2900 | 0.5170 | -1.0090 | -2.0514 | 0.7421 | 1.0424 | 3.6819 | -1.0248 | 1.5843 | -467.9605 | -376.6732 | -1.9805 | -2.0788
0.4166 | 0.79 | 3000 | 0.5134 | -0.9849 | -1.9772 | 0.7421 | 0.9923 | 3.4268 | -0.9556 | 1.4828 | -460.5416 | -374.2640 | -1.9769 | -2.0737
0.5672 | 0.81 | 3100 | 0.5129 | -0.9737 | -1.9738 | 0.7520 | 1.0001 | 3.4737 | -0.9442 | 1.4902 | -460.2002 | -373.1453 | -1.9761 | -2.0727
0.4843 | 0.84 | 3200 | 0.5127 | -0.9899 | -1.9951 | 0.7480 | 1.0053 | 3.4925 | -0.9434 | 1.4955 | -462.3347 | -374.7598 | -1.9879 | -2.0844
0.5234 | 0.86 | 3300 | 0.5123 | -0.9618 | -1.9579 | 0.7480 | 0.9961 | 3.4685 | -0.9316 | 1.4824 | -458.6060 | -371.9529 | -2.0078 | -2.1041
0.4751 | 0.89 | 3400 | 0.5128 | -0.9715 | -1.9858 | 0.7480 | 1.0143 | 3.5545 | -0.9477 | 1.5159 | -461.4002 | -372.9207 | -2.0063 | -2.1035
0.5294 | 0.92 | 3500 | 0.5131 | -0.9928 | -2.0226 | 0.7460 | 1.0298 | 3.6184 | -0.9685 | 1.5451 | -465.0800 | -375.0580 | -2.0043 | -2.1015
0.5066 | 0.94 | 3600 | 0.5129 | -0.9814 | -2.0001 | 0.75 | 1.0187 | 3.5761 | -0.9557 | 1.5271 | -462.8294 | -373.9119 | -2.0121 | -2.1084
0.5396 | 0.97 | 3700 | 0.5126 | -0.9787 | -1.9952 | 0.7520 | 1.0165 | 3.5676 | -0.9529 | 1.5231 | -462.3404 | -373.6405 | -2.0075 | -2.1043
0.5374 | 0.99 | 3800 | 0.5127 | -0.9798 | -1.9982 | 0.75 | 1.0185 | 3.5723 | -0.9502 | 1.5244 | -462.6427 | -373.7504 | -2.0092 | -2.1060

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2