ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter2

This model is a fine-tuned version of davidberenstein1957/ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter1. The training dataset is not recorded (the auto-generated card lists it as None). It achieves the following results on the evaluation set:

Loss: 0.0162
Rewards/real: -8.1731
Rewards/generated: -31.3826
Rewards/accuracies: 0.9917
Rewards/margins: 23.2095
Logps/generated: -956.3063
Logps/real: -525.1735
Logits/generated: -1.5719
Logits/real: -1.7813
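
The reported numbers are internally consistent if Rewards/margins is read as the difference between Rewards/real and Rewards/generated, which is the usual convention for preference-style training logs. The sketch below is only an arithmetic check on the values listed above; the variable and function names are illustrative and not taken from the training code.

```python
# Final evaluation metrics, copied from the list above.
rewards_real = -8.1731
rewards_generated = -31.3826
reported_margin = 23.2095


def reward_margin(real: float, generated: float) -> float:
    """Margin under the assumed convention: reward(real) - reward(generated)."""
    return real - generated


margin = reward_margin(rewards_real, rewards_generated)

# The recomputed value should match the reported one up to rounding.
assert abs(margin - reported_margin) < 1e-3
print(f"recomputed margin: {margin:.4f}")
```

A large positive margin means the model scores the real responses far above its own generated ones on the evaluation set, which matches the Rewards/accuracies value of 0.9917.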

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 2
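
The total batch sizes follow from the other values in the list, assuming train_batch_size and eval_batch_size are per-device values (as is conventional in auto-generated Trainer cards). The sketch below reproduces that arithmetic and illustrates a linear schedule with a 0.1 warmup ratio in plain Python. The function linear_lr and the total_steps value used in the example are hypothetical; the card does not state the exact number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2

# Effective batch sizes (assumes the listed batch sizes are per device).
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def linear_lr(step: int, total_steps: int,
              peak_lr: float = 1e-07, warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero (illustrative only)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(total_train_batch_size, total_eval_batch_size)
# total_steps=1000 is a made-up round number for illustration.
print(linear_lr(50, 1000), linear_lr(100, 1000), linear_lr(1000, 1000))
```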

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/real	Rewards/generated	Rewards/accuracies	Rewards/margins	Logps/generated	Logps/real	Logits/generated	Logits/real
0.6097	0.04	25	0.4147	-0.6192	-1.4312	0.9250	0.8120	-656.7919	-449.6341	-2.0004	-2.0773
0.2137	0.08	50	0.1745	-2.0300	-5.0060	0.9519	2.9761	-692.5404	-463.7422	-1.9306	-2.0237
0.1292	0.12	75	0.1012	-2.8227	-7.4967	0.9685	4.6740	-717.4471	-471.6697	-1.8843	-1.9887
0.0665	0.16	100	0.0676	-3.2936	-9.3177	0.9778	6.0240	-735.6567	-476.3786	-1.8508	-1.9628
0.0429	0.21	125	0.0477	-3.7328	-11.2722	0.9824	7.5395	-755.2025	-480.7701	-1.8123	-1.9332
0.0299	0.25	150	0.0369	-4.2161	-13.2599	0.9870	9.0437	-775.0787	-485.6039	-1.7938	-1.9226
0.0252	0.29	175	0.0320	-4.7201	-15.0489	0.9880	10.3288	-792.9691	-490.6432	-1.7758	-1.9116
0.0249	0.33	200	0.0301	-5.0757	-16.3570	0.9880	11.2813	-806.0497	-494.1995	-1.7515	-1.8923
0.0175	0.37	225	0.0273	-5.4299	-17.6751	0.9880	12.2451	-819.2310	-497.7419	-1.7362	-1.8821
0.0183	0.41	250	0.0254	-5.4183	-18.3899	0.9889	12.9715	-826.3791	-497.6259	-1.7300	-1.8793
0.0182	0.45	275	0.0245	-6.0900	-20.5760	0.9889	14.4860	-848.2401	-504.3426	-1.6961	-1.8564
0.0253	0.49	300	0.0224	-5.9239	-20.7184	0.9898	14.7944	-849.6640	-502.6819	-1.6938	-1.8573
0.0075	0.53	325	0.0234	-7.0436	-24.1126	0.9898	17.0691	-883.6064	-513.8781	-1.6522	-1.8252
0.0141	0.58	350	0.0212	-5.5696	-20.9714	0.9898	15.4017	-852.1937	-499.1387	-1.7082	-1.8693
0.0135	0.62	375	0.0182	-5.2646	-20.3901	0.9907	15.1254	-846.3809	-496.0890	-1.7285	-1.8897
0.014	0.66	400	0.0182	-5.5057	-21.1579	0.9907	15.6522	-854.0594	-498.4994	-1.7137	-1.8783
0.0122	0.7	425	0.0172	-5.3398	-20.7520	0.9907	15.4122	-849.9997	-496.8405	-1.7231	-1.8857
0.0144	0.74	450	0.0164	-4.6606	-19.3766	0.9917	14.7160	-836.2463	-490.0483	-1.7465	-1.9042
0.0103	0.78	475	0.0160	-4.8739	-20.1058	0.9907	15.2319	-843.5385	-492.1819	-1.7445	-1.9064
0.0147	0.82	500	0.0156	-5.1220	-20.9607	0.9917	15.8387	-852.0875	-494.6623	-1.7434	-1.9092
0.0154	0.86	525	0.0155	-5.1481	-21.3994	0.9917	16.2513	-856.4740	-494.9235	-1.7357	-1.9040
0.0158	0.91	550	0.0151	-5.6088	-22.9532	0.9917	17.3444	-872.0123	-499.5304	-1.7139	-1.8881
0.0053	0.95	575	0.0149	-5.7209	-23.5217	0.9917	17.8008	-877.6972	-500.6515	-1.7113	-1.8888
0.008	0.99	600	0.0147	-5.7523	-23.7474	0.9917	17.9952	-879.9544	-500.9651	-1.7086	-1.8878
0.0049	1.03	625	0.0154	-6.1839	-24.8883	0.9907	18.7044	-891.3632	-505.2818	-1.6731	-1.8585
0.0057	1.07	650	0.0155	-6.4947	-25.8924	0.9917	19.3977	-901.4037	-508.3892	-1.6592	-1.8484
0.0076	1.11	675	0.0158	-6.8543	-26.9217	0.9917	20.0674	-911.6970	-511.9859	-1.6407	-1.8339
0.004	1.15	700	0.0158	-7.1325	-27.7743	0.9917	20.6418	-920.2236	-514.7678	-1.6269	-1.8236
0.0168	1.19	725	0.0157	-6.9019	-26.2791	0.9917	19.3772	-905.2711	-512.4611	-1.6566	-1.8448
0.0022	1.23	750	0.0163	-6.9586	-26.5145	0.9917	19.5559	-907.6251	-513.0281	-1.6533	-1.8423
0.0039	1.28	775	0.0165	-7.5386	-28.2224	0.9917	20.6837	-924.7038	-518.8289	-1.6369	-1.8327
0.002	1.32	800	0.0165	-7.6568	-28.6441	0.9907	20.9872	-928.9208	-520.0109	-1.6365	-1.8344
0.002	1.36	825	0.0165	-7.7989	-29.2028	0.9917	21.4038	-934.5078	-521.4318	-1.6348	-1.8352
0.0019	1.4	850	0.0165	-7.8978	-29.5958	0.9917	21.6980	-938.4382	-522.4203	-1.6166	-1.8169
0.0041	1.44	875	0.0162	-7.9696	-29.7930	0.9917	21.8234	-940.4100	-523.1380	-1.6165	-1.8176
0.0023	1.48	900	0.0164	-8.2086	-30.6909	0.9917	22.4823	-949.3892	-525.5286	-1.6045	-1.8093
0.0038	1.52	925	0.0166	-8.1217	-30.6727	0.9917	22.5510	-949.2076	-524.6597	-1.5919	-1.7978
0.0096	1.56	950	0.0162	-7.8257	-30.1144	0.9917	22.2887	-943.6237	-521.6992	-1.5909	-1.7956
0.0057	1.6	975	0.0166	-8.0335	-30.6654	0.9917	22.6319	-949.1342	-523.7775	-1.5854	-1.7919
0.0046	1.65	1000	0.0165	-8.1757	-31.0139	0.9917	22.8382	-952.6191	-525.2000	-1.5768	-1.7852
0.0009	1.69	1025	0.0165	-8.0553	-30.7565	0.9917	22.7012	-950.0453	-523.9951	-1.5757	-1.7830
0.002	1.73	1050	0.0164	-8.1838	-31.3365	0.9917	23.1528	-955.8453	-525.2800	-1.5692	-1.7790
0.0069	1.77	1075	0.0163	-8.1908	-31.4118	0.9917	23.2210	-956.5981	-525.3508	-1.5749	-1.7850
0.0029	1.81	1100	0.0166	-8.4138	-32.0830	0.9917	23.6692	-963.3098	-527.5802	-1.5624	-1.7752
0.0047	1.85	1125	0.0166	-8.4223	-32.1526	0.9917	23.7304	-964.0065	-527.6652	-1.5631	-1.7759
0.0037	1.89	1150	0.0163	-8.1563	-31.3209	0.9917	23.1646	-955.6895	-525.0057	-1.5739	-1.7832
0.0026	1.93	1175	0.0163	-8.2107	-31.5009	0.9917	23.2901	-957.4888	-525.5498	-1.5708	-1.7807
0.0058	1.98	1200	0.0162	-8.1731	-31.3826	0.9917	23.2095	-956.3063	-525.1735	-1.5719	-1.7813
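
One way to read the table: among the logged evaluation steps, validation loss is lowest at step 600 (0.0147) and stays roughly flat afterwards, while the reward margin keeps widening. The sketch below shows this on a handful of rows copied from the table; the helper names are illustrative, and the row selection is a small sample rather than the full log.

```python
# (step, validation_loss, rewards_margin) for a few rows copied from the table.
rows = [
    (25, 0.4147, 0.8120),
    (300, 0.0224, 14.7944),
    (600, 0.0147, 17.9952),
    (900, 0.0164, 22.4823),
    (1200, 0.0162, 23.2095),
]


def best_step(rows):
    """Return the step with the lowest validation loss."""
    return min(rows, key=lambda r: r[1])[0]


def margins_increasing(rows):
    """True if the reward margin grows from each sampled row to the next."""
    margins = [r[2] for r in rows]
    return all(a < b for a, b in zip(margins, margins[1:]))


print("lowest validation loss at step", best_step(rows))
print("margins increasing across sample:", margins_increasing(rows))
```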

Framework versions

Transformers 4.37.0
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.2
