```
***** Running training *****
  Num examples = 6004
  Num Epochs = 14
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 5000
[2500/5000 12:15 < 12:15, 3.40 it/s, Epoch 6/14]
```

| Step | Training Loss | Validation Loss | Precision | Recall   | F1       | Accuracy |
|------|---------------|-----------------|-----------|----------|----------|----------|
| 100  | No log        | 0.247325        | 0.912333  | 0.925744 | 0.918990 | 0.960895 |
| 200  | No log        | 0.171694        | 0.930514  | 0.928760 | 0.929636 | 0.963143 |
| 300  | No log        | 0.132045        | 0.935375  | 0.943837 | 0.939587 | 0.970515 |
| 400  | No log        | 0.142074        | 0.936490  | 0.939314 | 0.937900 | 0.968141 |
| 500  | 0.245500      | 0.105783        | 0.949794  | 0.955522 | 0.952649 | 0.975887 |
| 600  | 0.245500      | 0.107380        | 0.948120  | 0.950622 | 0.949369 | 0.973138 |
| 700  | 0.245500      | 0.111011        | 0.951504  | 0.954014 | 0.952757 | 0.972889 |
| 800  | 0.245500      | 0.093002        | 0.947999  | 0.955145 | 0.951558 | 0.975387 |
| 900  | 0.245500      | 0.100926        | 0.956193  | 0.954391 | 0.955291 | 0.976262 |
| 1000 | 0.086800      | 0.090775        | 0.955263  | 0.957784 | 0.956522 | 0.976637 |
| 1100 | 0.086800      | 0.099250        | 0.953829  | 0.957784 | 0.955802 | 0.976137 |
| 1200 | 0.086800      | 0.088502        | 0.952327  | 0.956276 | 0.954298 | 0.976762 |
| 1300 | 0.086800      | 0.094135        | 0.957078  | 0.958161 | 0.957619 | 0.977011 |
| 1400 | 0.086800      | 0.099687        | 0.957768  | 0.957407 | 0.957587 | 0.975887 |
| 1500 | 0.056000      | 0.108563        | 0.958930  | 0.959291 | 0.959111 | 0.974888 |
| 1600 | 0.056000      | 0.101031        | 0.957784  | 0.957784 | 0.957784 | 0.976262 |
| 1700 | 0.056000      | 0.099654        | 0.960135  | 0.962307 | 0.961220 | 0.978386 |
| 1800 | 0.056000      | 0.106387        | 0.954118  | 0.956276 | 0.955196 | 0.975512 |
| 1900 | 0.056000      | 0.096317        | 0.953846  | 0.958161 | 0.955998 | 0.975762 |
| 2000 | 0.040000      | 0.094224        | 0.959444  | 0.963061 | 0.961249 | 0.977761 |
| 2100 | 0.040000      | 0.110398        | 0.956669  | 0.957030 | 0.956849 | 0.975262 |
| 2200 | 0.040000      | 0.096151        | 0.955706  | 0.959668 | 0.957683 | 0.977386 |
| 2300 | 0.040000      | 0.108148        | 0.945149  | 0.954768 | 0.949934 | 0.974513 |
| 2400 | 0.040000      | 0.109966        | 0.950991  | 0.958161 | 0.954563 | 0.976637 |
| 2500 | 0.030900      | 0.117515        | 0.947921  | 0.953637 | 0.950770 | 0.973888 |
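For reference, below is a minimal sketch of a setup that would produce a table like this one, assuming the log comes from a Hugging Face `Trainer` run for token classification scored with `seqeval`. Only the values visible in the log are taken from it: batch size 16, 5000 optimization steps, evaluation every 100 steps, and training loss logged every 500 steps (which is why the Training Loss column reads "No log" before step 500). The output directory, tag set in `label_list`, and everything else are hypothetical placeholders.

```python
import numpy as np
import evaluate
from transformers import TrainingArguments

# Assumed from the log: batch size 16, 5000 steps total,
# eval every 100 steps, training loss logged every 500 steps.
args = TrainingArguments(
    output_dir="ner-model",           # hypothetical output path
    max_steps=5000,
    per_device_train_batch_size=16,
    evaluation_strategy="steps",      # named `eval_strategy` in newer releases
    eval_steps=100,
    logging_steps=500,
)

seqeval = evaluate.load("seqeval")
label_list = ["O", "B-ENT", "I-ENT"]  # hypothetical tag set, not shown in the log

def compute_metrics(eval_pred):
    """Produce the Precision/Recall/F1/Accuracy columns from raw logits."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # Drop positions labeled -100 (special tokens / padding) before scoring.
    true_preds = [
        [label_list[p] for p, l in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for p, l in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, labels)
    ]
    results = seqeval.compute(predictions=true_preds, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```

A `compute_metrics` hook like this fills the Precision, Recall, F1, and Accuracy columns at each evaluation step; the `Trainer` itself supplies the Step and loss columns.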