```
***** Running training *****
  Num examples = 6004
  Num Epochs = 14
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 5000
[2500/5000 12:15 < 12:15, 3.40 it/s, Epoch 6/14]
```

| Step | Training Loss | Validation Loss | Precision | Recall   | F1       | Accuracy |
|------|---------------|-----------------|-----------|----------|----------|----------|
| 100  | No log        | 0.247325        | 0.912333  | 0.925744 | 0.918990 | 0.960895 |
| 200  | No log        | 0.171694        | 0.930514  | 0.928760 | 0.929636 | 0.963143 |
| 300  | No log        | 0.132045        | 0.935375  | 0.943837 | 0.939587 | 0.970515 |
| 400  | No log        | 0.142074        | 0.936490  | 0.939314 | 0.937900 | 0.968141 |
| 500  | 0.245500      | 0.105783        | 0.949794  | 0.955522 | 0.952649 | 0.975887 |
| 600  | 0.245500      | 0.107380        | 0.948120  | 0.950622 | 0.949369 | 0.973138 |
| 700  | 0.245500      | 0.111011        | 0.951504  | 0.954014 | 0.952757 | 0.972889 |
| 800  | 0.245500      | 0.093002        | 0.947999  | 0.955145 | 0.951558 | 0.975387 |
| 900  | 0.245500      | 0.100926        | 0.956193  | 0.954391 | 0.955291 | 0.976262 |
| 1000 | 0.086800      | 0.090775        | 0.955263  | 0.957784 | 0.956522 | 0.976637 |
| 1100 | 0.086800      | 0.099250        | 0.953829  | 0.957784 | 0.955802 | 0.976137 |
| 1200 | 0.086800      | 0.088502        | 0.952327  | 0.956276 | 0.954298 | 0.976762 |
| 1300 | 0.086800      | 0.094135        | 0.957078  | 0.958161 | 0.957619 | 0.977011 |
| 1400 | 0.086800      | 0.099687        | 0.957768  | 0.957407 | 0.957587 | 0.975887 |
| 1500 | 0.056000      | 0.108563        | 0.958930  | 0.959291 | 0.959111 | 0.974888 |
| 1600 | 0.056000      | 0.101031        | 0.957784  | 0.957784 | 0.957784 | 0.976262 |
| 1700 | 0.056000      | 0.099654        | 0.960135  | 0.962307 | 0.961220 | 0.978386 |
| 1800 | 0.056000      | 0.106387        | 0.954118  | 0.956276 | 0.955196 | 0.975512 |
| 1900 | 0.056000      | 0.096317        | 0.953846  | 0.958161 | 0.955998 | 0.975762 |
| 2000 | 0.040000      | 0.094224        | 0.959444  | 0.963061 | 0.961249 | 0.977761 |
| 2100 | 0.040000      | 0.110398        | 0.956669  | 0.957030 | 0.956849 | 0.975262 |
| 2200 | 0.040000      | 0.096151        | 0.955706  | 0.959668 | 0.957683 | 0.977386 |
| 2300 | 0.040000      | 0.108148        | 0.945149  | 0.954768 | 0.949934 | 0.974513 |
| 2400 | 0.040000      | 0.109966        | 0.950991  | 0.958161 | 0.954563 | 0.976637 |
| 2500 | 0.030900      | 0.117515        | 0.947921  | 0.953637 | 0.950770 | 0.973888 |
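For reference, below is a minimal sketch of a setup that would produce a table like this one, assuming the log comes from a Hugging Face `Trainer` run for token classification scored with `seqeval`. Only the values visible in the log are taken from it: batch size 16, 5000 optimization steps, evaluation every 100 steps, and training loss logged every 500 steps (which is why the Training Loss column reads "No log" before step 500). The output directory, tag set in `label_list`, and everything else are hypothetical placeholders.

```python
import numpy as np
import evaluate
from transformers import TrainingArguments

# Assumed from the log: batch size 16, 5000 steps total,
# eval every 100 steps, training loss logged every 500 steps.
args = TrainingArguments(
    output_dir="ner-model",           # hypothetical output path
    max_steps=5000,
    per_device_train_batch_size=16,
    evaluation_strategy="steps",      # named `eval_strategy` in newer releases
    eval_steps=100,
    logging_steps=500,
)

seqeval = evaluate.load("seqeval")
label_list = ["O", "B-ENT", "I-ENT"]  # hypothetical tag set, not shown in the log

def compute_metrics(eval_pred):
    """Produce the Precision/Recall/F1/Accuracy columns from raw logits."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # Drop positions labeled -100 (special tokens / padding) before scoring.
    true_preds = [
        [label_list[p] for p, l in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for p, l in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, labels)
    ]
    results = seqeval.compute(predictions=true_preds, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```

A `compute_metrics` hook like this fills the Precision, Recall, F1, and Accuracy columns at each evaluation step; the `Trainer` itself supplies the Step and loss columns.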