hbertv1-massive-intermediate_KD_new

This model is a fine-tuned version of gokuls/bert_12_layer_model_v1_complete_training_new_48 on the massive dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
5.2865	1.0	180	4.1021	0.1692
4.1098	2.0	360	3.6293	0.2494
3.6635	3.0	540	3.1836	0.3665
3.311	4.0	720	2.9568	0.4555
3.0266	5.0	900	2.7684	0.4791
2.8087	6.0	1080	2.5803	0.5903
2.6276	7.0	1260	2.4481	0.6335
2.4728	8.0	1440	2.3491	0.6763
2.3497	9.0	1620	2.3474	0.6508
2.2557	10.0	1800	2.3618	0.6945
2.1673	11.0	1980	2.1769	0.7324
2.0929	12.0	2160	2.2181	0.7177
2.0125	13.0	2340	2.0942	0.7659
1.9507	14.0	2520	2.0009	0.7767
1.8811	15.0	2700	2.0316	0.7624
1.8356	16.0	2880	2.0107	0.7698
1.7935	17.0	3060	1.9687	0.7742
1.7436	18.0	3240	1.9601	0.7811
1.7158	19.0	3420	1.9357	0.7836
1.6848	20.0	3600	1.9413	0.7747
1.6421	21.0	3780	1.9428	0.7723
1.6091	22.0	3960	1.8787	0.7944
1.5758	23.0	4140	1.8953	0.7831
1.5557	24.0	4320	1.8503	0.7964
1.5249	25.0	4500	1.8481	0.7939
1.5082	26.0	4680	1.8342	0.7983
1.4827	27.0	4860	1.7922	0.7993
1.4552	28.0	5040	1.7805	0.7988
1.4296	29.0	5220	1.7730	0.7988
1.4067	30.0	5400	1.7724	0.7993
1.3843	31.0	5580	1.7438	0.8032
1.3721	32.0	5760	1.7842	0.7954
1.358	33.0	5940	1.7238	0.8087
1.3332	34.0	6120	1.6919	0.8091
1.3211	35.0	6300	1.7014	0.8042
1.3063	36.0	6480	1.6718	0.8131
1.2863	37.0	6660	1.6631	0.8165
1.2753	38.0	6840	1.6867	0.8091
1.2651	39.0	7020	1.6675	0.8067
1.2475	40.0	7200	1.6524	0.8072
1.2343	41.0	7380	1.6218	0.8165
1.2223	42.0	7560	1.6201	0.8155
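
The log can be read programmatically to pick a checkpoint. The snippet below is a minimal sketch, not part of the original training code: it copies the last six rows shown above (epochs 37 to 42) and reports the steps per epoch, the row with the highest accuracy, and the row with the lowest validation loss. The names `ROWS`, `steps_per_epoch`, `best_by_accuracy`, and `best_by_val_loss` are hypothetical helpers introduced for illustration, and breaking accuracy ties by lower validation loss is an assumption rather than a rule stated in this card.

```python
# Last six rows of the training log above, copied verbatim as
# (training loss, epoch, step, validation loss, accuracy).
ROWS = [
    (1.2863, 37, 6660, 1.6631, 0.8165),
    (1.2753, 38, 6840, 1.6867, 0.8091),
    (1.2651, 39, 7020, 1.6675, 0.8067),
    (1.2475, 40, 7200, 1.6524, 0.8072),
    (1.2343, 41, 7380, 1.6218, 0.8165),
    (1.2223, 42, 7560, 1.6201, 0.8155),
]


def steps_per_epoch(rows):
    """Return the number of optimizer steps per epoch, checking it is constant."""
    ratios = {step // epoch for _, epoch, step, _, _ in rows}
    if len(ratios) != 1:
        raise ValueError(f"inconsistent steps per epoch: {sorted(ratios)}")
    return ratios.pop()


def best_by_accuracy(rows):
    """Return the row with the highest accuracy.

    Ties are broken by lower validation loss (an assumption made for this sketch).
    """
    return max(rows, key=lambda r: (r[4], -r[3]))


def best_by_val_loss(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[3])


if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch(ROWS))
    print("best accuracy row:", best_by_accuracy(ROWS))
    print("lowest validation loss row:", best_by_val_loss(ROWS))
```

Within these six rows, accuracy and validation loss do not select the same epoch, so the choice of checkpoint depends on which metric matters for the downstream use.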