hbertv1-massive-logit_KD-tiny_ffn_1

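This model is a fine-tuned version of gokuls/model_v1_complete_training_wt_init_48_tiny_freeze_new_ffn_1 on the massive dataset. In the final logged epoch (epoch 50, step 9000) it reaches a validation loss of 0.6318 and an accuracy of 0.8308; the highest logged accuracy is 0.8342, at epoch 48.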

Model description

More information needed

Training results

Training loss, validation loss, and accuracy were logged once per epoch (every 180 steps) for 50 epochs:

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.296	1.0	180	3.7345	0.2081
3.5386	2.0	360	3.0526	0.2730
2.9946	3.0	540	2.6051	0.3360
2.6126	4.0	720	2.2810	0.4215
2.3148	5.0	900	2.0377	0.4683
2.0838	6.0	1080	1.8401	0.5371
1.9016	7.0	1260	1.6686	0.6080
1.7431	8.0	1440	1.5358	0.6439
1.613	9.0	1620	1.4238	0.6886
1.4952	10.0	1800	1.3339	0.7127
1.4	11.0	1980	1.2511	0.7162
1.3069	12.0	2160	1.1877	0.7285
1.2288	13.0	2340	1.1277	0.7329
1.1684	14.0	2520	1.0877	0.7418
1.0971	15.0	2700	1.0285	0.7570
1.0424	16.0	2880	0.9811	0.7619
0.9865	17.0	3060	0.9552	0.7629
0.943	18.0	3240	0.9216	0.7742
0.9047	19.0	3420	0.8812	0.7762
0.857	20.0	3600	0.8619	0.7821
0.8274	21.0	3780	0.8326	0.7914
0.7955	22.0	3960	0.8086	0.7919
0.7618	23.0	4140	0.7861	0.7973
0.7356	24.0	4320	0.7750	0.7993
0.7109	25.0	4500	0.7580	0.8028
0.6872	26.0	4680	0.7430	0.8077
0.6683	27.0	4860	0.7417	0.8101
0.6503	28.0	5040	0.7132	0.8155
0.6279	29.0	5220	0.7100	0.8106
0.6168	30.0	5400	0.6991	0.8165
0.5981	31.0	5580	0.6935	0.8185
0.5816	32.0	5760	0.6843	0.8200
0.5746	33.0	5940	0.6795	0.8155
0.5602	34.0	6120	0.6775	0.8210
0.5525	35.0	6300	0.6683	0.8244
0.5403	36.0	6480	0.6641	0.8219
0.5289	37.0	6660	0.6598	0.8278
0.5245	38.0	6840	0.6546	0.8278
0.518	39.0	7020	0.6523	0.8259
0.5105	40.0	7200	0.6488	0.8283
0.4988	41.0	7380	0.6463	0.8278
0.4971	42.0	7560	0.6414	0.8308
0.491	43.0	7740	0.6376	0.8318
0.4901	44.0	7920	0.6395	0.8298
0.4846	45.0	8100	0.6348	0.8298
0.4805	46.0	8280	0.6357	0.8313
0.481	47.0	8460	0.6320	0.8313
0.4767	48.0	8640	0.6331	0.8342
0.474	49.0	8820	0.6319	0.8328
0.4765	50.0	9000	0.6318	0.8308
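
The last few rows show that the lowest validation loss and the highest accuracy fall on different epochs, so which checkpoint counts as "best" depends on the metric. The sketch below picks both from the last five rows of the table. The names LOG, best_by_accuracy, and best_by_loss are illustrative and not part of any training script, and this card does not state which checkpoint the published weights correspond to.

```python
# Rows copied from the last five epochs of the table above:
# (epoch, step, validation_loss, accuracy)
LOG = [
    (46, 8280, 0.6357, 0.8313),
    (47, 8460, 0.6320, 0.8313),
    (48, 8640, 0.6331, 0.8342),
    (49, 8820, 0.6319, 0.8328),
    (50, 9000, 0.6318, 0.8308),
]


def best_by_accuracy(rows):
    """Return the row with the highest accuracy (earliest epoch wins ties)."""
    return max(rows, key=lambda row: row[3])


def best_by_loss(rows):
    """Return the row with the lowest validation loss (earliest epoch wins ties)."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    acc_row = best_by_accuracy(LOG)
    loss_row = best_by_loss(LOG)
    print(f"highest accuracy: {acc_row[3]:.4f} at epoch {acc_row[0]}")
    print(f"lowest validation loss: {loss_row[2]:.4f} at epoch {loss_row[0]}")
```

Extending LOG with the remaining rows gives the same two epochs, since no earlier epoch has a higher accuracy or a lower validation loss.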