hbertv1-massive-logit_KD-tiny_ffn_2

This model is a fine-tuned version of gokuls/model_v1_complete_training_wt_init_48_tiny_freeze_new_ffn_2 on the massive dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.1929	1.0	180	3.5935	0.1402
3.4611	2.0	360	3.0049	0.2941
2.9024	3.0	540	2.4730	0.3792
2.4356	4.0	720	2.0721	0.4515
2.1041	5.0	900	1.8179	0.5278
1.8564	6.0	1080	1.6004	0.6257
1.6676	7.0	1260	1.4500	0.6596
1.5135	8.0	1440	1.3147	0.6995
1.3906	9.0	1620	1.2211	0.7147
1.2811	10.0	1800	1.1393	0.7314
1.1937	11.0	1980	1.0803	0.7304
1.1120	12.0	2160	1.0267	0.7467
1.0488	13.0	2340	0.9716	0.7570
0.9830	14.0	2520	0.9306	0.7649
0.9294	15.0	2700	0.8892	0.7767
0.8909	16.0	2880	0.8578	0.7885
0.8436	17.0	3060	0.8270	0.7909
0.8078	18.0	3240	0.8201	0.7964
0.7777	19.0	3420	0.7934	0.8028
0.7433	20.0	3600	0.7792	0.8037
0.7121	21.0	3780	0.7504	0.8082
0.6896	22.0	3960	0.7433	0.8091
0.6592	23.0	4140	0.7200	0.8160
0.6389	24.0	4320	0.7177	0.8096
0.6175	25.0	4500	0.7039	0.8136
0.6024	26.0	4680	0.6928	0.8180
0.5835	27.0	4860	0.6940	0.8170
0.5673	28.0	5040	0.6787	0.8136
0.5523	29.0	5220	0.6680	0.8229
0.5445	30.0	5400	0.6599	0.8234
0.5319	31.0	5580	0.6634	0.8214
0.5196	32.0	5760	0.6549	0.8259
0.5040	33.0	5940	0.6506	0.8239
0.4993	34.0	6120	0.6518	0.8249
0.4941	35.0	6300	0.6388	0.8239
0.4823	36.0	6480	0.6317	0.8278
0.4734	37.0	6660	0.6327	0.8288
0.4609	38.0	6840	0.6312	0.8239
0.4617	39.0	7020	0.6279	0.8288
0.4529	40.0	7200	0.6255	0.8273
0.4491	41.0	7380	0.6173	0.8288
0.4419	42.0	7560	0.6148	0.8313
0.4378	43.0	7740	0.6208	0.8298
0.4362	44.0	7920	0.6140	0.8288
0.4320	45.0	8100	0.6152	0.8308
0.4276	46.0	8280	0.6150	0.8288
0.4263	47.0	8460	0.6118	0.8308
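
The log above can be summarized programmatically. The following is a minimal sketch, not part of the original training code: it copies the rows of the table into a string, parses them, and reports the epoch with the highest accuracy, the epoch with the lowest validation loss, and the number of steps per epoch. The names `LOG` and `parse_log` are illustrative assumptions.

```python
# Rows copied from the training results table above, in the order:
# training loss, epoch, step, validation loss, accuracy.
LOG = """
4.1929 1.0 180 3.5935 0.1402
3.4611 2.0 360 3.0049 0.2941
2.9024 3.0 540 2.4730 0.3792
2.4356 4.0 720 2.0721 0.4515
2.1041 5.0 900 1.8179 0.5278
1.8564 6.0 1080 1.6004 0.6257
1.6676 7.0 1260 1.4500 0.6596
1.5135 8.0 1440 1.3147 0.6995
1.3906 9.0 1620 1.2211 0.7147
1.2811 10.0 1800 1.1393 0.7314
1.1937 11.0 1980 1.0803 0.7304
1.112 12.0 2160 1.0267 0.7467
1.0488 13.0 2340 0.9716 0.7570
0.983 14.0 2520 0.9306 0.7649
0.9294 15.0 2700 0.8892 0.7767
0.8909 16.0 2880 0.8578 0.7885
0.8436 17.0 3060 0.8270 0.7909
0.8078 18.0 3240 0.8201 0.7964
0.7777 19.0 3420 0.7934 0.8028
0.7433 20.0 3600 0.7792 0.8037
0.7121 21.0 3780 0.7504 0.8082
0.6896 22.0 3960 0.7433 0.8091
0.6592 23.0 4140 0.7200 0.8160
0.6389 24.0 4320 0.7177 0.8096
0.6175 25.0 4500 0.7039 0.8136
0.6024 26.0 4680 0.6928 0.8180
0.5835 27.0 4860 0.6940 0.8170
0.5673 28.0 5040 0.6787 0.8136
0.5523 29.0 5220 0.6680 0.8229
0.5445 30.0 5400 0.6599 0.8234
0.5319 31.0 5580 0.6634 0.8214
0.5196 32.0 5760 0.6549 0.8259
0.504 33.0 5940 0.6506 0.8239
0.4993 34.0 6120 0.6518 0.8249
0.4941 35.0 6300 0.6388 0.8239
0.4823 36.0 6480 0.6317 0.8278
0.4734 37.0 6660 0.6327 0.8288
0.4609 38.0 6840 0.6312 0.8239
0.4617 39.0 7020 0.6279 0.8288
0.4529 40.0 7200 0.6255 0.8273
0.4491 41.0 7380 0.6173 0.8288
0.4419 42.0 7560 0.6148 0.8313
0.4378 43.0 7740 0.6208 0.8298
0.4362 44.0 7920 0.6140 0.8288
0.432 45.0 8100 0.6152 0.8308
0.4276 46.0 8280 0.6150 0.8288
0.4263 47.0 8460 0.6118 0.8308
"""


def parse_log(text):
    """Turn whitespace-separated log rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, accuracy = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": int(float(epoch)),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(accuracy),
        })
    return rows


rows = parse_log(LOG)

# Epoch with the highest evaluation accuracy.
best_acc = max(rows, key=lambda r: r["accuracy"])
# Epoch with the lowest validation loss.
best_loss = min(rows, key=lambda r: r["val_loss"])
# Distinct steps-per-epoch values implied by the Step and Epoch columns.
steps_per_epoch = {r["step"] // r["epoch"] for r in rows}

print("best accuracy:", best_acc["accuracy"], "at epoch", best_acc["epoch"])
print("lowest validation loss:", best_loss["val_loss"], "at epoch", best_loss["epoch"])
print("steps per epoch:", sorted(steps_per_epoch))
```

On these rows, accuracy peaks at 0.8313 in epoch 42, while validation loss is lowest at 0.6118 in epoch 47, and every epoch spans 180 steps. Which of these checkpoints corresponds to the published weights is not stated in this card.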