
MiniLMv2-L6-H384_R-fineweb-100k

This is a MiniLMv2 model continually pre-trained with a masked language modeling (MLM) objective, with the goal of improving downstream fine-tuning performance:

  • activation function updated to SiLU prior to further training
  • MLM with a 40% mask ratio (see the sketch after this list)
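
A minimal sketch of how these two tweaks could be reproduced with the standard transformers API; the actual training script is not part of this card, so the exact calls are assumptions.

```python
from transformers import (
    AutoConfig,
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

base = "nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large"

# Swap the feed-forward activation to SiLU before continued pre-training.
config = AutoConfig.from_pretrained(base)
config.hidden_act = "silu"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base, config=config)

# Mask 40% of input tokens for the MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.4
)
```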

Model description

This model is a fine-tuned version of nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large on the BEE-spoke-data/fineweb-100k_en-med dataset.

It achieves the following results on the evaluation set:

  • Loss: 4.0206
  • Accuracy: 0.3783
  • Num Input Tokens Seen: 162790400
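
For downstream use, the checkpoint loads like any other transformers masked-LM. A brief fill-mask example; the prompt text is illustrative, and `<mask>` is the RoBERTa-style mask token used by this tokenizer:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="pszemraj/MiniLMv2-L6-H384_R-fineweb-100k")
print(fill("The capital of France is <mask>."))
```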

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 1792
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 2.0
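
These settings map roughly onto a transformers TrainingArguments object. The sketch below is a reconstruction from the list above; anything not listed there (e.g. the output path) is a placeholder assumption:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="minilmv2-fineweb-100k-mlm",  # hypothetical output path
    learning_rate=8e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=1792,
    gradient_accumulation_steps=16,  # 8 per device * 16 steps = 128 total
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-7,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    num_train_epochs=2.0,
)
```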

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|
| 4.6583        | 0.1208 | 150  | 4.5052          | 0.3406   | 9830400           |
| 4.5365        | 0.2415 | 300  | 4.3712          | 0.3525   | 19660800          |
| 4.4621        | 0.3623 | 450  | 4.2810          | 0.3575   | 29491200          |
| 4.4116        | 0.4831 | 600  | 4.2466          | 0.3615   | 39321600          |
| 4.3487        | 0.6038 | 750  | 4.1795          | 0.3661   | 49152000          |
| 4.338         | 0.7246 | 900  | 4.1874          | 0.3663   | 58982400          |
| 4.342         | 0.8454 | 1050 | 4.1475          | 0.3695   | 68812800          |
| 4.268         | 0.9661 | 1200 | 4.1215          | 0.3714   | 78643200          |
| 4.2185        | 1.0869 | 1350 | 4.1032          | 0.3725   | 88472576          |
| 4.2645        | 1.2077 | 1500 | 4.0859          | 0.3757   | 98302976          |
| 4.2542        | 1.3284 | 1650 | 4.0730          | 0.3750   | 108133376         |
| 4.2614        | 1.4492 | 1800 | 4.0682          | 0.3749   | 117963776         |
| 4.1928        | 1.5700 | 1950 | 4.0596          | 0.3758   | 127794176         |
| 4.1971        | 1.6907 | 2100 | 4.0505          | 0.3777   | 137624576         |
| 4.1966        | 1.8115 | 2250 | 4.0163          | 0.3787   | 147454976         |
| 4.16          | 1.9323 | 2400 | 4.0352          | 0.3774   | 157285376         |
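
For intuition, the final evaluation loss of 4.0206 corresponds to an MLM perplexity of roughly 55.7 (perplexity = exp(loss)); this conversion is a standard identity rather than a figure reported above:

```python
import math

eval_loss = 4.0206          # final evaluation loss from the table above
print(math.exp(eval_loss))  # ~55.7 MLM perplexity
```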

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu118
  • Datasets 2.19.0
  • Tokenizers 0.19.1