This is a MicroBERT model for Tamil.

The -m suffix in the model name indicates that it was pretrained using supervision from masked language modeling.
The unlabeled Tamil data was taken from a June 2022 dump of Tamil Wikipedia, downsampled to 1,429,735 tokens.
The UD treebank UD_Tamil-TTB, v2.9, totaling 9,581 tokens, was used for labeled data.
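
For readers unfamiliar with masked language modeling, the following is a minimal sketch of how one training example might be prepared. It assumes the standard BERT recipe (select about 15% of tokens, then replace 80% of those with a mask symbol, 10% with a random token, and leave 10% unchanged). This card does not state the exact masking settings used for this model, so the rates, the `mask_tokens` helper, and the toy vocabulary are illustrative assumptions, not the authors' implementation.

```python
import random

MASK = "[MASK]"  # conventional BERT mask symbol (assumed here)


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Build one masked-language-modeling example (hypothetical helper).

    Returns (inputs, labels). labels[i] is the original token at positions
    selected for prediction and None everywhere else.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Position selected: the model must predict the original token.
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: replace with mask symbol
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with random token
            else:
                inputs.append(tok)                # 10%: keep the token unchanged
        else:
            # Position not selected: input is unchanged, nothing to predict.
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Illustrative Tamil tokens; real training would use subword units.
example = ["நான்", "தமிழ்", "படிக்கிறேன்"]
inputs, labels = mask_tokens(example, vocab=example, seed=0)
print(inputs, labels)
```

The loss is computed only at positions where the label is not `None`, which is why no annotation is needed for this objective: the unlabeled Wikipedia text provides its own targets.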

Please see the repository and the paper for more details.
