megatron.bert-large.bpe-64k-no_pretok.25k-steps

This BERT model was trained using the NeMo library. It uses the standard bert-large architecture. The model was trained on more than 245 GB of data, consisting mostly of web data and Swedish newspaper text curated by the National Library of Sweden.

The model was trained for 25k steps with a batch size of 8k.
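
As a rough illustration of what these figures imply, the sketch below multiplies the step count by the batch size to estimate how many training sequences the model saw. The card does not say whether "8k" means 8,000 or 8,192, so both readings are shown as assumptions. The helper name `sequences_seen` is hypothetical. The sequence length is not stated, so no token count is attempted.

```python
def sequences_seen(steps: int, batch_size: int) -> int:
    """Total number of sequences processed over a training run."""
    return steps * batch_size


steps = 25_000  # "25k training steps" from the card

# Assumption: "8k" could mean 8,000 or 8,192 (2**13); the card does not specify.
for batch_size in (8_000, 8_192):
    total = sequences_seen(steps, batch_size)
    print(f"batch size {batch_size:,}: about {total:,} sequences")
```

Under either reading the total is on the order of 200 million sequences.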

The model has multiple sibling models trained on the same dataset using different tokenizers or more or fewer parameters.

Acknowledgements

The training was performed on the Luxembourg national supercomputer MeluXina. The authors gratefully acknowledge the LuxProvide teams for their expert support.

Model size: 369M params (Safetensors, tensor type F32)
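
For a sense of the storage footprint, the following minimal sketch estimates the size of the weights from the 369M parameter count and the F32 tensor type (4 bytes per parameter). The parameter count is the rounded figure from the card, so the result is approximate, and it covers the raw weights only, not file metadata or runtime memory. The helper name `weights_size_bytes` is hypothetical.

```python
def weights_size_bytes(num_params: int, bytes_per_param: int) -> int:
    """Raw size of the weights, ignoring file headers and metadata."""
    return num_params * bytes_per_param


num_params = 369_000_000  # rounded "369M params" from the card
bytes_per_f32 = 4         # one 32-bit float occupies 4 bytes

size = weights_size_bytes(num_params, bytes_per_f32)
print(f"{size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

By the same arithmetic, a 16-bit copy of the weights would take about half that space.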