megatron.bert-large.unigram-64k-pretok.500k-steps

This BERT model was trained using the NeMo library. It uses the standard BERT-large architecture. The model was trained on more than 245 GB of data, consisting mostly of web data and Swedish newspaper text curated by the National Library of Sweden.

The model was trained for 500k steps using a batch size of 8k.
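For a rough sense of scale, the step count and batch size can be multiplied to estimate how many training sequences the model saw. The sketch below is illustrative only: the card does not say whether "8k" means 8,000 or 8,192 sequences per batch, so both readings are computed, and the function name `sequences_seen` is a hypothetical helper, not part of NeMo.

```python
# Back-of-the-envelope estimate of training sequences processed.
# Assumptions (not stated in the model card):
#   - "500k" steps means exactly 500_000 optimizer steps
#   - "8k" batch size means either 8_000 or 8_192 sequences per step

STEPS = 500_000


def sequences_seen(steps: int, batch_size: int) -> int:
    """Return the total number of sequences processed over training."""
    return steps * batch_size


if __name__ == "__main__":
    for batch_size in (8_000, 8_192):
        total = sequences_seen(STEPS, batch_size)
        print(f"batch size {batch_size}: {total:,} sequences")
```

Under either reading, the total comes to roughly four billion sequences. Converting this to a token count would also require the sequence length, which the card does not give.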

The model has multiple sibling models trained on the same dataset using different tokenizers or more/less parameters:

Acknowledgements

The training was performed on the Luxembourg national supercomputer MeluXina. The authors gratefully acknowledge the LuxProvide teams for their expert support.
