This model follows the BERT Large architecture (24 layers, hidden size 1024, 16 attention heads, roughly 340M parameters) as implemented in the Megatron-LM framework. It was trained with a batch size of 512 for 600k steps.
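The exact configuration can also be inspected programmatically. A minimal sketch using the Transformers `AutoConfig` API (the printed fields follow the standard BERT config schema):

```python
from transformers import AutoConfig

# Load the published configuration for this checkpoint
config = AutoConfig.from_pretrained("AI-Nordics/bert-large-swedish-cased")

# Print the main architecture parameters
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```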
The model is pretrained on a Swedish text corpus of around 85 GB drawn from a variety of sources, as shown below.
The raw model can be used directly for masked language modeling or next-sentence prediction, but it is typically fine-tuned on a downstream task to improve performance in a specific domain. The model and tokenizer can be loaded as follows:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("AI-Nordics/bert-large-swedish-cased")
model = AutoModelForMaskedLM.from_pretrained("AI-Nordics/bert-large-swedish-cased")
```
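As a quick sanity check, the loaded model can be exercised through the fill-mask pipeline; a minimal sketch (the Swedish example sentence is illustrative, not from the model card):

```python
from transformers import pipeline

# Build a fill-mask pipeline from the model and tokenizer loaded above
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Predict the masked token; BERT models use the [MASK] placeholder.
# The sentence means "The capital of Sweden is [MASK]."
print(fill_mask("Huvudstaden i Sverige är [MASK]."))
```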