# IceBERT-igc

This model was trained with fairseq using the RoBERTa-base architecture. It is one of many models we have trained for Icelandic; see the paper referenced below for further details. The training data is shown in the table below.

| Dataset | Size | Tokens |
|---|---|---|
| Icelandic Gigaword Corpus v20.05 (IGC) | 8.2 GB | 1,388M |
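
As a RoBERTa-style model, it can be queried for masked-token prediction. The following is a minimal sketch, not an official usage recipe. It assumes the checkpoint is published on the Hugging Face Hub under the identifier `mideind/IceBERT-igc` and can be loaded by the `transformers` library; both are assumptions not stated in this README. The helper names `mask_word` and `predict_masked` are hypothetical and exist only for illustration.

```python
# Assumed Hub identifier; adjust if the model is hosted under another name.
MODEL_ID = "mideind/IceBERT-igc"
# Mask token used by the model, as stated in this README.
MASK_TOKEN = "<mask>"


def mask_word(sentence: str, word: str) -> str:
    """Replace the first occurrence of `word` in `sentence` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return sentence.replace(word, MASK_TOKEN, 1)


def predict_masked(text: str, top_k: int = 5):
    """Return the top_k fill-mask predictions for `text`.

    Requires the `transformers` package and network access on first use,
    so the import is done lazily inside the function.
    """
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    return fill(text, top_k=top_k)
```

For example, `predict_masked(mask_word("Reykjavík er höfuðborg Íslands.", "höfuðborg"))` would return a list of dictionaries, each containing a candidate `token_str` and its `score`.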

## Citation

The model is described in the paper at https://arxiv.org/abs/2201.05601. Please cite the paper if you make use of the model.

Mask token: `<mask>`