Icelandic-Norwegian ELECTRA-Small

This model was pretrained on the following corpora:

The total size of the corpus after document-level deduplication and filtering was 7.41B tokens, split equally between the two languages. The model was trained using a WordPiece tokenizer with a vocabulary size of 64,105 for 1.1 million steps, and otherwise with default settings.

Acknowledgments

This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC).

This project was funded by the Language Technology Programme for Icelandic 2019-2023. The programme, which is managed and coordinated by Almannarómur, is funded by the Icelandic Ministry of Education, Science and Culture.

Downloads last month
4
Hosted inference API

Unable to determine this model’s pipeline type. Check the docs .

Datasets used to train jonfd/electra-small-is-no