---
language: is
widget:
- text:  bjóða þér <mask> í kvöld?
- text: Forseti <mask> er ágæt.
- text: Súpan var <mask> á bragðið.
tags:
- roberta
- icelandic
- masked-lm
- pytorch
license: agpl-3.0
---

# IceBERT-igc

This model was trained with fairseq using the RoBERTa-base architecture. It is one of many models we have trained for Icelandic; see the paper cited below for further details. The table below lists the training data.

| Dataset                                              | Size    | Tokens |
|------------------------------------------------------|---------|--------|
| Icelandic Gigaword Corpus v20.05 (IGC)               | 8.2 GB  | 1,388M |
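
As a rough sketch of how a masked language model like this one might be queried, the snippet below uses the `fill-mask` pipeline from the Hugging Face `transformers` library. The repository id `mideind/IceBERT-igc` and the helper names `check_single_mask` and `top_predictions` are assumptions made for illustration and are not stated in this card; adjust the id to wherever the checkpoint is actually hosted.

```python
from typing import List, Tuple

# Assumption: the Hub repository id is not given in this card.
MODEL_ID = "mideind/IceBERT-igc"
# RoBERTa-style mask token, as used in the widget examples above.
MASK = "<mask>"


def check_single_mask(text: str) -> str:
    """Return `text` unchanged if it contains exactly one mask token."""
    if text.count(MASK) != 1:
        raise ValueError(f"expected exactly one {MASK} in: {text!r}")
    return text


def top_predictions(
    text: str, top_k: int = 5, model_id: str = MODEL_ID
) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for the masked position (hypothetical helper)."""
    # Imported lazily so the validation helper works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    results = fill(check_single_mask(text), top_k=top_k)
    # With a single mask, the pipeline returns one dict per candidate token.
    return [(r["token_str"].strip(), float(r["score"])) for r in results]
```

Calling `top_predictions("Súpan var <mask> á bragðið.")` would download the checkpoint on first use and return the highest-scoring candidate tokens with their probabilities.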


## Citation

The model is described in the paper [A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models](https://arxiv.org/abs/2201.05601). Please cite it if you use the model.

```bibtex
@article{DBLP:journals/corr/abs-2201-05601,
  author    = {V{\'{e}}steinn Sn{\ae}bjarnarson and
               Haukur Barri S{\'{\i}}monarson and
               P{\'{e}}tur Orri Ragnarsson and
               Svanhv{\'{\i}}t Lilja Ing{\'{o}}lfsd{\'{o}}ttir and
               Haukur P{\'{a}}ll J{\'{o}}nsson and
               Vilhj{\'{a}}lmur {\TH}orsteinsson and
               Hafsteinn Einarsson},
  title     = {A Warm Start and a Clean Crawled Corpus - {A} Recipe for Good Language
               Models},
  journal   = {CoRR},
  volume    = {abs/2201.05601},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.05601},
  eprinttype = {arXiv},
  eprint    = {2201.05601},
  timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2201-05601.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```