---
language: is
widget:
- text: Má bjóða þér <mask> í kvöld?
- text: Forseti <mask> er ágæt.
- text: Súpan var <mask> á bragðið.
tags:
- roberta
- icelandic
- masked-lm
- pytorch
license: agpl-3.0
---

# IceBERT-ic3

This model was trained with fairseq using the RoBERTa-base architecture. It is one of many models we have trained for Icelandic; see the paper referenced below for further details. The training data is shown in the table below.

| Dataset                             | Size   | Tokens |
|-------------------------------------|--------|--------|
| Icelandic Common Crawl Corpus (IC3) | 4.9 GB | 824M   |
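
A minimal sketch for querying the model with the widget prompts from the front matter. It assumes the Hugging Face `transformers` library is installed and that the weights are published on the Hub in `transformers` format under the id `mideind/IceBERT-ic3`. Both are assumptions, since this card only states that the model was trained with fairseq. The `check_prompt` and `fill` helpers are hypothetical names used for illustration. RoBERTa models use `<mask>` as the mask token, which matches the widget prompts.

```python
"""Fill-mask sketch for IceBERT-ic3.

Assumptions (not stated in the model card): the Hugging Face `transformers`
library is installed and the model is available under MODEL_ID on the Hub.
"""

MODEL_ID = "mideind/IceBERT-ic3"  # assumption: hypothetical Hub repository id
MASK = "<mask>"  # RoBERTa-style mask token, as used in the widget examples

# The widget prompts from the YAML front matter.
PROMPTS = [
    "Má bjóða þér <mask> í kvöld?",
    "Forseti <mask> er ágæt.",
    "Súpan var <mask> á bragðið.",
]


def check_prompt(text: str) -> str:
    """Return the prompt unchanged if it contains exactly one mask token."""
    count = text.count(MASK)
    if count != 1:
        raise ValueError(f"expected exactly one {MASK}, found {count}")
    return text


def fill(prompts, top_k=3):
    """Run a fill-mask pipeline; return (prompt, [(token, score), ...]) pairs."""
    # Imported lazily so check_prompt works without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=MODEL_ID)  # downloads weights on first use
    results = []
    for text in prompts:
        preds = unmasker(check_prompt(text), top_k=top_k)
        results.append((text, [(p["token_str"], round(p["score"], 3)) for p in preds]))
    return results


if __name__ == "__main__":
    for text, preds in fill(PROMPTS):
        print(text)
        for token, score in preds:
            print(f"  {token!r}: {score}")
```

Running the script prints the top three candidate tokens for each prompt together with their scores. Importing `transformers` inside `fill` keeps the prompt check usable on machines without the dependency or network access.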

## Citation

The model is described in the paper [arXiv:2201.05601](https://arxiv.org/abs/2201.05601). Please cite it if you use the model.

```bibtex
@article{DBLP:journals/corr/abs-2201-05601,
  author    = {V{\'{e}}steinn Sn{\ae}bjarnarson and
               Haukur Barri S{\'{\i}}monarson and
               P{\'{e}}tur Orri Ragnarsson and
               Svanhv{\'{\i}}t Lilja Ing{\'{o}}lfsd{\'{o}}ttir and
               Haukur P{\'{a}}ll J{\'{o}}nsson and
               Vilhj{\'{a}}lmur {\TH}orsteinsson and
               Hafsteinn Einarsson},
  title     = {A Warm Start and a Clean Crawled Corpus - {A} Recipe for Good Language
               Models},
  journal   = {CoRR},
  volume    = {abs/2201.05601},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.05601},
  eprinttype = {arXiv},
  eprint    = {2201.05601},
  timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2201-05601.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```