aapot committed on
Commit 8b141e7
1 Parent(s): fa9a284

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -8,9 +8,9 @@ tags:
  datasets:
  - mc4
  - wikipedia
- pipeline_tag: fill-mask
  widget:
  - text: "Moikka olen <mask> kielimalli."
+
  ---

  # RoBERTa large model for Finnish
@@ -105,7 +105,7 @@ neutral. Therefore, the model can have biased predictions.
  ## Training data

  This Finnish RoBERTa model was pretrained on the combination of five datasets:
- - [mc4](https://huggingface.co/datasets/mc4), the dataset mC4 is a multilingual colossal, cleaned version of Common Crawl's web crawl corpus. Based on Common Crawl dataset. We used the Finnish subset of the mC4 dataset
+ - [mc4](https://huggingface.co/datasets/mc4), the dataset mC4 is a multilingual colossal, cleaned version of Common Crawl's web crawl corpus. We used the Finnish subset of the mC4 dataset
  - [wikipedia](https://huggingface.co/datasets/wikipedia) We used the Finnish subset of the wikipedia (August 2021) dataset
  - [Yle Finnish News Archive](http://urn.fi/urn:nbn:fi:lb-2017070501)
  - [Finnish News Agency Archive (STT)](http://urn.fi/urn:nbn:fi:lb-2018121001)