aapot committed on
Commit f78faa9
1 Parent(s): 84a9528

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -9,7 +9,7 @@ datasets:
  - Finnish-NLP/mc4_fi_cleaned
  - wikipedia
  widget:
- - text: "Olipa kerran tekoäly"
+ - text: "Tekstiä tuottava tekoäly on"
 
  ---
 
@@ -87,7 +87,7 @@ As with all language models, it is hard to predict in advance how the Finnish GP
 
  ## Training data
 
- This Finnish GPT-2 model was pretrained on the combination of five datasets:
+ This Finnish GPT-2 model was pretrained on the combination of six datasets:
  - [mc4_fi_cleaned](https://huggingface.co/datasets/Finnish-NLP/mc4_fi_cleaned), the dataset mC4 is a multilingual colossal, cleaned version of Common Crawl's web crawl corpus. We used the Finnish subset of the mC4 dataset and further cleaned it with our own text data cleaning codes (check the dataset repo).
  - [wikipedia](https://huggingface.co/datasets/wikipedia) We used the Finnish subset of the wikipedia (August 2021) dataset
  - [Yle Finnish News Archive 2011-2018](http://urn.fi/urn:nbn:fi:lb-2017070501)
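
For reference, the new widget prompt "Tekstiä tuottava tekoäly on" roughly translates to "A text-generating AI is", replacing the earlier "Olipa kerran tekoäly" ("Once upon a time there was an AI"). A minimal sketch of trying the prompt locally with the transformers text-generation pipeline follows; the model id "Finnish-NLP/gpt2-finnish" is an assumption, so substitute the checkpoint this README actually belongs to.

```python
from transformers import pipeline

# Model id is an assumption; replace with the actual Finnish GPT-2 checkpoint.
generator = pipeline("text-generation", model="Finnish-NLP/gpt2-finnish")

# "Tekstiä tuottava tekoäly on" = "A text-generating AI is"
result = generator(
    "Tekstiä tuottava tekoäly on",
    max_new_tokens=40,
    do_sample=True,
    top_k=50,
)
print(result[0]["generated_text"])
```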
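
The training-data list points to the mc4_fi_cleaned dataset repo; a minimal sketch of peeking at it with the datasets library, assuming a default configuration with a "train" split, could look like this:

```python
from datasets import load_dataset

# Dataset repo linked from the README; streaming avoids downloading the whole
# corpus just to inspect a few records. The default config and "train" split
# are assumptions.
mc4_fi = load_dataset("Finnish-NLP/mc4_fi_cleaned", split="train", streaming=True)

for i, example in enumerate(mc4_fi):
    print(example)  # typically a dict with a "text" field
    if i >= 2:
        break
```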