Update README.md
README.md
CHANGED
@@ -1,6 +1,12 @@
 ---
 Based on Finnish pretrained T5 model version small-nl24
 Train data
-Around 300k samples from from following datasets
-
-
+Around 300k samples from from following datasets
+- [wikipedia](https://huggingface.co/datasets/wikipedia)
+- [Yle Finnish News Archive 2011-2018](http://urn.fi/urn:nbn:fi:lb-2017070501)
+- [Yle Finnish News Archive 2019-2020](http://urn.fi/urn:nbn:fi:lb-2021050401)
+- [Finnish News Agency Archive (STT)](http://urn.fi/urn:nbn:fi:lb-2018121001)
+- [The Suomi24 Sentences Corpus](http://urn.fi/urn:nbn:fi:lb-2020021803)
+
+Tested with 1000 samples from the previous datasets Median CER 1.1% MEAN CER 4.2%
+More detailed info coming later...
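
The README reports a median and a mean character error rate (CER) but does not say how they were computed. The following is a minimal sketch of one common definition, assuming CER is the character-level Levenshtein distance divided by the reference length, scored per sample and then aggregated. The function names `levenshtein`, `cer`, and `summarize` are illustrative and do not come from the repository.

```python
from statistics import mean, median


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of one sample (assumed definition, see lead-in)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def summarize(pairs):
    """Return (median CER, mean CER) over (reference, hypothesis) pairs."""
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    return median(scores), mean(scores)
```

Under this definition, a few badly wrong samples raise the mean much more than the median, which would be consistent with the reported gap between the two figures.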