|
--- |- |
|
Based on the Finnish pretrained T5 model, version small-nl24. |
|
Training data |
|
Around 300k samples from the following datasets: |
|
- [wikipedia](https://huggingface.co/datasets/wikipedia) |
|
- [Yle Finnish News Archive 2011-2018](http://urn.fi/urn:nbn:fi:lb-2017070501) |
|
- [Yle Finnish News Archive 2019-2020](http://urn.fi/urn:nbn:fi:lb-2021050401) |
|
- [Finnish News Agency Archive (STT)](http://urn.fi/urn:nbn:fi:lb-2018121001) |
|
- [The Suomi24 Sentences Corpus](http://urn.fi/urn:nbn:fi:lb-2020021803) |
|
|
|
Tested with 1000 samples from the datasets above: median character error rate (CER) 1.1%, mean CER 4.2%. |
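
For readers who want to compute comparable numbers, here is a minimal sketch of CER under its conventional definition: character-level Levenshtein distance divided by the reference length. The exact evaluation script used for this model is not published here, so the function names (`edit_distance`, `cer`, `summarize`) and the sample pairs are illustrative assumptions, not the author's code.

```python
from statistics import mean, median


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER of one sample: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def summarize(pairs):
    """Return (median CER, mean CER) over (reference, hypothesis) pairs."""
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    return median(scores), mean(scores)


# Hypothetical (reference, model output) pairs, for illustration only.
samples = [
    ("abcd", "abcd"),  # exact match
    ("abcd", "abxd"),  # one substitution
    ("abcd", "xxxx"),  # every character wrong
]
med, avg = summarize(samples)
print(f"median CER {med:.1%}, mean CER {avg:.1%}")
```

When the mean is well above the median, as in the figures reported above, a small number of samples with high error is usually pulling the average up while most outputs are close to the reference.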
|
More detailed info coming later... |
|
|