---
widget:
  - text: Simon dog i <mask> i går.
license: mit
datasets:
  - ChangeIsKey/kubhist2
language:
  - sv
library_name: transformers
---

This is a RoBERTa model trained on kubhist2, a corpus of Swedish newspapers (https://spraakbanken.gu.se/en/resources/kubhist2, https://spraakbanken.gu.se/blogg/index.php/2019/09/15/the-kubhist-corpus-of-swedish-newspapers/). A Hugging Face version of kubhist2 is available at https://huggingface.co/datasets/ChangeIsKey/kubhist2.

This is a work in progress: the quality of the model, like the quality of the training data, is far from great.

The model is shared here with no guarantee whatsoever and will likely change. Use it at your own risk.
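Because the metadata declares `library_name: transformers` and the widget example contains a `<mask>` token, the model can presumably be queried with the `transformers` fill-mask pipeline. The sketch below is a minimal illustration under stated assumptions: the Hub id `ChangeIsKey/roberta-kubhist2` is a hypothetical guess based on the dataset's namespace (replace it with the model's actual repository id), and `load_fill_mask`, `top_tokens`, and `demo` are illustrative helpers, not part of this release.

```python
# Minimal fill-mask sketch for this model card.
# ASSUMPTION: the repo id below is hypothetical; use the model's real Hub id.
MODEL_ID = "ChangeIsKey/roberta-kubhist2"


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a transformers fill-mask pipeline (downloads the model on first use)."""
    # Imported lazily so top_tokens() can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_tokens(predictions: list[dict], k: int = 3) -> list[str]:
    """Return the k highest-scoring candidate tokens, with surrounding whitespace stripped."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["token_str"].strip() for p in ranked[:k]]


def demo(k: int = 3) -> list[str]:
    """Fill the mask in the widget sentence and return the top-k candidates."""
    fill_mask = load_fill_mask()
    # Use the tokenizer's own mask token rather than hard-coding "<mask>".
    sentence = f"Simon dog i {fill_mask.tokenizer.mask_token} i går."
    return top_tokens(fill_mask(sentence), k=k)
```

Calling `demo()` downloads the model on the first call and returns the top candidates for the widget sentence; given the caveats above, the predictions may be noisy.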

Discussion of Biases

This model is trained on historical data, so outdated views may be present in the data.

Other Known Limitations

The data comes from an OCR process, so the text contains errors, especially in the earlier decades.

Contact

Simon Hengchen