Update README.md
README.md
CHANGED
@@ -3,7 +3,22 @@ license: mit
 tags:
 - flair
 - text-generation
-widget:
-- text: "My name is Julien and I like to"
-  example_title: "Julien"
 ---
+
+# Turmbücher LM
+
+This repository contains the forward and backward language models that were used to train the [Turmbücher NER](https://huggingface.co/dh-unibe/turmbuecher-ner-v1) model.
+
+Both models cover premodern German and were trained by Ismail Prada Ziegler as part of a research project at the University of Bern, Digital Humanities.
+
+We recommend using Flair's stacked embeddings for best results.
+
+## Data Set
+
+Main data set: [Berner Turmbücher](https://www.polit-forum-bern.ch/turmbuecher/), early volumes from the 16th century, Early New High German, 61k tokens of training data.
+
+Secondary data sets:
+- [SSRQ](https://www.ssrq-sds-fds.ch/home/) - Fribourg, 59k tokens.
+- [Chorgerichtsmanuale](https://www.adfontes.uzh.ch/370540/training/deutsche-transkriptionsuebungen/chorgerichtsmanuale-einleitung) (unpublished), 76k tokens.
+- [Königsfelden Charters](https://www.koenigsfelden.uzh.ch/), 623k tokens.
+- Talgerichtsprotokolle (unpublished), 438k tokens.
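
The README recommends Flair's stacked embeddings but does not show how to set them up. The following is a minimal sketch of one way to do it, assuming the forward and backward checkpoints have already been downloaded into a local directory. The file names `forward.pt` and `backward.pt` and the helper names are hypothetical placeholders, not taken from the repository; check its file listing for the real names.

```python
from pathlib import Path


def lm_checkpoint_paths(model_dir, forward_name="forward.pt", backward_name="backward.pt"):
    """Return (forward, backward) checkpoint paths inside model_dir.

    The default file names are hypothetical placeholders, not the
    repository's actual file names.
    """
    base = Path(model_dir)
    return base / forward_name, base / backward_name


def build_stacked_embeddings(model_dir):
    """Combine the forward and backward LMs into one stacked Flair embedding."""
    # Imported lazily so the path helper above works without Flair installed.
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings

    forward_path, backward_path = lm_checkpoint_paths(model_dir)
    return StackedEmbeddings(
        embeddings=[
            FlairEmbeddings(str(forward_path)),   # left-to-right character LM
            FlairEmbeddings(str(backward_path)),  # right-to-left character LM
        ]
    )
```

With Flair installed, the object returned by `build_stacked_embeddings(...)` can embed a `flair.data.Sentence` through its `embed()` method, or be passed as the `embeddings` argument when training a sequence tagger. Stacking concatenates the two models' vectors per token, so each token representation draws on both its left and its right context.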