Update README.md
README.md
CHANGED
@@ -10,8 +10,8 @@ model-index:
       name: Historic Text Normalization (type-level)
       type: translation
     dataset:
-      name: DTA
-      type: aehrm/dtaec-
+      name: DTA EvalCorpus Lexicon
+      type: aehrm/dtaec-lexicon
       split: dev
     metrics:
     - name: Word Accuracy
@@ -34,7 +34,7 @@ Note: This model is part of a larger system, which uses an additional GPT-based

 ## Training and evaluation data

-The model has been trained on the DTA-EC Parallel Corpus Lexicon ([aehrm/dtaec-lexica](https://huggingface.co/datasets/aehrm/dtaec-
+The model has been trained on the DTA-EC Parallel Corpus Lexicon ([aehrm/dtaec-lexicon](https://huggingface.co/datasets/aehrm/dtaec-lexicon)), which comes from a [parallel corpus](https://kaskade.dwds.de/~moocow/software/dtaec/) of the Deutsches Textarchiv (German Text Archive) that aligns historic prints of documents with their modern editions in contemporary orthography.

 Training was done at the type level: given the historic form of a type, the model must predict the corresponding normalized type *that appeared most frequently in the parallel corpus*.

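The type-level training target described in the README text above can be illustrated with a small sketch. This is not the author's preprocessing code. It assumes, hypothetically, that the parallel corpus is available as a list of aligned `(historic, normalized)` token pairs. The function name `most_frequent_normalizations` and the example tokens are invented for illustration.

```python
from collections import Counter, defaultdict


def most_frequent_normalizations(aligned_tokens):
    """Map each historic type to its most frequent normalized type.

    `aligned_tokens` is a hypothetical iterable of (historic, normalized)
    token pairs taken from an aligned parallel corpus.
    """
    counts = defaultdict(Counter)
    for historic, normalized in aligned_tokens:
        counts[historic][normalized] += 1
    # Keep only the single most frequent normalization per historic type.
    return {
        historic: candidates.most_common(1)[0][0]
        for historic, candidates in counts.items()
    }


# Invented example alignments, not taken from the real corpus.
pairs = [
    ("vnd", "und"),
    ("vnd", "und"),
    ("vnd", "uns"),
    ("seyn", "sein"),
    ("Theil", "Teil"),
]
lexicon = most_frequent_normalizations(pairs)
```

Under this assumption, each historic type ends up with exactly one target, so a type with several attested normalizations (here `vnd`) is trained only on the majority form.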