aehrm committed on
Commit 7f548cc
1 Parent(s): 155bb31

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -10,8 +10,8 @@ model-index:
  name: Historic Text Normalization (type-level)
  type: translation
  dataset:
- name: DTA-EC Lexicon (dev)
- type: aehrm/dtaec-lexica
+ name: DTA EvalCorpus Lexicon
+ type: aehrm/dtaec-lexicon
  split: dev
  metrics:
  - name: Word Accuracy
@@ -34,7 +34,7 @@ Note: This model is part of a larger system, which uses an additional GPT-based
 
  ## Training and evaluation data
 
- The model has been trained on the DTA-EC Parallel Corpus Lexicon ([aehrm/dtaec-lexica](https://huggingface.co/datasets/aehrm/dtaec-lexica)), which is from a [parallel corpus](https://kaskade.dwds.de/~moocow/software/dtaec/) of the Deutsche Textarchiv (German Text Archive), who aligned historic prints of documents with their moden editions in contemporary orthography.
+ The model has been trained on the DTA-EC Parallel Corpus Lexicon ([aehrm/dtaec-lexica](https://huggingface.co/datasets/aehrm/dtaec-lexicon)), which is from a [parallel corpus](https://kaskade.dwds.de/~moocow/software/dtaec/) of the Deutsche Textarchiv (German Text Archive), who aligned historic prints of documents with their moden editions in contemporary orthography.
 
  Training was done on type-level, where, given the historic form of a type, the model must predict the corresponding normalized type *that appeared most frequent in the parallel corpus*.
 
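
The type-level target described in the README text of this diff (for each historic type, the normalized type that appeared most frequently in the parallel corpus) can be illustrated with a small sketch. This is not the author's preprocessing code: the function name `build_type_lexicon` and the toy token pairs are hypothetical, and the real lexicon is derived from the DTA-EC alignment rather than from a hand-written list.

```python
from collections import Counter, defaultdict


def build_type_lexicon(aligned_pairs):
    """Map each historic type to its most frequent normalized type.

    `aligned_pairs` is an iterable of (historic_token, normalized_token)
    tuples, as they might be read off a token-aligned parallel corpus.
    """
    counts = defaultdict(Counter)
    for historic, normalized in aligned_pairs:
        counts[historic][normalized] += 1
    # For every historic type, keep only the normalization seen most often.
    return {
        historic: normalized_counts.most_common(1)[0][0]
        for historic, normalized_counts in counts.items()
    }


# Toy aligned tokens (hypothetical, for illustration only).
pairs = [
    ("vnd", "und"),
    ("vnd", "und"),
    ("vnd", "vnd"),
    ("seyn", "sein"),
    ("seyn", "sein"),
    ("Theil", "Teil"),
]

lexicon = build_type_lexicon(pairs)
print(lexicon)
```

Each entry of the resulting dictionary corresponds to one type-level training example: the key is the model input and the value is the expected output.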