novakat commited on
Commit
3195d9d
1 Parent(s): 5778f48

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +4 -2
README.md CHANGED
@@ -25,10 +25,12 @@ inference:
25
 
26
  - max_seq_length = 448
27
 
28
- The underlying corpus [NerKor+CARS-ONPP](https://github.com/novakat/NYTK-NerKor-Cars-OntoNotesPP) was derived from [NYTK-NerKor](https://github.com/nytud/NYTK-NerKor), a Hungarian gold standard named entity annotated corpus containing about 1 million tokens.
 
 
29
  It includes a small addition of 12k tokens of text (individual sentences) concerning motor vehicles (cars, buses, motorcycles) from the news archive of [hvg.hu](hvg.hu).
30
  While the annotation in NYTK-NerKor followed the CoNLL2002 labelling standard with just four NE categories (`PER`, `LOC`, `MISC`, `ORG`), this version of the corpus features over 30 entity types, including all entity types used in the [OntoNotes 5.0] English NER annotation.
31
- The new annotation elaborates on subtypes of the `LOC` and `MISC` entity types, and includes annotation for non-names like times and dates, quantities, languages and nationalities or religious or political groups.
32
 
33
  ## Tags derived from the OntoNotes 5.0 annotation
34
 
 
25
 
26
  - max_seq_length = 448
27
 
28
+ ## Training data
29
+
30
+ The underlying corpus, [NerKor+CARS-ONPP](https://github.com/novakat/NYTK-NerKor-Cars-OntoNotesPP), was derived from [NYTK-NerKor](https://github.com/nytud/NYTK-NerKor), a Hungarian gold standard named entity annotated corpus containing about 1 million tokens.
31
  It includes a small addition of 12k tokens of text (individual sentences) concerning motor vehicles (cars, buses, motorcycles) from the news archive of [hvg.hu](hvg.hu).
32
  While the annotation in NYTK-NerKor followed the CoNLL2002 labelling standard with just four NE categories (`PER`, `LOC`, `MISC`, `ORG`), this version of the corpus features over 30 entity types, including all entity types used in the [OntoNotes 5.0] English NER annotation.
33
+ The new annotation elaborates on subtypes of the `LOC` and `MISC` entity types, and includes annotation for non-names like times and dates, quantities, languages and nationalities or religious or political groups. The annotation was elaborated with further entity subtypes not present in the Ontonotes 5 annotation (see below).
34
 
35
  ## Tags derived from the OntoNotes 5.0 annotation
36