novakat committed on
Commit
c465f3d
1 Parent(s): 2a7dd79

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -27,7 +27,7 @@ inference:
 
 ## Training data
 
-The underlying corpus, [NerKor+CARS-ONPP](https://github.com/novakat/NYTK-NerKor-Cars-OntoNotesPP), was derived from [NYTK-NerKor](https://github.com/nytud/NYTK-NerKor), a Hungarian gold standard named entity annotated corpus containing about 1 million tokens.
+The underlying corpus, [NerKor+CARS-ONPP](https://github.com/ppke-nlpg/NYTK-NerKor-Cars-OntoNotesPP), was derived from [NYTK-NerKor](https://github.com/nytud/NYTK-NerKor), a Hungarian gold standard named entity annotated corpus containing about 1 million tokens.
 It includes a small addition of 12k tokens of text (individual sentences) concerning motor vehicles (cars, buses, motorcycles) from the news archive of [hvg.hu](hvg.hu).
 While the annotation in NYTK-NerKor followed the CoNLL2002 labelling standard with just four NE categories (`PER`, `LOC`, `MISC`, `ORG`), this version of the corpus features over 30 entity types, including all entity types used in the [OntoNotes 5.0] English NER annotation.
 The new annotation elaborates on subtypes of the `LOC` and `MISC` entity types, and includes annotation for non-names like times and dates, quantities, languages and nationalities or religious or political groups. The annotation was elaborated with further entity subtypes not present in the Ontonotes 5 annotation (see below).
@@ -68,7 +68,7 @@ Further subtypes of names of type `MISC`:
 | | |
 |-|-|
 |`AWARD`| Awards and prizes |
-| `CAR` | Cars and trucks |
+| `CAR` | Cars and other motor vehicles |
 |`MEDIA`| Media outlets, TV channels, news portals|
 |`SMEDIA`| Social media platforms|
 |`PROJ`| Projects and initiatives |
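
The README text in this diff says the fine-grained labels elaborate on the four CoNLL2002 categories, and lists `AWARD`, `CAR`, `MEDIA`, `SMEDIA` and `PROJ` among the subtypes of `MISC`. A minimal sketch of collapsing those subtypes back to `MISC` might look like the following. It assumes BIO-style tags such as `B-CAR` and `I-CAR`, which the diff does not confirm, and it covers only the five subtypes visible in this hunk. The names `MISC_SUBTYPES` and `collapse_label` are hypothetical and not part of the corpus or model.

```python
# Subtypes of MISC visible in this diff hunk; the full README may list more.
MISC_SUBTYPES = {"AWARD", "CAR", "MEDIA", "SMEDIA", "PROJ"}


def collapse_label(tag: str) -> str:
    """Map a fine-grained tag to its coarse CoNLL2002-style category.

    Assumes BIO-style tags ("B-CAR", "I-CAR", "O"); this tagging scheme
    is an assumption for illustration, not something the diff states.
    """
    if "-" not in tag:
        # "O" or any tag without a prefix is returned unchanged.
        return tag
    prefix, entity_type = tag.split("-", 1)
    if entity_type in MISC_SUBTYPES:
        return f"{prefix}-MISC"
    # Types outside the known MISC subtypes are left as they are.
    return tag


if __name__ == "__main__":
    tags = ["B-CAR", "I-CAR", "O", "B-PER", "B-SMEDIA"]
    print([collapse_label(t) for t in tags])
```

A mapping like this can help when comparing predictions against a corpus that only carries the four coarse categories. Subtypes of `LOC` would need their own entries, which are not shown in this hunk.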