Update README.md
README.md
CHANGED
@@ -27,7 +27,7 @@ inference:
 
 ## Training data
 
-The underlying corpus, [NerKor+CARS-ONPP](https://github.com/
+The underlying corpus, [NerKor+CARS-ONPP](https://github.com/ppke-nlpg/NYTK-NerKor-Cars-OntoNotesPP), was derived from [NYTK-NerKor](https://github.com/nytud/NYTK-NerKor), a Hungarian gold standard named entity annotated corpus containing about 1 million tokens.
 It includes a small addition of 12k tokens of text (individual sentences) concerning motor vehicles (cars, buses, motorcycles) from the news archive of [hvg.hu](hvg.hu).
 While the annotation in NYTK-NerKor followed the CoNLL2002 labelling standard with just four NE categories (`PER`, `LOC`, `MISC`, `ORG`), this version of the corpus features over 30 entity types, including all entity types used in the [OntoNotes 5.0] English NER annotation.
 The new annotation elaborates on subtypes of the `LOC` and `MISC` entity types, and includes annotation for non-names like times and dates, quantities, languages and nationalities or religious or political groups. The annotation was elaborated with further entity subtypes not present in the Ontonotes 5 annotation (see below).
@@ -68,7 +68,7 @@ Further subtypes of names of type `MISC`:
 | | |
 |-|-|
 |`AWARD`| Awards and prizes |
-| `CAR` | Cars and
+| `CAR` | Cars and other motor vehicles |
 |`MEDIA`| Media outlets, TV channels, news portals|
 |`SMEDIA`| Social media platforms|
 |`PROJ`| Projects and initiatives |