novakat commited on
Commit
c3a1e3c
1 Parent(s): c465f3d

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -27,7 +27,7 @@ inference:
27
 
28
  ## Training data
29
 
30
- The underlying corpus, [NerKor+CARS-ONPP](https://github.com/ppke-nlpg/NYTK-NerKor-Cars-OntoNotesPP), was derived from [NYTK-NerKor](https://github.com/nytud/NYTK-NerKor), a Hungarian gold standard named entity annotated corpus containing about 1 million tokens.
31
  It includes a small addition of 12k tokens of text (individual sentences) concerning motor vehicles (cars, buses, motorcycles) from the news archive of [hvg.hu](hvg.hu).
32
  While the annotation in NYTK-NerKor followed the CoNLL2002 labelling standard with just four NE categories (`PER`, `LOC`, `MISC`, `ORG`), this version of the corpus features over 30 entity types, including all entity types used in the [OntoNotes 5.0] English NER annotation.
33
  The new annotation elaborates on subtypes of the `LOC` and `MISC` entity types, and includes annotation for non-names like times and dates, quantities, languages and nationalities or religious or political groups. The annotation was elaborated with further entity subtypes not present in the Ontonotes 5 annotation (see below).
@@ -95,6 +95,6 @@ Further non-name entities:
95
  address = {Marseille, France},
96
  publisher = {European Language Resources Association},
97
  pages = {1907--1916},
98
- url = {https://aclanthology.org/2022.lrec-1.203}
99
  }
100
  ```
 
27
 
28
  ## Training data
29
 
30
+ The underlying corpus, [NerKor+CARS-OntoNotes++](https://github.com/ppke-nlpg/NYTK-NerKor-Cars-OntoNotesPP), was derived from [NYTK-NerKor](https://github.com/nytud/NYTK-NerKor), a Hungarian gold standard named entity annotated corpus containing about 1 million tokens.
31
  It includes a small addition of 12k tokens of text (individual sentences) concerning motor vehicles (cars, buses, motorcycles) from the news archive of [hvg.hu](hvg.hu).
32
  While the annotation in NYTK-NerKor followed the CoNLL2002 labelling standard with just four NE categories (`PER`, `LOC`, `MISC`, `ORG`), this version of the corpus features over 30 entity types, including all entity types used in the [OntoNotes 5.0] English NER annotation.
33
  The new annotation elaborates on subtypes of the `LOC` and `MISC` entity types, and includes annotation for non-names like times and dates, quantities, languages and nationalities or religious or political groups. The annotation was elaborated with further entity subtypes not present in the Ontonotes 5 annotation (see below).
 
95
  address = {Marseille, France},
96
  publisher = {European Language Resources Association},
97
  pages = {1907--1916},
98
+ url = {http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.203.pdf}
99
  }
100
  ```