czuk committed on
Commit 8d1e0dc (1 parent: dbc7041)

Update README.md

Files changed (1)
  1. README.md +29 -1
README.md CHANGED
@@ -63,4 +63,32 @@ print(lemmas)
 
  # Citation
 
- *Will appear soon*
+ ```latex
+ @inproceedings{piskorski-etal-2024-cross-lingual,
+ title = "Cross-lingual Named Entity Corpus for {S}lavic Languages",
+ author = "Piskorski, Jakub and
+ Marci{\'n}czuk, Micha{\l} and
+ Yangarber, Roman",
+ editor = "Calzolari, Nicoletta and
+ Kan, Min-Yen and
+ Hoste, Veronique and
+ Lenci, Alessandro and
+ Sakti, Sakriani and
+ Xue, Nianwen",
+ booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
+ month = may,
+ year = "2024",
+ address = "Torino, Italy",
+ publisher = "ELRA and ICCL",
+ url = "https://aclanthology.org/2024.lrec-main.369",
+ pages = "4143--4157",
+ abstract = "This paper presents a corpus manually annotated with named entities for six Slavic languages {---} Bulgarian, Czech, Polish, Slovenian, Russian,
+ and Ukrainian. This work is the result of a series of shared tasks, conducted in 2017{--}2023 as a part of the Workshops on Slavic Natural
+ Language Processing. The corpus consists of 5,017 documents on seven topics. The documents are annotated with five classes of named entities.
+ Each entity is described by a category, a lemma, and a unique cross-lingual identifier. We provide two train-tune dataset splits
+ {---} single topic out and cross topics. For each split, we set benchmarks using a transformer-based neural network architecture
+ with the pre-trained multilingual models {---} XLM-RoBERTa-large for named entity mention recognition and categorization,
+ and mT5-large for named entity lemmatization and linking.",
+ }
+ ```
+