File size: 2,693 Bytes
9b9168c 40f8325 3c09bd7 9b9168c 3c09bd7 9b9168c 180823e 9b9168c fd6440b 9b9168c 3c0cfca 9b9168c |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 |
---
annotations_creators:
- machine-generated
language_creators:
- machine-generated
tags:
- seq2seq
- relation-extraction
datasets:
- Babelscape/wikineural
language:
- de
- en
- es
- fr
- it
- nl
- pl
- pt
- ru
license:
- cc-by-nc-sa-4.0
pretty_name: wikineural-dataset
source_datasets:
- original
task_categories:
- structure-prediction
task_ids:
- named-entity-recognition
---
## Model Description
- **Summary:** mBERT model fine-tuned for 3 epochs on the recently-introduced WikiNEuRal dataset for Multilingual NER. The system supports the 9 languages covered by WikiNEuRal, and it was trained on all 9 languages jointly.
- **Official Repository:** [https://github.com/Babelscape/wikineural](https://github.com/Babelscape/wikineural)
- **Paper:** [https://aclanthology.org/wikineural](https://aclanthology.org/2021.findings-emnlp.215/)
## Licensing Information
Contents of this repository are restricted to only non-commercial research purposes under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright of the dataset contents and models belongs to the original copyright holders.
## Citation Information
```bibtex
@inproceedings{tedeschi-etal-2021-wikineural-combined,
title = "{W}iki{NE}u{R}al: {C}ombined Neural and Knowledge-based Silver Data Creation for Multilingual {NER}",
author = "Tedeschi, Simone and
Maiorca, Valentino and
Campolungo, Niccol{\`o} and
Cecconi, Francesco and
Navigli, Roberto",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-emnlp.215",
pages = "2521--2533",
abstract = "Multilingual Named Entity Recognition (NER) is a key intermediate task which is needed in many areas of NLP. In this paper, we address the well-known issue of data scarcity in NER, especially relevant when moving to a multilingual scenario, and go beyond current approaches to the creation of multilingual silver data for the task. We exploit the texts of Wikipedia and introduce a new methodology based on the effective combination of knowledge-based approaches and neural models, together with a novel domain adaptation technique, to produce high-quality training corpora for NER. We evaluate our datasets extensively on standard benchmarks for NER, yielding substantial improvements up to 6 span-based F1-score points over previous state-of-the-art systems for data creation.",
}
```
|