README.md · Babelscape/wikineural-multilingual-ner at 89ab4613336445bde46866ddc825561fe69e6e6c

metadata

annotations_creators:
  - machine-generated
language_creators:
  - machine-generated
tags:
  - named-entity-recognition
datasets:
  - Babelscape/wikineural
language:
  - de
  - en
  - es
  - fr
  - it
  - nl
  - pl
  - pt
  - ru
license:
  - cc-by-nc-sa-4.0
pretty_name: wikineural-dataset
source_datasets:
  - original
task_categories:
  - structure-prediction
task_ids:
  - named-entity-recognition

Model Description

Summary: an mBERT model fine-tuned for 3 epochs on the recently introduced WikiNEuRal dataset for multilingual NER. The system supports the 9 languages covered by WikiNEuRal (de, en, es, fr, it, nl, pl, pt, ru) and was trained on all 9 languages jointly.
Official Repository: https://github.com/Babelscape/wikineural
Paper: https://aclanthology.org/2021.findings-emnlp.215
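As a token-classification model, it assigns one label to each token. The sketch below assumes those labels follow the IOB2 scheme over PER, ORG, LOC and MISC, and shows how word-level labels can be grouped into entity spans. Check the checkpoint's `config.id2label` before relying on that label set. The helpers `iob2_to_spans` and `spans_to_entities` are hypothetical names, and the example labels are hand-written, not model output. In practice, the `transformers` token-classification pipeline can do this grouping itself, e.g. `pipeline("ner", model="Babelscape/wikineural-multilingual-ner", aggregation_strategy="simple")`. The sketch only illustrates the idea and does not deal with subword pieces.

```python
from typing import List, Optional, Tuple

# (entity type, start token index, end token index exclusive).
# Assumes IOB2 labels such as "B-PER", "I-PER", "O"; the checkpoint's real
# label set should be read from its config.id2label.
Span = Tuple[str, int, int]


def iob2_to_spans(labels: List[str]) -> List[Span]:
    """Group word-level IOB2 labels into (type, start, end) entity spans."""
    spans: List[Span] = []
    current_type: Optional[str] = None
    start = 0
    for i, label in enumerate(labels):
        if label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_type
        ):
            # A B- tag, or an I- tag that does not continue the open entity,
            # closes any open span and opens a new one (lenient decoding).
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = label[2:], i
        elif not label.startswith("I-"):
            # "O" (or any other non-entity label) closes the open span, if any.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = None
        # Otherwise: an I- tag continuing the current entity; keep extending.
    if current_type is not None:
        spans.append((current_type, start, len(labels)))
    return spans


def spans_to_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Return (type, surface text) pairs for each decoded entity."""
    return [(etype, " ".join(tokens[s:e])) for etype, s, e in iob2_to_spans(labels)]


if __name__ == "__main__":
    # Hand-written labels for illustration, not actual model predictions.
    tokens = ["My", "name", "is", "Wolfgang", "and", "I", "live", "in", "Berlin"]
    labels = ["O", "O", "O", "B-PER", "O", "O", "O", "O", "B-LOC"]
    print(spans_to_entities(tokens, labels))  # [('PER', 'Wolfgang'), ('LOC', 'Berlin')]
```

The decoder is deliberately lenient: an `I-` tag that does not continue the open entity starts a new span instead of being dropped, which is a common choice when decoding imperfect model output.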

Licensing Information

Contents of this repository are restricted to non-commercial research purposes only, under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Copyright of the dataset contents and models belongs to the original copyright holders.

Citation Information

@inproceedings{tedeschi-etal-2021-wikineural-combined,
    title = "{W}iki{NE}u{R}al: {C}ombined Neural and Knowledge-based Silver Data Creation for Multilingual {NER}",
    author = "Tedeschi, Simone  and
      Maiorca, Valentino  and
      Campolungo, Niccol{\`o}  and
      Cecconi, Francesco  and
      Navigli, Roberto",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.215",
    pages = "2521--2533",
    abstract = "Multilingual Named Entity Recognition (NER) is a key intermediate task which is needed in many areas of NLP. In this paper, we address the well-known issue of data scarcity in NER, especially relevant when moving to a multilingual scenario, and go beyond current approaches to the creation of multilingual silver data for the task. We exploit the texts of Wikipedia and introduce a new methodology based on the effective combination of knowledge-based approaches and neural models, together with a novel domain adaptation technique, to produce high-quality training corpora for NER. We evaluate our datasets extensively on standard benchmarks for NER, yielding substantial improvements up to 6 span-based F1-score points over previous state-of-the-art systems for data creation.",
}
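The abstract reports results in span-based F1. The sketch below is a rough illustration of that metric: a generic exact-match, micro-averaged span F1. It is not the evaluation script used in the paper. The `(type, start, end)` span representation and the function name `span_f1` are assumptions made for this example, and the example spans are invented.

```python
from typing import Sequence, Tuple

# Hypothetical span representation: (entity type, start, end exclusive).
Span = Tuple[str, int, int]


def span_f1(
    gold: Sequence[Sequence[Span]], pred: Sequence[Sequence[Span]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over a corpus of sentences.

    A predicted span is a true positive only if its type, start and end all
    match a gold span in the same sentence (exact-match span evaluation).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    tp = n_gold = n_pred = 0
    for gold_spans, pred_spans in zip(gold, pred):
        g, p = set(gold_spans), set(pred_spans)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented spans for illustration only.
    gold = [[("PER", 0, 1), ("LOC", 3, 5)], [("ORG", 2, 3)]]
    pred = [[("PER", 0, 1), ("LOC", 3, 4)], [("ORG", 2, 3), ("MISC", 5, 6)]]
    # 2 exact matches, 3 gold spans, 4 predicted spans:
    # precision 1/2, recall 2/3, F1 4/7 (up to floating-point rounding).
    print(span_f1(gold, pred))
```

Under this definition, a span with the right type but wrong boundaries earns no credit, such as the predicted LOC span above. Libraries such as seqeval provide comparable entity-level scoring directly from IOB label sequences.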