library_name: Spacy
license: mit
tags:
  - Spacy
  - Named entity recognition
metrics:
  - P
  - R
  - F1
language:
  - la
version:
  - Spacy v2

Spacy - HOME-Alcar - Location

This model detects location entities in Latin text.

The model was trained with the spaCy v2 library on the HOME-Alcar document annotations to detect person and location entities. It is compatible with spaCy 2.3.5 and incompatible with spaCy 3.x.

Evaluation results

The model achieves the following results on HOME-Alcar:

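| Tag  | Predicted | Matched | Precision | Recall | F1    | Support |
|------|-----------|---------|-----------|--------|-------|---------|
| PERS | 18,915    | 18,706  | 0.989     | 0.996  | 0.992 | 18,783  |
| LOC  | 27,541    | 27,165  | 0.986     | 0.987  | 0.987 | 27,528  |
| All  | 46,456    | 45,871  | 0.987     | 0.990  | 0.989 | 46,311  |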
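
The scores above can be recomputed from the counts in the table. The sketch below assumes that "Predicted" is the number of entities the model output, "Matched" is the number of those that agree with the reference annotations, and "Support" is the number of reference entities. This reading is an assumption, not something the model card states, although it is consistent with the reported figures. The function name `precision_recall_f1` is illustrative.

```python
def precision_recall_f1(predicted, matched, support):
    """Compute precision, recall and F1 from entity counts.

    Assumed meaning of the counts (not stated in the model card):
      predicted: entities output by the model
      matched:   predicted entities that agree with the reference
      support:   entities in the reference annotations
    """
    precision = matched / predicted if predicted else 0.0
    recall = matched / support if support else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from the evaluation table: (predicted, matched, support).
COUNTS = {
    "PERS": (18915, 18706, 18783),
    "LOC": (27541, 27165, 27528),
    "All": (46456, 45871, 46311),
}

for tag, (predicted, matched, support) in COUNTS.items():
    p, r, f1 = precision_recall_f1(predicted, matched, support)
    print(f"{tag}: P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Under that assumption, the rounded values agree with the Precision, Recall and F1 columns of the table.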

How to use?

This model requires spaCy 2.3.5. Please refer to the spaCy 2.3.5 page on PyPI (https://pypi.org/project/spacy/2.3.5/) for installation and usage instructions.
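
As a rough illustration, a model saved by spaCy v2 can usually be loaded from a local directory with `spacy.load` and applied to a string. The sketch below is not taken from the model's documentation. The directory `path/to/model`, the helper names `extract_entities` and `load_and_tag`, and the sample sentence are hypothetical, and the label names `PERS` and `LOC` are assumed from the evaluation table. It assumes spaCy 2.3.5 is installed (for example with `pip install spacy==2.3.5`) and that the model files have been downloaded locally.

```python
def extract_entities(nlp, text, labels=("LOC",)):
    """Run a spaCy pipeline on `text` and return the entities whose
    label is in `labels`, as (text, label, start_char, end_char) tuples.

    The default label "LOC" is assumed from the evaluation table.
    """
    doc = nlp(text)
    wanted = set(labels)
    return [
        (ent.text, ent.label_, ent.start_char, ent.end_char)
        for ent in doc.ents
        if ent.label_ in wanted
    ]


def load_and_tag(model_dir, text, labels=("LOC",)):
    """Load a spaCy v2 model from `model_dir` (hypothetical local path)
    and return the matching entities found in `text`."""
    import spacy  # requires spaCy 2.3.5; 3.x cannot load this model

    nlp = spacy.load(model_dir)
    return extract_entities(nlp, text, labels)


# Example call (not executed here; the path and sentence are placeholders):
# entities = load_and_tag("path/to/model", "Ego Petrus apud Parisius")
# for ent_text, label, start, end in entities:
#     print(ent_text, label, start, end)
```

Passing `labels=("PERS", "LOC")` would return both entity types, again assuming those are the label names the model emits.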

Cite us!

@inproceedings{10.1007/978-3-031-06555-2_29,
    author = {Monroc, Claire Bizon and Miret, Blanche and Bonhomme, Marie-Laurence and Kermorvant, Christopher},
    title = {A Comprehensive Study Of Open-Source Libraries For Named Entity Recognition On Handwritten Historical Documents},
    year = {2022},
    isbn = {978-3-031-06554-5},
    publisher = {Springer-Verlag},
    address = {Berlin, Heidelberg},
    url = {https://doi.org/10.1007/978-3-031-06555-2_29},
    doi = {10.1007/978-3-031-06555-2_29},
    abstract = {In this paper, we propose an evaluation of several state-of-the-art open-source natural language processing (NLP) libraries for named entity recognition (NER) on handwritten historical documents: spaCy, Stanza and Flair. The comparison is carried out on three low-resource multilingual datasets of handwritten historical documents: HOME (a multilingual corpus of medieval charters), Balsac (a corpus of parish records from Quebec), and Esposalles (a corpus of marriage records in Catalan). We study the impact of the document recognition processes (text line detection and handwriting recognition) on the performance of the NER. We show that current off-the-shelf NER libraries yield state-of-the-art results, even on low-resource languages or multilingual documents using multilingual models. We show, in an end-to-end evaluation, that text line detection errors have a greater impact than handwriting recognition errors. Finally, we also report state-of-the-art results on the public Esposalles dataset.},
    booktitle = {Document Analysis Systems: 15th IAPR International Workshop, DAS 2022, La Rochelle, France, May 22–25, 2022, Proceedings},
    pages = {429–444},
    numpages = {16},
    keywords = {Text line detection, Named entity recognition, Handwritten historical documents},
    location = {La Rochelle, France}
}