bert-base-french-cased-edda-ner

This model identifies and classifies named entities (Named Entity Recognition, NER). It was trained on the French Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers, par une société de gens de lettres (1751-1772), edited by Diderot and d'Alembert (provided by the ARTFL Encyclopédie Project). Dataset: https://huggingface.co/datasets/GEODE/GeoEDdA

Class labels

The entity classes detected by this model are:

  • NC-Spatial: a common noun that identifies a spatial entity (nominal spatial entity) including natural features, e.g. ville, la rivière, royaume.
  • NP-Spatial: a proper noun identifying the name of a place (spatial named entities), e.g. France, Paris, la Chine.
  • Relation: spatial relation, e.g. dans, sur, à 10 lieues de.
  • Latlong: geographic coordinates, e.g. Long. 19. 49. lat. 43. 55. 44.
  • NC-Person: a common noun that identifies a person (nominal person entity), e.g. roi, l'empereur, les auteurs.
  • NP-Person: a proper noun identifying the name of a person (person named entities), e.g. Louis XIV, Pline, les Romains.
  • NP-Misc: a proper noun identifying entities not classified as spatial or person, e.g. l'Eglise, 1702, Pélasgique.
  • Head: entry name
  • Domain-Mark: words indicating the knowledge domain (usually after the head and in parentheses), e.g. Géographie, Geog., en Anatomie.
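
The classes above can be retrieved with the transformers token-classification pipeline. The following is a minimal sketch, not the authors' documented usage. It assumes the model is published on the Hub as GEODE/bert-base-french-cased-edda-ner (the card gives only the model name; the GEODE namespace is inferred from the dataset URL), and that the pipeline's "simple" aggregation yields the class names listed above. The helper names load_tagger and group_by_label are hypothetical.

```python
from collections import defaultdict

# Assumption: Hub identifier inferred from the model name and the dataset namespace.
MODEL_ID = "GEODE/bert-base-french-cased-edda-ner"


def load_tagger(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so group_by_label can be used without transformers installed.
    from transformers import pipeline

    # "simple" aggregation merges word pieces into spans and reports each span
    # under the "entity_group" key.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def group_by_label(predictions):
    """Map each predicted class label to the list of text spans tagged with it."""
    grouped = defaultdict(list)
    for pred in predictions:
        grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)
```

A typical call would be group_by_label(load_tagger()("ARLES, (Géog.) ville de France en Provence, sur le Rhône.")), where the sentence is an illustrative entry-style example rather than a quotation from the corpus. The returned dictionary is keyed by whatever labels the model emits, so it is worth checking them against the list above before relying on specific names.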

Bias, Risks, and Limitations

This model was trained entirely on French encyclopedic entries and will likely not perform well on text in other languages or other corpora.

Acknowledgement

The authors are grateful to the ASLAN project (ANR-10-LABX-0081) of the Université de Lyon for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR). Data courtesy of the ARTFL Encyclopédie Project, University of Chicago.

Model size: 110M params (Safetensors, tensor type F32)