
bert-base-french-cased-edda-ner-levels

This model is designed to identify and classify named entities using the IO (inside/outside) tagging scheme. It has been trained on the French Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers, par une société de gens de lettres (1751-1772), edited by Diderot and d'Alembert (provided by the ARTFL Encyclopédie Project). Dataset: https://huggingface.co/datasets/GEODE/GeoEDdA
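
In the IO scheme, every token carries either an inside tag for its class or O (outside), and an entity is a maximal run of consecutive tokens sharing the same inside tag. The sketch below shows one way token-level tags could be grouped back into entity spans. It assumes tags of the form I-<class> (e.g. I-NP-Spatial) and O; the exact label strings stored by this model are not stated in this card. The function name io_to_spans and the example tokens and tags are hypothetical, hand-written illustrations, not model output.

```python
from typing import List, Optional, Tuple


def io_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group consecutive tokens sharing the same I-<class> tag into (class, text) spans.

    Assumes (hypothetically) that tags are either "O" or "I-<class>".
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    current_class: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        # Strip the "I-" prefix; "O" (or anything else) means outside any entity.
        label = tag[2:] if tag.startswith("I-") else None
        if label != current_class:
            # The class changed (or we left an entity): close the running span.
            if current_class is not None:
                spans.append((current_class, " ".join(current_tokens)))
            current_class = label
            current_tokens = []
        if label is not None:
            current_tokens.append(token)
    # Close a span that runs to the end of the sequence.
    if current_class is not None:
        spans.append((current_class, " ".join(current_tokens)))
    return spans


# Hand-written illustration (not a model prediction).
tokens = ["Géog.", "ville", "de", "France", ",", "à", "10", "lieues", "de", "Paris"]
tags = ["I-Domain-Mark", "I-ENE-Spatial", "I-ENE-Spatial", "I-ENE-Spatial", "O",
        "I-Relation", "I-Relation", "I-Relation", "I-Relation", "I-NP-Spatial"]
print(io_to_spans(tokens, tags))
# [('Domain-Mark', 'Géog.'), ('ENE-Spatial', 'ville de France'), ('Relation', 'à 10 lieues de'), ('NP-Spatial', 'Paris')]
```

Because IO has no separate B- prefix marking the first token of an entity, two adjacent entities of the same class cannot be told apart and are merged into a single span; BIO tagging avoids this at the cost of extra labels.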

Class labels

The entity classes detected by this model are:

  • NC-Spatial: a common noun that identifies a spatial entity (nominal spatial entity), including natural features, e.g. ville, la rivière, royaume.
  • NP-Spatial: a proper noun identifying the name of a place (spatial named entities), e.g. France, Paris, la Chine.
  • ENE-Spatial: nested spatial entity, e.g. ville de France, royaume de Naples, la mer Baltique.
  • Relation: spatial relation, e.g. dans, sur, à 10 lieues de.
  • Latlong: geographic coordinates, e.g. Long. 19. 49. lat. 43. 55. 44.
  • NC-Person: a common noun that identifies a person (nominal person entity), e.g. roi, l'empereur, les auteurs.
  • NP-Person: a proper noun identifying the name of a person (person named entities), e.g. Louis XIV, Pline, les Romains.
  • ENE-Person: nested person entity, e.g. le czar Pierre, roi de Macédoine.
  • NP-Misc: a proper noun identifying entities not classified as spatial or person, e.g. l'Eglise, 1702, Pélasgique.
  • ENE-Misc: nested named entity not classified as spatial or person, e.g. l'ordre de S. Jacques, la déclaration du 21 Mars 1671.
  • Head: the name of the entry (headword).
  • Domain-Mark: words indicating the knowledge domain (usually placed after the head, between parentheses), e.g. Géographie, Geog., en Anatomie.
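
The class list can be read as a grid: a mention form (NC for common noun, NP for proper noun, ENE for nested entity) crossed with an entity type (Spatial, Person, Misc), plus four standalone classes (Relation, Latlong, Head, Domain-Mark). Note that there is no NC-Misc class. The sketch below encodes the twelve classes listed above and derives the corresponding IO tag set. It assumes tags of the form I-<class> plus O; the actual label strings in the model configuration (e.g. its id2label mapping) are not given in this card, and the helper names io_tag_set and group_by_entity_type are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# The twelve classes listed in this model card.
CLASSES: List[str] = [
    "NC-Spatial", "NP-Spatial", "ENE-Spatial",
    "Relation", "Latlong",
    "NC-Person", "NP-Person", "ENE-Person",
    "NP-Misc", "ENE-Misc",
    "Head", "Domain-Mark",
]


def io_tag_set(classes: List[str]) -> List[str]:
    """Return "O" plus one "I-<class>" tag per class (assumed label format)."""
    return ["O"] + [f"I-{c}" for c in classes]


def group_by_entity_type(classes: List[str]) -> Dict[str, List[str]]:
    """Group NC/NP/ENE classes by entity type; all other classes go under "Other"."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for c in classes:
        prefix, _, rest = c.partition("-")
        if prefix in {"NC", "NP", "ENE"}:
            groups[rest].append(prefix)
        else:
            groups["Other"].append(c)
    return dict(groups)


print(len(io_tag_set(CLASSES)))  # 13
print(group_by_entity_type(CLASSES)["Misc"])  # ['NP', 'ENE']
```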

Bias, Risks, and Limitations

This model was trained entirely on French encyclopedic entries and is likely to perform poorly on text in other languages or from other corpora.

Acknowledgement

The authors are grateful to the ASLAN project (ANR-10-LABX-0081) of the Université de Lyon for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR). Data courtesy of the ARTFL Encyclopédie Project, University of Chicago.

Model size: 110M params
Tensor type: F32 (Safetensors)