bert-base-multilingual-cased-edda-domain-classification

This model classifies encyclopedia articles into knowledge domains (e.g., History, Geography, Medicine). It is a fine-tuned version of the bert-base-multilingual-cased model, trained on the French Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers, par une société de gens de lettres (1751-1772), edited by Diderot and d'Alembert. The text was provided by the ARTFL Encyclopédie Project.

Model Description

Developed by: Alice Brenon, Ludovic Moncla, Katherine McDonough, and Khaled Chabane in the framework of the GEODE project.
Model type: Text classification
Repository: https://gitlab.liris.cnrs.fr/geode/EDdA-Classification/
Language(s) (NLP): French
License: cc-by-nc-4.0
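
As a rough illustration of how a text classification model like this one might be queried, here is a minimal sketch using the Hugging Face transformers pipeline API. The value of MODEL_ID is an assumption derived from the model name and may need a namespace prefix or a local path; the helper names load_classifier and top_domains are hypothetical and not part of the GEODE repository.

```python
from typing import Callable, Dict, List

# Assumption: placeholder identifier derived from the model name; replace it
# with the actual Hub repository path or a local checkpoint directory.
MODEL_ID = "bert-base-multilingual-cased-edda-domain-classification"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline.

    Requires the `transformers` package and a backend such as PyTorch.
    """
    # Imported lazily so that top_domains() can be used without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def top_domains(classifier: Callable[..., List[Dict]], articles: List[str]) -> List[str]:
    """Return the highest-scoring label for each article (hypothetical helper)."""
    if not articles:
        return []
    # truncation=True cuts long entries at the model's maximum input length,
    # which matters because encyclopedia articles can exceed BERT's limit.
    predictions = classifier(articles, truncation=True)
    return [prediction["label"] for prediction in predictions]
```

A call would then look like `top_domains(load_classifier(), ["GENÈVE, ville située sur le lac Léman."])`, where the sample sentence is invented for illustration. The returned strings depend on how the label mapping is stored in the model configuration; if no mapping is set, the pipeline reports generic names such as LABEL_0.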

Class labels

%TODO

Bias, Risks, and Limitations

This model was trained entirely on French encyclopedia entries, so it is unlikely to perform well on text in other languages or on other kinds of corpora.

Cite this work

Brenon, A., Moncla, L., & McDonough, K. (2022). Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions. Data & Knowledge Engineering, 142, 102098.

Acknowledgement

The authors are grateful to the ASLAN project (ANR-10-LABX-0081) of the Université de Lyon for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR). Data courtesy of the ARTFL Encyclopédie Project, University of Chicago.