bert-base-multilingual-cased-edda-domain-classification
This model classifies encyclopedia articles into knowledge domains (e.g., History, Geography, Medicine). It is a fine-tuned version of the bert-base-multilingual-cased model, trained on the French Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers, par une société de gens de lettres (1751-1772), edited by Diderot and d'Alembert (text provided by the ARTFL Encyclopédie Project).
Model Description
- Developed by: Alice Brenon, Ludovic Moncla, Katherine McDonough, and Khaled Chabane in the framework of the GEODE project.
- Model type: Text classification
- Repository: https://gitlab.liris.cnrs.fr/geode/EDdA-Classification/
- Language(s) (NLP): French
- License: cc-by-nc-4.0
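A minimal usage sketch with the Hugging Face transformers pipeline API. The Hub model id below is an assumption inferred from the card title and may differ; the input snippet is an illustrative Encyclopédie-style headword, not an example from the training data.

```python
# Hypothetical usage sketch: load the classifier from the Hub and
# predict the knowledge domain of an encyclopedia entry.
from transformers import pipeline

# Assumed Hub model id -- verify against the actual repository.
model_id = "GEODE/bert-base-multilingual-cased-edda-domain-classification"

def classify_entry(text: str):
    """Return the predicted domain label and score for one article."""
    classifier = pipeline("text-classification", model=model_id)
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    return classifier(text, truncation=True)[0]

if __name__ == "__main__":
    entry = "GENÈVE, ville située sur le lac du même nom ..."
    print(classify_entry(entry))
```

Because the model is multilingual BERT fine-tuned on French text, inputs should be in French for meaningful predictions.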
Class labels
%TODO
Bias, Risks, and Limitations
This model was trained entirely on French encyclopedia entries and is unlikely to perform well on text in other languages or from other corpora.
Cite this work
Brenon, A., Moncla, L., & McDonough, K. (2022). Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions. Data & Knowledge Engineering, 142, 102098.
Acknowledgements
The authors are grateful to the ASLAN project (ANR-10-LABX-0081) of the Université de Lyon for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR). Data courtesy of the ARTFL Encyclopédie Project, University of Chicago.