
bert-base-multilingual-cased-edda-domain-classification

This model classifies encyclopedia articles into knowledge domains (e.g., History, Geography, Medicine). It is a fine-tuned version of bert-base-multilingual-cased, trained on the French Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers, par une société de gens de lettres (1751-1772), edited by Diderot and d'Alembert (text provided by the ARTFL Encyclopédie Project).
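
The snippet below is a minimal usage sketch, not the authors' own pipeline. It loads the model through the Hugging Face transformers text-classification pipeline; the repository id is shown without a hub namespace, so the full "<namespace>/model" path is an assumption you will need to fill in, and the input text is a made-up encyclopedia-style example.

```python
# Minimal inference sketch (assumption: replace the bare repository id
# with the full "<namespace>/..." hub path this model is published under).
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="bert-base-multilingual-cased-edda-domain-classification",
)

# Hypothetical encyclopedia-style input in French.
article = (
    "Ville de France, située sur la rive droite de la Loire, "
    "connue pour ses manufactures."
)
print(classifier(article))
# Illustrative output shape: [{'label': '<predicted domain>', 'score': 0.97}]
```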

Model Description

Class labels

%TODO
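
Until the label list above is filled in, the domain labels can be read from the model configuration. This is a standard transformers pattern rather than anything specific to this model, and the bare repository id again assumes you substitute the full hub path.

```python
# Read the id-to-label mapping stored in the model's config.json.
from transformers import AutoConfig

# Assumption: replace with the full "<namespace>/..." hub path.
config = AutoConfig.from_pretrained(
    "bert-base-multilingual-cased-edda-domain-classification"
)
print(config.id2label)
# Illustrative output shape: {0: 'History', 1: 'Geography', ...}
```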

Bias, Risks, and Limitations

This model was trained exclusively on 18th-century French encyclopedic entries and is unlikely to perform well on text in other languages, from other periods, or from other corpora.

Cite this work

Brenon, A., Moncla, L., & McDonough, K. (2022). Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions. Data & Knowledge Engineering, 142, 102098.

Acknowledgement

The authors are grateful to the ASLAN project (ANR-10-LABX-0081) of the Université de Lyon for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR). Data courtesy of the ARTFL Encyclopédie Project, University of Chicago.

Model size: 178M parameters (tensor type F32, Safetensors format)