AI & ML interests

Latin, natural language processing

LatinCy

Synthetic trained spaCy pipelines for Latin NLP

Developed by Patrick J. Burns, 2023.
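
As an orientation for new users, the sketch below shows how one of these pipelines can be loaded and queried with spaCy once installed. The package name la_core_web_sm is assumed here for illustration; consult the model cards in this organization for the exact package names and installation instructions.

import spacy

# Load a LatinCy pipeline; "la_core_web_sm" is an assumed package name
# for the small model and must already be installed in the environment.
nlp = spacy.load("la_core_web_sm")

doc = nlp("Gallia est omnis divisa in partes tres.")

# Each token carries the annotations described in the paper:
# part-of-speech tag, lemma, and morphological features.
for token in doc:
    print(token.text, token.pos_, token.lemma_, token.morph)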

Paper

Details about training, datasets, etc. can be found in the following paper: Burns, P.J. 2023. “LatinCy: Synthetic Trained Pipelines for Latin NLP.” https://arxiv.org/abs/2305.04365v1.

Citation

@misc{burns_latincy_2023,
    title = {{LatinCy}: Synthetic Trained Pipelines for Latin {NLP}},
    author = {Burns, Patrick J.},
    url = {https://arxiv.org/abs/2305.04365v1},
    shorttitle = {{LatinCy}},
    abstract = {This paper introduces {LatinCy}, a set of trained general purpose Latin-language "core" pipelines for use with the {spaCy} natural language processing framework. The models are trained on a large amount of available Latin data, including all five of the Latin Universal Dependency treebanks, which have been preprocessed to be compatible with each other. The result is a set of general models for Latin with good performance on a number of natural language processing tasks (e.g. the top-performing model yields {POS} tagging, 97.41\% accuracy; lemmatization, 94.66\% accuracy; morphological tagging 92.76\% accuracy). The paper describes the model training, including its training data and parameterization, and presents the advantages to Latin-language researchers of having a {spaCy} model available for {NLP} work.},
    date = {2023-05-07},
    langid = {english},
}

Datasets

None public yet