calamanCy: Tagalog NLP pipelines in spaCy

Paper: arxiv.org/abs/2311.07171

Feature Description
Name tl_calamancy_trf
Version 0.1.0
spaCy >=3.5.0,<4.0.0
Default Pipeline transformer, tagger, morphologizer, parser, ner
Components transformer, tagger, morphologizer, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources TLUnified dataset (Jan Christian Blaise Cruz and Charibeth Cheng); UD_Tagalog-TRG (Stephanie Samson, Daniel Zeman, and Mary Ann C. Tan); UD_Tagalog-Ugnayan (Angelina Aquino)
License MIT
Author Lester James V. Miranda
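
The table above names the package and its components. As a minimal sketch of how such a pipeline is typically used, the snippet below assumes spaCy (within the version range listed) and the tl_calamancy_trf package are already installed, so that `spacy.load` can resolve it by name. The helper `token_rows`, the `demo` function, and the sample sentence are illustrative and do not come from the model card.

```python
from typing import Iterable, List, Tuple


def token_rows(doc: Iterable) -> List[Tuple[str, str, str, str, str]]:
    """Collect (text, tag, pos, morph, dep) for each token of a parsed doc."""
    return [
        (tok.text, tok.tag_, tok.pos_, str(tok.morph), tok.dep_)
        for tok in doc
    ]


def demo() -> None:
    # Imported lazily so token_rows stays usable without spaCy installed.
    import spacy

    # Assumption: the installed package name matches the "Name" row above.
    nlp = spacy.load("tl_calamancy_trf")
    doc = nlp("Pumunta si Maria sa Maynila kahapon.")  # illustrative sentence

    # Annotations written by the tagger, morphologizer, and parser components.
    for row in token_rows(doc):
        print(row)

    # Named entities written by the ner component.
    for ent in doc.ents:
        print(ent.text, ent.label_)
```

Calling `demo()` loads the transformer pipeline and prints one row per token followed by any detected entities. `nlp.pipe_names` can be inspected to confirm that the loaded components match the list in the table.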

Label Scheme

View label scheme (120 labels for 4 components)
Component Labels
tagger ADJ, ADJ_PART, ADP, ADV, ADV_PART, AUX, CCONJ, DET, DET_ADP, DET_PART, INTJ, NOUN, NOUN_PART, NUM, NUM_PART, PART, PRON, PRON_PART, PROPN, PUNCT, SCONJ, VERB, VERB_PART
morphologizer Aspect=Perf|Mood=Ind|POS=VERB|Voice=Act, Case=Nom|POS=ADP, POS=NOUN, POS=PUNCT, Aspect=Perf|Mood=Ind|POS=VERB|Voice=Pass, Case=Gen|POS=ADP, Case=Gen|Number=Sing|POS=PRON|Person=1|PronType=Prs, Aspect=Imp|Mood=Ind|POS=VERB|Voice=Act, POS=ADV|PronType=Dem, Foreign=Yes|POS=NOUN, Degree=Pos|POS=ADJ, Case=Nom|Number=Sing|POS=PRON|Person=3|PronType=Prs, Case=Nom|Deixis=Med|Number=Sing|POS=PRON|PronType=Dem, Gender=Masc|POS=PROPN, Case=Gen|Number=Sing|POS=PRON|Person=3|PronType=Prs, Degree=Pos|Link=Yes|POS=ADJ, POS=ADP, Case=Dat|POS=ADP, POS=VERB|Polarity=Pos, Aspect=Hab|POS=VERB, POS=SCONJ, Case=Nom|Number=Sing|POS=PRON|Person=1|PronType=Prs, Aspect=Prosp|Mood=Ind|POS=VERB|Voice=Act, POS=ADV, POS=PART|Polarity=Neg, Aspect=Imp|Mood=Ind|POS=VERB|Voice=Pass, Aspect=Perf|Mood=Ind|POS=VERB|Voice=Lfoc, POS=PROPN, Case=Nom|Deixis=Prox|Number=Sing|POS=PRON|PronType=Dem, Gender=Masc|POS=NOUN, Gender=Fem|POS=NOUN, Degree=Pos|Gender=Fem|POS=ADJ, Gender=Fem|POS=PROPN, Case=Nom|Clusivity=In|Number=Dual|POS=PRON|Person=1|PronType=Prs, Number=Plur|POS=DET|PronType=Ind, Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs, POS=PRON|PronType=Prs|Reflex=Yes, Gender=Masc|POS=DET|PronType=Emp, Case=Nom|POS=PRON|PronType=Int, Link=Yes|POS=NOUN, POS=PART|PartType=Int, POS=INTJ|Polarity=Pos, Link=Yes|POS=PART|PartType=Int, POS=VERB|Polarity=Neg, Degree=Pos|POS=ADJ|PronType=Int, Case=Gen|Number=Plur|POS=PRON|Person=3|PronType=Prs, Aspect=Perf|Mood=Ind|POS=VERB|PronType=Int|Voice=Act, Case=Nom|Number=Sing|POS=PRON|Person=2|PronType=Prs, Aspect=Perf|Mood=Ind|POS=VERB|PronType=Int|Voice=Pass, Aspect=Perf|Mood=Ind|POS=VERB|Voice=Ifoc, POS=ADV|PronType=Int, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Act, POS=PART|PartType=Nfh, Deixis=Remt|POS=ADV|PronType=Dem, Aspect=Imp|Mood=Pot|POS=VERB|Voice=Act, Link=Yes|POS=VERB|Polarity=Pos, Link=Yes|POS=VERB|Polarity=Neg, POS=PART|PartType=Des, Mood=Imp|POS=AUX|Polarity=Neg, Case=Nom|Link=Yes|Number=Plur|POS=PRON|Person=2|PronType=Prs, Case=Nom|Link=Yes|Number=Sing|POS=PRON|Person=3|PronType=Prs, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Pass, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Lfoc, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Bfoc, POS=DET|PronType=Tot, Case=Dat|Link=Yes|Number=Sing|POS=PRON|Person=3|PronType=Prs, Link=Yes|POS=PRON|PronType=Prs|Reflex=Yes, Mood=Imp|POS=VERB|Voice=Act, Case=Dat|Number=Sing|POS=PRON|Person=3|PronType=Prs, Mood=Imp|POS=VERB|Voice=Lfoc, Case=Gen|Number=Sing|POS=PRON|Person=2|PronType=Prs, Mood=Imp|POS=VERB|Voice=Pass, Case=Gen|Clusivity=In|Number=Plur|POS=PRON|Person=1|PronType=Prs, Aspect=Hab|POS=VERB|Voice=Pass, Gender=Masc|Link=Yes|POS=PROPN, Case=Gen|Link=Yes|Number=Sing|POS=PRON|Person=3|PronType=Prs, Case=Gen|Link=Yes|Number=Sing|POS=PRON|Person=1|PronType=Prs, POS=ADJ, POS=PART, POS=PRON, POS=VERB, POS=INTJ, POS=CCONJ, POS=NUM, POS=DET
parser ROOT, advmod, case, dep, nmod, nsubj, obj, obl, punct
ner LOC, ORG, PER
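
Each morphologizer label listed above is a `|`-separated sequence of `Feature=Value` pairs, with the coarse part of speech carried under the `POS` key. The sketch below splits one such label into a dictionary using only the standard library. The helper name `parse_morph_label` is hypothetical and not part of spaCy or calamanCy.

```python
from typing import Dict


def parse_morph_label(label: str) -> Dict[str, str]:
    """Split a label such as 'Case=Nom|POS=ADP' into a feature dictionary."""
    features: Dict[str, str] = {}
    for pair in label.split("|"):
        # partition keeps everything after the first '=' as the value.
        key, _, value = pair.partition("=")
        features[key] = value
    return features


# One of the labels from the morphologizer row above.
label = "Aspect=Perf|Mood=Ind|POS=VERB|Voice=Act"
features = parse_morph_label(label)
print(features["POS"])    # VERB
print(features["Voice"])  # Act
```

Grouping the label list by `features["POS"]` is one quick way to see which features (for example `Aspect`, `Voice`, or `Case`) the model predicts for each part of speech.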

Citation

@inproceedings{miranda-2023-calamancy,
    title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
    author = "Miranda, Lester James",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.1",
    pages = "1--7",
}