calamanCy: Tagalog NLP pipelines in spaCy

Feature            Description
Name               tl_calamancy_md
Version            0.1.0
spaCy              >=3.5.0
Default Pipeline   tok2vec, tagger, morphologizer, parser, ner
Components         tok2vec, tagger, morphologizer, parser, ner
Vectors            -1 keys, 50000 unique vectors (200 dimensions)
Sources            TLUnified dataset (Jan Christian Blaise Cruz and Charibeth Cheng)
                   UD_Tagalog-TRG (Stephanie Samson, Daniel Zeman, and Mary Ann C. Tan)
                   UD_Tagalog-Ugnayan (Angelina Aquino)
License            MIT
Author             Lester James V. Miranda

Label Scheme

The pipeline uses 120 labels across 4 components (tagger, morphologizer, parser, ner).
Component Labels
tagger ADJ, ADJ_PART, ADP, ADV, ADV_PART, AUX, CCONJ, DET, DET_ADP, DET_PART, INTJ, NOUN, NOUN_PART, NUM, NUM_PART, PART, PRON, PRON_PART, PROPN, PUNCT, SCONJ, VERB, VERB_PART
morphologizer Aspect=Perf|Mood=Ind|POS=VERB|Voice=Act, Case=Nom|POS=ADP, POS=NOUN, POS=PUNCT, Aspect=Perf|Mood=Ind|POS=VERB|Voice=Pass, Case=Gen|POS=ADP, Case=Gen|Number=Sing|POS=PRON|Person=1|PronType=Prs, Aspect=Imp|Mood=Ind|POS=VERB|Voice=Act, POS=ADV|PronType=Dem, Foreign=Yes|POS=NOUN, Degree=Pos|POS=ADJ, Case=Nom|Number=Sing|POS=PRON|Person=3|PronType=Prs, Case=Nom|Deixis=Med|Number=Sing|POS=PRON|PronType=Dem, Gender=Masc|POS=PROPN, Case=Gen|Number=Sing|POS=PRON|Person=3|PronType=Prs, Degree=Pos|Link=Yes|POS=ADJ, POS=ADP, Case=Dat|POS=ADP, POS=VERB|Polarity=Pos, Aspect=Hab|POS=VERB, POS=SCONJ, Case=Nom|Number=Sing|POS=PRON|Person=1|PronType=Prs, Aspect=Prosp|Mood=Ind|POS=VERB|Voice=Act, POS=ADV, POS=PART|Polarity=Neg, Aspect=Imp|Mood=Ind|POS=VERB|Voice=Pass, Aspect=Perf|Mood=Ind|POS=VERB|Voice=Lfoc, POS=PROPN, Case=Nom|Deixis=Prox|Number=Sing|POS=PRON|PronType=Dem, Gender=Masc|POS=NOUN, Gender=Fem|POS=NOUN, Degree=Pos|Gender=Fem|POS=ADJ, Gender=Fem|POS=PROPN, Case=Nom|Clusivity=In|Number=Dual|POS=PRON|Person=1|PronType=Prs, Number=Plur|POS=DET|PronType=Ind, Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs, POS=PRON|PronType=Prs|Reflex=Yes, Gender=Masc|POS=DET|PronType=Emp, Case=Nom|POS=PRON|PronType=Int, Link=Yes|POS=NOUN, POS=PART|PartType=Int, POS=INTJ|Polarity=Pos, Link=Yes|POS=PART|PartType=Int, POS=VERB|Polarity=Neg, Degree=Pos|POS=ADJ|PronType=Int, Case=Gen|Number=Plur|POS=PRON|Person=3|PronType=Prs, Aspect=Perf|Mood=Ind|POS=VERB|PronType=Int|Voice=Act, Case=Nom|Number=Sing|POS=PRON|Person=2|PronType=Prs, Aspect=Perf|Mood=Ind|POS=VERB|PronType=Int|Voice=Pass, Aspect=Perf|Mood=Ind|POS=VERB|Voice=Ifoc, POS=ADV|PronType=Int, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Act, POS=PART|PartType=Nfh, Deixis=Remt|POS=ADV|PronType=Dem, Aspect=Imp|Mood=Pot|POS=VERB|Voice=Act, Link=Yes|POS=VERB|Polarity=Pos, Link=Yes|POS=VERB|Polarity=Neg, POS=PART|PartType=Des, Mood=Imp|POS=AUX|Polarity=Neg, Case=Nom|Link=Yes|Number=Plur|POS=PRON|Person=2|PronType=Prs, Case=Nom|Link=Yes|Number=Sing|POS=PRON|Person=3|PronType=Prs, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Pass, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Lfoc, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Bfoc, POS=DET|PronType=Tot, Case=Dat|Link=Yes|Number=Sing|POS=PRON|Person=3|PronType=Prs, Link=Yes|POS=PRON|PronType=Prs|Reflex=Yes, Mood=Imp|POS=VERB|Voice=Act, Case=Dat|Number=Sing|POS=PRON|Person=3|PronType=Prs, Mood=Imp|POS=VERB|Voice=Lfoc, Case=Gen|Number=Sing|POS=PRON|Person=2|PronType=Prs, Mood=Imp|POS=VERB|Voice=Pass, Case=Gen|Clusivity=In|Number=Plur|POS=PRON|Person=1|PronType=Prs, Aspect=Hab|POS=VERB|Voice=Pass, Gender=Masc|Link=Yes|POS=PROPN, Case=Gen|Link=Yes|Number=Sing|POS=PRON|Person=3|PronType=Prs, Case=Gen|Link=Yes|Number=Sing|POS=PRON|Person=1|PronType=Prs, POS=ADJ, POS=PART, POS=PRON, POS=VERB, POS=INTJ, POS=CCONJ, POS=NUM, POS=DET
parser ROOT, advmod, case, dep, nmod, nsubj, obj, obl, punct
ner LOC, ORG, PER
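Each morphologizer label packs Universal Dependencies features and a POS pseudo-feature into one |-separated string. Below is a minimal, standard-library-only sketch of unpacking such a label. The function names are hypothetical helpers, not calamanCy or spaCy APIs. As we understand spaCy's Morphologizer, the POS part of a predicted label is written to token.pos_ and the remaining features to token.morph. The split_pos helper mimics that split, but treat this as an assumption rather than a description of the pipeline's internals.

```python
from typing import Dict, Optional, Tuple


def parse_morph_label(label: str) -> Dict[str, str]:
    """Split a label such as 'Case=Gen|POS=ADP' into {'Case': 'Gen', 'POS': 'ADP'}."""
    features: Dict[str, str] = {}
    for pair in label.split("|"):
        name, sep, value = pair.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"malformed feature {pair!r} in label {label!r}")
        features[name] = value
    return features


def split_pos(label: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Separate the POS pseudo-feature from the morphological features."""
    features = parse_morph_label(label)
    pos = features.pop("POS", None)
    return pos, features


def to_feats_string(features: Dict[str, str]) -> str:
    """Rebuild a UD-style FEATS string, sorting feature names case-insensitively.

    Returns '' when there are no features (CoNLL-U files write '_' instead).
    """
    return "|".join(
        f"{name}={features[name]}" for name in sorted(features, key=str.lower)
    )


if __name__ == "__main__":
    pos, feats = split_pos("Aspect=Perf|Mood=Ind|POS=VERB|Voice=Act")
    print(pos)                     # VERB
    print(to_feats_string(feats))  # Aspect=Perf|Mood=Ind|Voice=Act
```

Labels with only a POS, such as POS=NOUN, come back with an empty feature dictionary. This matches the many bare POS=... entries in the morphologizer list.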

Citation

@inproceedings{miranda-2023-calamancy,
    title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
    author = "Miranda, Lester James",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.1",
    pages = "1--7",
}
