Arabic Flair + fastText Part-of-Speech tagging Model (Egyptian and Levant)
Pretrained Part-of-Speech tagging model built on a joint corpus written in Egyptian and Levantine (Jordanian, Lebanese, Palestinian, Syrian) dialects, including code-switching between Egyptian Arabic and English. The model is trained using Flair (forward + backward) and fastText embeddings.
Pretraining Corpora:
This sequence labeling model was pretrained on three corpora jointly:
- 4 Dialects: a collection of dialectal Arabic datasets covering four dialects of Arabic: Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR). Each dataset consists of a set of 350 manually segmented and PoS-tagged tweets.
- UD South Levantine Arabic MADAR: a dataset of 100 manually annotated sentences taken from the MADAR (Multi-Arabic Dialect Applications and Resources) project, by Shorouq Zahra.
- Parts of the Cairo Students Code-Switch (CSCS) corpus developed for "Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus" by Hamed et al.
Usage
from flair.data import Sentence
from flair.models import SequenceTagger
tagger = SequenceTagger.load("megantosh/flair-arabic-dialects-codeswitch-egy-lev")
sentence = Sentence('عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية بالقاهرة .')
# Predict PoS tags in place, then print each tagged span
tagger.predict(sentence)
for entity in sentence.get_spans('pos'):
    print(entity)
Because right-to-left Arabic text is embedded in a left-to-right context, some formatting errors might occur, and your code might appear like this (link accessed on 2020-10-27).
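One way to reduce such display problems when printing Arabic text next to Latin-script labels is to wrap the Arabic string in Unicode directional isolate characters. The following is a minimal sketch, not part of the original model card: the helper name `isolate_rtl` is hypothetical, and whether the isolates have a visible effect depends on the terminal or editor rendering the output.

```python
# Unicode bidirectional control characters (defined by the Unicode standard)
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_rtl(text: str) -> str:
    """Wrap text so a bidi-aware renderer treats it as an isolated RTL run.

    Hypothetical helper for display only; do not feed the wrapped string
    to the tagger, since the control characters are extra input.
    """
    return f"{RLI}{text}{PDI}"


wrapped = isolate_rtl("الجامعة")
print("token:", wrapped, "tag: NOUN")
```

The wrapped string is two code points longer than the original, so it is best applied only at print time.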
Scores & Tagset
tag | precision | recall | f1-score | support |
---|---|---|---|---|
INTJ | 0.8182 | 0.9000 | 0.8571 | 10 |
NOUN | 0.9009 | 0.9402 | 0.9201 | 435 |
NUM | 0.9524 | 0.8333 | 0.8889 | 24 |
ADJ | 0.8762 | 0.7603 | 0.8142 | 121 |
ADP | 0.9903 | 0.9623 | 0.9761 | 106 |
CCONJ | 0.9600 | 0.9730 | 0.9664 | 74 |
PROPN | 0.9333 | 0.9333 | 0.9333 | 15 |
ADV | 0.9135 | 0.8051 | 0.8559 | 118 |
VERB | 0.8852 | 0.9231 | 0.9038 | 117 |
PRON | 0.9620 | 0.9465 | 0.9542 | 187 |
SCONJ | 0.8571 | 0.9474 | 0.9000 | 19 |
PART | 0.9350 | 0.9791 | 0.9565 | 191 |
DET | 0.9348 | 0.9149 | 0.9247 | 47 |
PUNCT | 1.0000 | 1.0000 | 1.0000 | 35 |
AUX | 0.9286 | 0.9811 | 0.9541 | 53 |
MENTION | 0.9231 | 1.0000 | 0.9600 | 12 |
V | 0.8571 | 0.8780 | 0.8675 | 82 |
FUT-PART+V+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
PROG-PART+V+PRON+PREP+PRON | 0.0000 | 1.0000 | 0.0000 | 0 |
ADJ+NSUFF | 0.6111 | 0.8462 | 0.7097 | 26 |
NOUN+NSUFF | 0.8182 | 0.8438 | 0.8308 | 64 |
PREP+PRON | 0.9565 | 0.9565 | 0.9565 | 23 |
PUNC | 0.9941 | 1.0000 | 0.9971 | 169 |
EOS | 1.0000 | 1.0000 | 1.0000 | 70 |
NOUN+PRON | 0.6986 | 0.8500 | 0.7669 | 60 |
V+PRON | 0.7258 | 0.8036 | 0.7627 | 56 |
PART+PRON | 1.0000 | 0.9474 | 0.9730 | 19 |
PROG-PART+V | 0.8333 | 0.9302 | 0.8791 | 43 |
DET+NOUN | 0.9625 | 1.0000 | 0.9809 | 77 |
NOUN+NSUFF+PRON | 0.9091 | 0.7143 | 0.8000 | 14 |
PROG-PART+V+PRON | 0.7083 | 0.9444 | 0.8095 | 18 |
PREP+NOUN+NSUFF | 0.6667 | 0.4000 | 0.5000 | 5 |
NOUN+NSUFF+NSUFF | 1.0000 | 0.0000 | 0.0000 | 3 |
CONJ | 0.9722 | 1.0000 | 0.9859 | 35 |
V+PRON+PRON | 0.6364 | 0.5833 | 0.6087 | 12 |
FOREIGN | 0.6667 | 0.6667 | 0.6667 | 3 |
PREP+NOUN | 0.6316 | 0.7500 | 0.6857 | 16 |
DET+NOUN+NSUFF | 0.9000 | 0.9310 | 0.9153 | 29 |
DET+ADJ+NSUFF | 1.0000 | 0.5714 | 0.7273 | 7 |
CONJ+PRON | 1.0000 | 0.8750 | 0.9333 | 8 |
NOUN+CASE | 0.0000 | 0.0000 | 0.0000 | 2 |
DET+ADJ | 1.0000 | 0.6667 | 0.8000 | 6 |
PREP | 1.0000 | 0.9718 | 0.9857 | 71 |
CONJ+FUT-PART+V | 0.0000 | 0.0000 | 0.0000 | 1 |
CONJ+V | 0.6667 | 0.7500 | 0.7059 | 8 |
FUT-PART | 1.0000 | 1.0000 | 1.0000 | 2 |
ADJ+PRON | 1.0000 | 0.0000 | 0.0000 | 8 |
CONJ+PREP+NOUN+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+NOUN+PRON | 0.3750 | 1.0000 | 0.5455 | 3 |
PART+ADJ | 1.0000 | 0.0000 | 0.0000 | 1 |
PART+NOUN | 0.5000 | 1.0000 | 0.6667 | 1 |
CONJ+PREP+NOUN | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+NOUN | 0.7000 | 0.7778 | 0.7368 | 9 |
URL | 1.0000 | 1.0000 | 1.0000 | 3 |
CONJ+FUT-PART | 1.0000 | 0.0000 | 0.0000 | 1 |
FUT-PART+V | 0.8571 | 0.6000 | 0.7059 | 10 |
PREP+NOUN+NSUFF+NSUFF | 1.0000 | 0.0000 | 0.0000 | 1 |
HASH | 1.0000 | 0.9412 | 0.9697 | 17 |
ADJ+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 3 |
PREP+NOUN+PRON | 0.0000 | 0.0000 | 0.0000 | 1 |
EMOT | 1.0000 | 0.8889 | 0.9412 | 18 |
CONJ+PREP | 1.0000 | 0.7500 | 0.8571 | 4 |
PREP+DET+NOUN+NSUFF | 1.0000 | 0.7500 | 0.8571 | 4 |
PRON+DET+NOUN+NSUFF | 0.0000 | 1.0000 | 0.0000 | 0 |
V+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 5 |
V+PRON+PREP+PRON | 0.0000 | 1.0000 | 0.0000 | 0 |
CONJ+NOUN+NSUFF | 0.5000 | 0.5000 | 0.5000 | 2 |
V+NEG-PART | 1.0000 | 0.0000 | 0.0000 | 2 |
PREP+DET+NOUN | 0.9091 | 1.0000 | 0.9524 | 10 |
PREP+V | 1.0000 | 0.0000 | 0.0000 | 2 |
CONJ+PART | 1.0000 | 0.7778 | 0.8750 | 9 |
CONJ+V+PRON | 1.0000 | 1.0000 | 1.0000 | 5 |
PROG-PART+V+PREP+PRON | 1.0000 | 0.5000 | 0.6667 | 2 |
PREP+NOUN+NSUFF+PRON | 1.0000 | 1.0000 | 1.0000 | 1 |
ADJ+CASE | 1.0000 | 0.0000 | 0.0000 | 1 |
PART+NOUN+PRON | 1.0000 | 1.0000 | 1.0000 | 1 |
PART+V | 1.0000 | 0.0000 | 0.0000 | 3 |
PART+V+PRON | 0.0000 | 1.0000 | 0.0000 | 0 |
FUT-PART+V+PRON | 0.0000 | 1.0000 | 0.0000 | 0 |
FUT-PART+V+PRON+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+V+PRON+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+V+PREP+PRON | 0.0000 | 1.0000 | 0.0000 | 0 |
CONJ+DET+NOUN+NSUFF | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+DET+NOUN | 0.6667 | 1.0000 | 0.8000 | 2 |
CONJ+PREP+DET+NOUN | 1.0000 | 1.0000 | 1.0000 | 1 |
PREP+PART | 1.0000 | 0.0000 | 0.0000 | 2 |
PART+V+PRON+NEG-PART | 0.3333 | 0.3333 | 0.3333 | 3 |
PART+V+NEG-PART | 0.3333 | 0.5000 | 0.4000 | 2 |
PART+PREP+NEG-PART | 1.0000 | 1.0000 | 1.0000 | 3 |
PART+PROG-PART+V+NEG-PART | 1.0000 | 0.3333 | 0.5000 | 3 |
PREP+DET+NOUN+NSUFF+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
PREP+PRON+DET+NOUN | 0.0000 | 1.0000 | 0.0000 | 0 |
PART+NSUFF | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+PROG-PART+V+PRON | 1.0000 | 1.0000 | 1.0000 | 1 |
PART+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+PART+PREP | 1.0000 | 0.0000 | 0.0000 | 1 |
NUM+NSUFF | 0.6667 | 0.6667 | 0.6667 | 3 |
CONJ+PART+V+PRON+NEG-PART | 1.0000 | 1.0000 | 1.0000 | 1 |
PART+NOUN+NEG-PART | 1.0000 | 1.0000 | 1.0000 | 1 |
CONJ+ADJ+NSUFF | 1.0000 | 0.0000 | 0.0000 | 1 |
PREP+ADJ | 1.0000 | 0.0000 | 0.0000 | 1 |
ADJ+NSUFF+PRON | 1.0000 | 0.0000 | 0.0000 | 2 |
CONJ+PROG-PART+V | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+PART+PROG-PART+V+PREP+PRON+NEG-PART | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+PART+PREP+PRON+NEG-PART | 0.0000 | 1.0000 | 0.0000 | 0 |
PREP+PART+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+ADV+NSUFF | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+ADV | 0.0000 | 1.0000 | 0.0000 | 0 |
PART+NOUN+PRON+NEG-PART | 0.0000 | 1.0000 | 0.0000 | 0 |
CONJ+ADJ | 1.0000 | 1.0000 | 1.0000 | 1 |
- F-score (micro): 0.8974
- F-score (macro): 0.5188
- Accuracy (incl. no class): 0.901
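The gap between the micro and macro F-scores is typical of a tagset with many rare classes: micro-averaging pools all tokens, while macro-averaging gives every tag equal weight, so rare tags with an F1 of zero pull the macro score down. The sketch below illustrates this with the standard definitions on invented toy data. It is not Flair's evaluation code, and the function name `f1_scores` is hypothetical.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) for two equal-length tag sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag was wrong
            fn[g] += 1  # gold tag was missed
    labels = set(gold) | set(pred)

    def f1(t, p_, n):
        denom = 2 * t + p_ + n
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all tags, then compute one F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per tag, then take the unweighted mean
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Toy data, invented for illustration (not from the model's test set):
# one frequent tag predicted well, two rare compound tags always missed.
gold = ["NOUN"] * 8 + ["PREP+NOUN", "ADJ+PRON"]
pred = ["NOUN"] * 8 + ["NOUN", "NOUN"]
micro, macro = f1_scores(gold, pred)
print(f"micro={micro:.4f} macro={macro:.4f}")  # micro=0.8000 macro=0.2963
```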
The table above lists the scores for each tag. Note that compound tags (a single tag covering several agglutinated parts of speech) are each counted as a separate class.
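If the component parts of speech are needed downstream, a compound tag can be broken up on the `+` separator used in the tagset. This is a minimal sketch; the helper name `split_compound_tag` is hypothetical and not part of Flair or of this model.

```python
def split_compound_tag(tag: str) -> list[str]:
    """Split a compound tag such as 'CONJ+PREP+NOUN' into its components.

    Hyphens are left untouched, so 'PROG-PART' stays a single component.
    """
    return tag.split("+")


print(split_compound_tag("CONJ+PREP+NOUN"))  # ['CONJ', 'PREP', 'NOUN']
print(split_compound_tag("PROG-PART+V"))     # ['PROG-PART', 'V']
print(split_compound_tag("NOUN"))            # ['NOUN']
```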
Citation
If you use this model, please consider citing this work:
@unpublished{MMHU21,
author = "M. Megahed",
title = "Sequence Labeling Architectures in Diglossia",
year = {2021},
doi = "10.13140/RG.2.2.34961.10084",
url = {https://www.researchgate.net/publication/358956953_Sequence_Labeling_Architectures_in_Diglossia_-_a_case_study_of_Arabic_and_its_dialects}
}