metadata
language:
- ar
- en
license: apache-2.0
datasets:
- 4Dialects
- MADAR
- CSCS
thumbnail: >-
https://www.informatik.hu-berlin.de/en/forschung-en/gebiete/ml-en/resolveuid/a6f82e0d7fa446a59c902cac4cafa9cb/@@images/image/preview
tags:
- flair
- token-classification
- sequence-tagger-model
- Dialectal Arabic
- Code-Switching
- Code-Mixing
metrics:
- f1
widget:
- text: طلعوا جماعة الممانعة بالسياسة ما بيعرفوا ولا بالصحة بيعرفوا ولا حتى بالدين
- text: أعلم أن هذا يبدو غير عادل ، لكن لا يمكن أن يكون هناك ظلم
- text: أنا عارف أن الموضوع ده شكله مش عادل ، بس لا يمكن أن يكون فيه ظلم
Arabic Flair + fastText Part-of-Speech tagging Model (Egyptian and Levant)
Pretrained part-of-speech tagging model built on a joint corpus written in Egyptian and Levantine (Jordanian, Lebanese, Palestinian, Syrian) dialects, including code-switching between Egyptian Arabic and English. The model is trained using Flair (forward + backward) and fastText embeddings.
Pretraining Corpora:
This sequence labeling model was pretrained on three corpora jointly:
- 4 Dialects: a collection of Dialectal Arabic datasets covering four dialects of Arabic: Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR). Each dataset consists of a set of 350 manually segmented and PoS-tagged tweets.
- UD South Levantine Arabic MADAR: a dataset with 100 manually annotated sentences taken from the MADAR (Multi-Arabic Dialect Applications and Resources) project, by Shorouq Zahra.
- Parts of the Cairo Students Code-Switch (CSCS) corpus developed for "Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus" by Hamed et al.
Usage
from flair.data import Sentence
from flair.models import SequenceTagger
# Load the pretrained tagger by its model ID
tagger = SequenceTagger.load("megantosh/flair-arabic-dialects-codeswitch-egy-lev")
sentence = Sentence('عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية بالقاهرة .')
# Predict PoS tags in place, then print each tagged span
tagger.predict(sentence)
for entity in sentence.get_spans('pos'):
    print(entity)
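Because the tagset mixes plain tags with compound tags for agglutinated words (for example `PROG-PART+V+PRON`), downstream code may want each token's tag together with its components. The following is a minimal sketch and not part of the original model card: `tag_tokens` and `split_compound_tag` are hypothetical helper names, and the sketch assumes that the installed Flair version exposes token-level predictions through `token.get_labels("pos")`.

```python
from typing import List, Tuple


def split_compound_tag(tag: str) -> List[str]:
    """Split a compound tag on '+', keeping hyphenated parts such as 'PROG-PART' whole."""
    return tag.split("+")


def tag_tokens(text: str, tagger) -> List[Tuple[str, str]]:
    """Return (token, tag) pairs for one sentence (hypothetical helper).

    Assumes `tagger` is a loaded flair SequenceTagger and that predictions
    are readable via token.get_labels("pos") in the installed Flair version.
    """
    # Imported lazily so that split_compound_tag works without Flair installed
    from flair.data import Sentence

    sentence = Sentence(text)
    tagger.predict(sentence)
    pairs = []
    for token in sentence:
        labels = token.get_labels("pos")
        # Fall back to an empty tag if the token received no prediction
        pairs.append((token.text, labels[0].value if labels else ""))
    return pairs
```

With the `tagger` loaded as in the usage snippet, `tag_tokens(text, tagger)` would yield one pair per token, and each tag can then be passed to `split_compound_tag` to recover its parts.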
Scores & Tagset
- F-score (micro): 0.8974
- F-score (macro): 0.5188
- Accuracy (incl. no class): 0.901

Class scores for each tag are listed below. Note that tag compounds (a tag made up of multiple agglutinated parts of speech) are counted as separate tags.
tag | precision | recall | f1-score | support |
---|---|---|---|---|
INTJ | 0.8182 | 0.9000 | 0.8571 | 10 |
NOUN | 0.9009 | 0.9402 | 0.9201 | 435 |
NUM | 0.9524 | 0.8333 | 0.8889 | 24 |
ADJ | 0.8762 | 0.7603 | 0.8142 | 121 |
ADP | 0.9903 | 0.9623 | 0.9761 | 106 |
CCONJ | 0.9600 | 0.9730 | 0.9664 | 74 |
PROPN | 0.9333 | 0.9333 | 0.9333 | 15 |
ADV | 0.9135 | 0.8051 | 0.8559 | 118 |
VERB | 0.8852 | 0.9231 | 0.9038 | 117 |
PRON | 0.9620 | 0.9465 | 0.9542 | 187 |
SCONJ | 0.8571 | 0.9474 | 0.9000 | 19 |
PART | 0.9350 | 0.9791 | 0.9565 | 191 |
DET | 0.9348 | 0.9149 | 0.9247 | 47 |
PUNCT | 1.0000 | 1.0000 | 1.0000 | 35 |
AUX | 0.9286 | 0.9811 | 0.9541 | 53 |
MENTION | 0.9231 | 1.0000 | 0.9600 | 12 |
V | 0.8571 | 0.8780 | 0.8675 | 82 |
FUT-PART+V+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
PROG-PART+V+PRON+PREP+PRON | 0.0000 | 1.0000 | 0.0000 | 0 |
ADJ+NSUFF | 0.6111 | 0.8462 | 0.7097 | 26 |
NOUN+NSUFF | 0.8182 | 0.8438 | 0.8308 | 64 |
PREP+PRON | 0.9565 | 0.9565 | 0.9565 | 23 |
PUNC | 0.9941 | 1.0000 | 0.9971 | 169 |
EOS | 1.0000 | 1.0000 | 1.0000 | 70 |
NOUN+PRON | 0.6986 | 0.8500 | 0.7669 | 60 |
V+PRON | 0.7258 | 0.8036 | 0.7627 | 56 |
PART+PRON | 1.0000 | 0.9474 | 0.9730 | 19 |
PROG-PART+V | 0.8333 | 0.9302 | 0.8791 | 43 |
DET+NOUN | 0.9625 | 1.0000 | 0.9809 | 77 |
NOUN+NSUFF+PRON | 0.9091 | 0.7143 | 0.8000 | 14 |
PROG-PART+V+PRON | 0.7083 | 0.9444 | 0.8095 | 18 |
PREP+NOUN+NSUFF | 0.6667 | 0.4000 | 0.5000 | 5 |
NOUN+NSUFF+NSUFF | 1.0000 | 0.0000 | 0.0000 | 3 |
CONJ | 0.9722 | 1.0000 | 0.9859 | 35 |
V+PRON+PRON | 0.6364 | 0.5833 | 0.6087 | 12 |
FOREIGN | 0.6667 | 0.6667 | 0.6667 | 3 |
PREP+NOUN | 0.6316 | 0.7500 | 0.6857 | 16 |
DET+NOUN+NSUFF | 0.9000 | 0.9310 | 0.9153 | 29 |
DET+ADJ+NSUFF | 1.0000 | 0.5714 | 0.7273 | 7 |
CONJ+PRON | 1.0000 | 0.8750 | 0.9333 | 8 |
NOUN+CASE | 0.0000 | 0.0000 | 0.0000 | 2 |
DET+ADJ | 1.0000 | 0.6667 | 0.8000 | 6 |
PREP | 1.0000 | 0.9718 | 0.9857 | 71 |
CONJ+FUT-PART+V | 0.0000 | 0.0000 | 0.0000 | 1 |
CONJ+V | 0.6667 | 0.7500 | 0.7059 | 8 |
FUT-PART | 1.0000 | 1.0000 | 1.0000 | 2 |
ADJ+PRON | 1.0000 | 0.0000 | 0.0000 | 8 |
CONJ+PREP+NOUN+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+NOUN+PRON | 0.3750 | 1.0000 | 0.5455 | 3 |
PART+ADJ | 1.0000 | 0.0000 | 0.0000 | 1 |
PART+NOUN | 0.5000 | 1.0000 | 0.6667 | 1 |
CONJ+PREP+NOUN | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+NOUN | 0.7000 | 0.7778 | 0.7368 | 9 |
URL | 1.0000 | 1.0000 | 1.0000 | 3 |
CONJ+FUT-PART | 1.0000 | 0.0000 | 0.0000 | 1 |
FUT-PART+V | 0.8571 | 0.6000 | 0.7059 | 10 |
PREP+NOUN+NSUFF+NSUFF | 1.0000 | 0.0000 | 0.0000 | 1 |
HASH | 1.0000 | 0.9412 | 0.9697 | 17 |
ADJ+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 3 |
PREP+NOUN+PRON | 0.0000 | 0.0000 | 0.0000 | 1 |
EMOT | 1.0000 | 0.8889 | 0.9412 | 18 |
CONJ+PREP | 1.0000 | 0.7500 | 0.8571 | 4 |
PREP+DET+NOUN+NSUFF | 1.0000 | 0.7500 | 0.8571 | 4 |
PRON+DET+NOUN+NSUFF | 0.0000 | 1.0000 | 0.0000 | 0 |
V+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 5 |
V+PRON+PREP+PRON | 0.0000 | 1.0000 | 0.0000 | 0 |
CONJ+NOUN+NSUFF | 0.5000 | 0.5000 | 0.5000 | 2 |
V+NEG-PART | 1.0000 | 0.0000 | 0.0000 | 2 |
PREP+DET+NOUN | 0.9091 | 1.0000 | 0.9524 | 10 |
PREP+V | 1.0000 | 0.0000 | 0.0000 | 2 |
CONJ+PART | 1.0000 | 0.7778 | 0.8750 | 9 |
CONJ+V+PRON | 1.0000 | 1.0000 | 1.0000 | 5 |
PROG-PART+V+PREP+PRON | 1.0000 | 0.5000 | 0.6667 | 2 |
PREP+NOUN+NSUFF+PRON | 1.0000 | 1.0000 | 1.0000 | 1 |
ADJ+CASE | 1.0000 | 0.0000 | 0.0000 | 1 |
PART+NOUN+PRON | 1.0000 | 1.0000 | 1.0000 | 1 |
PART+V | 1.0000 | 0.0000 | 0.0000 | 3 |
PART+V+PRON | 0.0000 | 1.0000 | 0.0000 | 0 |
FUT-PART+V+PRON | 0.0000 | 1.0000 | 0.0000 | 0 |
FUT-PART+V+PRON+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+V+PRON+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+V+PREP+PRON | 0.0000 | 1.0000 | 0.0000 | 0 |
CONJ+DET+NOUN+NSUFF | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+DET+NOUN | 0.6667 | 1.0000 | 0.8000 | 2 |
CONJ+PREP+DET+NOUN | 1.0000 | 1.0000 | 1.0000 | 1 |
PREP+PART | 1.0000 | 0.0000 | 0.0000 | 2 |
PART+V+PRON+NEG-PART | 0.3333 | 0.3333 | 0.3333 | 3 |
PART+V+NEG-PART | 0.3333 | 0.5000 | 0.4000 | 2 |
PART+PREP+NEG-PART | 1.0000 | 1.0000 | 1.0000 | 3 |
PART+PROG-PART+V+NEG-PART | 1.0000 | 0.3333 | 0.5000 | 3 |
PREP+DET+NOUN+NSUFF+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
PREP+PRON+DET+NOUN | 0.0000 | 1.0000 | 0.0000 | 0 |
PART+NSUFF | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+PROG-PART+V+PRON | 1.0000 | 1.0000 | 1.0000 | 1 |
PART+PREP+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+PART+PREP | 1.0000 | 0.0000 | 0.0000 | 1 |
NUM+NSUFF | 0.6667 | 0.6667 | 0.6667 | 3 |
CONJ+PART+V+PRON+NEG-PART | 1.0000 | 1.0000 | 1.0000 | 1 |
PART+NOUN+NEG-PART | 1.0000 | 1.0000 | 1.0000 | 1 |
CONJ+ADJ+NSUFF | 1.0000 | 0.0000 | 0.0000 | 1 |
PREP+ADJ | 1.0000 | 0.0000 | 0.0000 | 1 |
ADJ+NSUFF+PRON | 1.0000 | 0.0000 | 0.0000 | 2 |
CONJ+PROG-PART+V | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+PART+PROG-PART+V+PREP+PRON+NEG-PART | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+PART+PREP+PRON+NEG-PART | 0.0000 | 1.0000 | 0.0000 | 0 |
PREP+PART+PRON | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+ADV+NSUFF | 1.0000 | 0.0000 | 0.0000 | 1 |
CONJ+ADV | 0.0000 | 1.0000 | 0.0000 | 0 |
PART+NOUN+PRON+NEG-PART | 0.0000 | 1.0000 | 0.0000 | 0 |
CONJ+ADJ | 1.0000 | 1.0000 | 1.0000 | 1 |
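The gap between the micro and macro F-scores follows from how each is averaged: the macro score is the unweighted mean of the per-tag F1 values, so a rare compound tag with an F1 of 0 weighs as much as a frequent tag. The sketch below recomputes F1 from precision and recall for a few rows copied from the table and contrasts an unweighted mean with a support-weighted mean. It is an illustration only: the function names are hypothetical, the rows are a small subset, the support-weighted mean is a stand-in for frequency-sensitive averaging rather than the micro F-score itself, and the resulting means are not the scores reported above.

```python
from typing import List, Tuple

# (tag, precision, recall, support) for a few rows copied from the table
ROWS: List[Tuple[str, float, float, int]] = [
    ("ADJ", 0.8762, 0.7603, 121),
    ("ADV", 0.9135, 0.8051, 118),
    ("V+PRON", 0.7258, 0.8036, 56),
    ("ADJ+PRON", 1.0000, 0.0000, 8),
    ("NOUN+CASE", 0.0000, 0.0000, 2),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(rows: List[Tuple[str, float, float, int]]) -> float:
    """Unweighted mean of per-tag F1: every tag counts equally."""
    return sum(f1_score(p, r) for _, p, r, _ in rows) / len(rows)


def weighted_f1(rows: List[Tuple[str, float, float, int]]) -> float:
    """Support-weighted mean of per-tag F1: frequent tags dominate."""
    total = sum(s for _, _, _, s in rows)
    return sum(f1_score(p, r) * s for _, p, r, s in rows) / total


print(f"macro over subset:    {macro_f1(ROWS):.4f}")
print(f"weighted over subset: {weighted_f1(ROWS):.4f}")
```

On this subset the unweighted mean comes out lower than the weighted one, because the two zero-F1 compound tags have little support. Recomputed F1 values can differ from the table in the last digit, since the table's precision and recall are already rounded.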
Citation
If you use this model in your work, please consider citing:
@unpublished{MMHU21,
author = "M. Megahed",
title = "Sequence Labeling Architectures in Diglossia",
note = "In review",
}