megantosh/flair-arabic-dialects-codeswitch-egy-lev

Token Classification

sequence-tagger-model

Dialectal Arabic


megantosh committed on Jul 7, 2021

Commit 971113c • 1 Parent(s): 439b339

init Model card (README)

Files changed (1)

README.md +26 -0

README.md ADDED

	@@ -0,0 +1,26 @@

+# Arabic Flair + fastText Part-of-Speech Tagging Model (Egyptian and Levantine)
+Pretrained part-of-speech tagging model built on a joint corpus of Egyptian and Levantine (Jordanian, Lebanese, Palestinian, Syrian) dialectal Arabic, including code-switching between Egyptian Arabic and English. The model is trained using [Flair](https://aclanthology.org/C18-1139/) (forward and backward) and [fastText](https://fasttext.cc) embeddings.
+# Pretraining Corpora:
+This sequence labeling model was pretrained on three corpora jointly:
+1. [4 Dialects](https://huggingface.co/datasets/viewer/?dataset=arabic_pos_dialect)
+A dialectal Arabic dataset covering four dialects: Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR). Each dialect's dataset consists of 350 manually segmented and PoS-tagged tweets.
+2. [UD South Levantine Arabic MADAR](https://universaldependencies.org/treebanks/ajp_madar/index.html)
+A dataset of 100 manually annotated sentences taken from the [MADAR](https://camel.abudhabi.nyu.edu/madar/) (Multi-Arabic Dialect Applications and Resources) project by [Shorouq Zahra](mailto:shorouqjzahra@gmail.com).
+3. Parts of the corpus developed for ["Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus"](https://aclanthology.org/L18-1601.pdf) by Hamed et al.
+# Usage
+# Example
+# Citation
+*If you use this model in your work, please consider citing this work:*
+```latex
+@unpublished{MMHU21,
+author = "M. Megahed and A. Akbik",
+title = "Sequence Labeling Architectures in Diglossia",
+note = "In preparation",
+}
+```
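
The Usage and Example headings in the README are empty in this commit. A minimal sketch of how a Flair tagger hosted on the Hub is typically loaded, assuming the `flair` package is installed, that the repo id matches the page title, and that this checkpoint is compatible with the installed Flair version. The helper name `tag_text` is hypothetical and not part of the README:

```python
MODEL_ID = "megantosh/flair-arabic-dialects-codeswitch-egy-lev"  # repo id as shown on this page


def tag_text(text: str, model_id: str = MODEL_ID) -> str:
    """Tag one sentence and return Flair's tagged-string rendering of it."""
    # Imported lazily: Flair is a heavy optional dependency (pip install flair).
    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Assumption: the checkpoint is compatible with the installed Flair version.
    tagger = SequenceTagger.load(model_id)  # fetched from the Hub on first use
    sentence = Sentence(text)               # Flair's default tokenizer splits the text
    tagger.predict(sentence)                # adds the predicted tags to the sentence in place
    return sentence.to_tagged_string()
```

For instance, `print(tag_text("انا عايز presentation بكرة"))` would print the sentence together with its predicted tags. The sample sentence is made up for illustration, and the tag inventory depends on the training corpora listed above.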