megantosh committed on
Commit
971113c
1 Parent(s): 439b339

init Model card (README)

  1. README.md +26 -0
README.md ADDED
@@ -0,0 +1,26 @@
+ # Arabic Flair + fastText Part-of-Speech Tagging Model (Egyptian and Levantine)
+ A pretrained part-of-speech tagging model built on a joint corpus of Egyptian and Levantine (Jordanian, Lebanese, Palestinian, Syrian) Arabic dialects, including code-switching between Egyptian Arabic and English. The model was trained using [Flair](https://aclanthology.org/C18-1139/) embeddings (forward and backward) together with [fastText](https://fasttext.cc) embeddings.
+
+
+
+ # Pretraining Corpora
+ This sequence labeling model was trained jointly on three corpora:
+ 1. [4 Dialects](https://huggingface.co/datasets/viewer/?dataset=arabic_pos_dialect)
+ A set of dialectal Arabic datasets covering four dialects: Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR). Each dataset consists of 350 manually segmented and PoS-tagged tweets.
+ 2. [UD South Levantine Arabic MADAR](https://universaldependencies.org/treebanks/ajp_madar/index.html)
+ A dataset by [Shorouq Zahra](mailto:shorouqjzahra@gmail.com) with 100 manually annotated sentences taken from the [MADAR](https://camel.abudhabi.nyu.edu/madar/) (Multi-Arabic Dialect Applications and Resources) project.
+ 3. Parts of the corpus developed for ["Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus"](https://aclanthology.org/L18-1601.pdf) by Hamed et al.
+
+ # Usage
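
A minimal sketch of how a Flair sequence tagger such as this one could be loaded and applied, assuming Flair is installed (`pip install flair`). `MODEL_ID` is a hypothetical placeholder, since this card does not state the model's Hub ID or file path, and `tag_text` is an illustrative helper name rather than part of Flair.

```python
# Hypothetical placeholder: replace with the real Hugging Face model ID or a
# local path to the trained model file. This card does not state either.
MODEL_ID = "<user>/<model-name>"


def tag_text(text: str, model_id: str = MODEL_ID) -> str:
    """Tag a single sentence and return Flair's tagged-string rendering."""
    if not text.strip():
        raise ValueError("text must be a non-empty string")

    # Imported lazily so this module can be loaded without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Loads the tagger from the Hub ID or local path (cached after first use).
    tagger = SequenceTagger.load(model_id)

    # Flair tokenizes the raw text and attaches predicted tags in place.
    sentence = Sentence(text)
    tagger.predict(sentence)
    return sentence.to_tagged_string()
```

Once `MODEL_ID` points at the actual model, calling `print(tag_text("أنا عايز أروح ال meeting بكرة"))` would print the sentence with its predicted tags. The example sentence is only illustrative, and the exact output format depends on the installed Flair version.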
+
+ # Example
+
+ # Citation
+ *If you use this model in your work, please consider citing:*
+ ```latex
+ @unpublished{MMHU21,
+ author = "M. Megahed and A. Akbik",
+ title = "Sequence Labeling Architectures in Diglossia",
+ note = "In preparation",
+ }
+ ```