init Model card (README)
README.md
ADDED
@@ -0,0 +1,26 @@
# Arabic Flair + fastText Part-of-Speech Tagging Model (Egyptian and Levantine)

Pretrained part-of-speech tagging model built on a joint corpus written in Egyptian and Levantine (Jordanian, Lebanese, Palestinian, Syrian) dialects, including code-switching between Egyptian Arabic and English. The model is trained using [Flair](https://aclanthology.org/C18-1139/) (forward + backward) and [fastText](https://fasttext.cc) embeddings.
# Pretraining Corpora

This sequence labeling model was pretrained on three corpora jointly:

1. [4 Dialects](https://huggingface.co/datasets/viewer/?dataset=arabic_pos_dialect)
   A dialectal Arabic dataset covering four dialects of Arabic: Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR). Each dialect's dataset consists of 350 manually segmented and PoS-tagged tweets.
2. [UD South Levantine Arabic MADAR](https://universaldependencies.org/treebanks/ajp_madar/index.html)
   A dataset of 100 manually annotated sentences taken from the [MADAR](https://camel.abudhabi.nyu.edu/madar/) (Multi-Arabic Dialect Applications and Resources) project by [Shorouq Zahra](mailto:shorouqjzahra@gmail.com).
3. Parts of the corpus developed for ["Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus"](https://aclanthology.org/L18-1601.pdf) by Hamed et al.
# Usage
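
A minimal sketch of loading the tagger with Flair and tagging text, assuming the `flair` package is installed. `MODEL_ID` is a placeholder, not the model's real identifier (this card does not state one), and `tag_sentences` is a hypothetical helper name introduced only for illustration.

```python
# Placeholder (assumption): replace with this model's actual Hub ID or a local path.
MODEL_ID = "<model-id-or-local-path>"


def tag_sentences(texts, model_id=MODEL_ID):
    """Tag each input text and return Flair's tagged-string rendering per text."""
    if not texts:
        # Nothing to tag, so skip loading the model entirely.
        return []

    # Imported here so that defining the helper does not require Flair.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Load the pretrained sequence tagger (downloads it on first use if remote).
    tagger = SequenceTagger.load(model_id)

    # Flair tokenizes each text when the Sentence object is built.
    sentences = [Sentence(text) for text in texts]

    # predict() annotates the Sentence objects in place.
    tagger.predict(sentences)

    return [sentence.to_tagged_string() for sentence in sentences]
```

Calling `tag_sentences(["عامل ايه"])` should return one tagged string for the input sentence; the exact tags and output layout depend on the model and the installed Flair version.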
# Example
# Citation

*If you use this model in your work, please consider citing:*

```latex
@unpublished{MMHU21,
  author = "M. Megahed and A. Akbik",
  title = "Sequence Labeling Architectures in Diglossia",
  note = "In preparation",
}
```