Update README.md
README.md
CHANGED
@@ -1,3 +1,28 @@
+---
+language:
+- ar
+- en
+license: apache-2.0
+datasets:
+- 4Dialects
+- MADAR
+- CSCS
+thumbnail: https://www.informatik.hu-berlin.de/en/forschung-en/gebiete/ml-en/resolveuid/a6f82e0d7fa446a59c902cac4cafa9cb/@@images/image/preview
+tags:
+- flair
+- PoS-Tagging
+- sequence labeling
+- Token Classification
+- Dialectal Arabic
+- Code-Switching
+- Code-mixing
+metrics:
+- f1
+widget:
+- text: "لائحة «الوطنية للصحافة».. خطوة جديدة في طريق «الحصار»"
+---
+
+
 # Arabic Flair + fastText Part-of-Speech tagging Model (Egyptian and Levant)
 A pretrained Part-of-Speech tagging model built on a joint corpus of Egyptian and Levantine (Jordanian, Lebanese, Palestinian, Syrian) dialects, including code-switching between Egyptian Arabic and English. The model was trained using [Flair](https://aclanthology.org/C18-1139/) (forward+backward) and [fastText](https://fasttext.cc) embeddings.

@@ -9,7 +34,7 @@ This sequence labeling model was pretrained on three corpora jointly:
 A collection of Dialectal Arabic datasets covering four dialects of Arabic: Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR). Each dataset consists of 350 manually segmented and PoS-tagged tweets.
 2. [UD South Levantine Arabic MADAR](https://universaldependencies.org/treebanks/ajp_madar/index.html)
 A dataset of 100 manually annotated sentences taken from the [MADAR](https://camel.abudhabi.nyu.edu/madar/) (Multi-Arabic Dialect Applications and Resources) project by [Shorouq Zahra](mailto:shorouqjzahra@gmail.com).
-3. Parts of the corpus developed for ["Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus"](https://aclanthology.org/L18-1601.pdf) by Hamed et al.
+3. Parts of the Cairo Students Code-Switch (CSCS) corpus developed for ["Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus"](https://aclanthology.org/L18-1601.pdf) by Hamed et al.

 # Usage
