megantosh commited on
Commit
f80b98d
1 Parent(s): 971113c

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +26 -1
README.md CHANGED
@@ -1,3 +1,28 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  # Arabic Flair + fastText Part-of-Speech tagging Model (Egyptian and Levant)
2
  Pretrained Part-of-Speech tagging model built on a joint corpus written in Egyptian and Levantine (Jordanian, Lebanese, Palestinian, Syrian) dialects with code-switching of Egyptian Arabic and English. The model is trained using [Flair](https://aclanthology.org/C18-1139/) (forward+backward)and [fastText](https://fasttext.cc) embeddings.
3
 
@@ -9,7 +34,7 @@ This sequence labeling model was pretrained on three corpora jointly:
9
  A Dialectal Arabic Datasets containing four dialects of Arabic, Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR). Each dataset consists of a set of 350 manually segmented and PoS tagged tweets.
10
  2. [UD South Levantine Arabic MADAR](https://universaldependencies.org/treebanks/ajp_madar/index.html)
11
  A Dataset with 100 manually-annotated sentences taken from the [MADAR](https://camel.abudhabi.nyu.edu/madar/) (Multi-Arabic Dialect Applications and Resources) project by [Shorouq Zahra](mailto:shorouqjzahra@gmail.com).
12
- 3. Parts of the corpus developed for ["Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus"](https://aclanthology.org/L18-1601.pdf) by Hamed et al.
13
 
14
  # Usage
15
 
1
+ ---
2
+ language:
3
+ - ar
4
+ - en
5
+ license: apache-2.0
6
+ datasets:
7
+ - 4Dialects
8
+ - MADAR
9
+ - CSCS
10
+ thumbnail: https://www.informatik.hu-berlin.de/en/forschung-en/gebiete/ml-en/resolveuid/a6f82e0d7fa446a59c902cac4cafa9cb/@@images/image/preview
11
+ tags:
12
+ - flair
13
+ - PoS-Tagging
14
+ - sequence labeling
15
+ - Token Classification
16
+ - Dialectal Arabic
17
+ - Code-Switching
18
+ - Code-mixing
19
+ metrics:
20
+ - f1
21
+ widget:
22
+ - text: "لائحة «الوطنية للصحافة».. خطوة جديدة في طريق «الحصار»"
23
+ ---
24
+
25
+
26
  # Arabic Flair + fastText Part-of-Speech tagging Model (Egyptian and Levant)
27
  Pretrained Part-of-Speech tagging model built on a joint corpus written in Egyptian and Levantine (Jordanian, Lebanese, Palestinian, Syrian) dialects with code-switching of Egyptian Arabic and English. The model is trained using [Flair](https://aclanthology.org/C18-1139/) (forward+backward)and [fastText](https://fasttext.cc) embeddings.
28
 
34
  A Dialectal Arabic Datasets containing four dialects of Arabic, Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR). Each dataset consists of a set of 350 manually segmented and PoS tagged tweets.
35
  2. [UD South Levantine Arabic MADAR](https://universaldependencies.org/treebanks/ajp_madar/index.html)
36
  A Dataset with 100 manually-annotated sentences taken from the [MADAR](https://camel.abudhabi.nyu.edu/madar/) (Multi-Arabic Dialect Applications and Resources) project by [Shorouq Zahra](mailto:shorouqjzahra@gmail.com).
37
+ 3. Parts of the Cairo Students Code-Switch (CSCS) corpus developed for ["Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus"](https://aclanthology.org/L18-1601.pdf) by Hamed et al.
38
 
39
  # Usage
40