megantosh/flair-arabic-dialects-codeswitch-egy-lev

Token Classification

sequence-tagger-model

Dialectal Arabic


megantosh committed on Jul 7, 2021

Commit f80b98d • 1 Parent(s): 971113c

Update README.md

Files changed (1)

README.md +26 -1


@@ -1,3 +1,28 @@
 # Arabic Flair + fastText Part-of-Speech tagging Model (Egyptian and Levant)
 Pretrained Part-of-Speech tagging model built on a joint corpus written in Egyptian and Levantine (Jordanian, Lebanese, Palestinian, Syrian) dialects with code-switching of Egyptian Arabic and English. The model is trained using [Flair](https://aclanthology.org/C18-1139/) (forward+backward)and [fastText](https://fasttext.cc) embeddings.
@@ -9,7 +34,7 @@ This sequence labeling model was pretrained on three corpora jointly:
 A Dialectal Arabic Datasets containing four dialects of Arabic, Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR). Each dataset consists of a set of 350 manually segmented and PoS tagged tweets.
 2. [UD South Levantine Arabic MADAR](https://universaldependencies.org/treebanks/ajp_madar/index.html)
 A Dataset with 100 manually-annotated sentences taken from the [MADAR](https://camel.abudhabi.nyu.edu/madar/) (Multi-Arabic Dialect Applications and Resources) project by [Shorouq Zahra](mailto:shorouqjzahra@gmail.com).
-3. Parts of the corpus developed for ["Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus"](https://aclanthology.org/L18-1601.pdf) by Hamed et al.
 # Usage

+---
+language:
+- ar
+- en
+license: apache-2.0
+datasets:
+- 4Dialects
+- MADAR
+- CSCS
+thumbnail: https://www.informatik.hu-berlin.de/en/forschung-en/gebiete/ml-en/resolveuid/a6f82e0d7fa446a59c902cac4cafa9cb/@@images/image/preview
+tags:
+- flair
+- PoS-Tagging
+- sequence labeling
+- Token Classification
+- Dialectal Arabic
+- Code-Switching
+- Code-mixing
+metrics:
+- f1
+widget:
+- text: "لائحة «الوطنية للصحافة».. خطوة جديدة في طريق «الحصار»"
+---
 # Arabic Flair + fastText Part-of-Speech tagging Model (Egyptian and Levant)
 Pretrained Part-of-Speech tagging model built on a joint corpus written in Egyptian and Levantine (Jordanian, Lebanese, Palestinian, Syrian) dialects with code-switching of Egyptian Arabic and English. The model is trained using [Flair](https://aclanthology.org/C18-1139/) (forward+backward)and [fastText](https://fasttext.cc) embeddings.
 A Dialectal Arabic Datasets containing four dialects of Arabic, Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR). Each dataset consists of a set of 350 manually segmented and PoS tagged tweets.
 2. [UD South Levantine Arabic MADAR](https://universaldependencies.org/treebanks/ajp_madar/index.html)
 A Dataset with 100 manually-annotated sentences taken from the [MADAR](https://camel.abudhabi.nyu.edu/madar/) (Multi-Arabic Dialect Applications and Resources) project by [Shorouq Zahra](mailto:shorouqjzahra@gmail.com).
+3. Parts of the Cairo Students Code-Switch (CSCS) corpus developed for ["Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus"](https://aclanthology.org/L18-1601.pdf) by Hamed et al.
 # Usage