moha committed on
Commit
3785882
1 Parent(s): f11694f

Create README.md

Files changed (1)
  1. README.md +18 -0
README.md ADDED
@@ -0,0 +1,18 @@
+ ---
+ language: ar
+ widget:
+ - text: "للوقايه من عدم انتشار [MASK]"
+ ---
+ # mbert_ar_c19: An mBERT model pretrained on 1.5 million COVID-19 multi-dialect Arabic tweets
+ **mBERT COVID-19** is a further pretrained (fine-tuned) version of the mBERT model (https://huggingface.co/bert-base-multilingual-cased). The additional pretraining used 1.5 million multi-dialect Arabic tweets about the COVID-19 pandemic, taken from the “Large Arabic Twitter Dataset on COVID-19” (https://arxiv.org/abs/2004.04315).
+ The model can achieve better results than the original mBERT model on tasks that involve multi-dialect Arabic tweets about the COVID-19 pandemic.
+ # Preprocessing
+ ```python
+ from arabert.preprocess import ArabertPreprocessor
+ model_name = "moha/mbert_ar_c19"
+ arabert_prep = ArabertPreprocessor(model_name=model_name)  # build the preprocessor for this model
+ text = "للوقايه من عدم انتشار كورونا عليك اولا غسل اليدين بالماء والصابون وتكون عملية الغسل دقيقه تشمل راحة اليد الأصابع التركيز على الإبهام"
+ print(arabert_prep.preprocess(text))  # show the preprocessed tweet text
+ ```
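The README does not show how to query the model itself. A minimal sketch follows, assuming the checkpoint is published on the Hugging Face Hub under the `moha/mbert_ar_c19` id used in the Preprocessing snippet, that it ships a masked-language-model head (as the `[MASK]` widget text suggests), and that the `transformers` library is installed. The helper names `summarize_predictions` and `predict_masked` are hypothetical and are not part of the commit.

```python
from typing import Dict, List, Tuple

# Model id taken from the Preprocessing snippet; assumed to be available on the Hub.
MODEL_NAME = "moha/mbert_ar_c19"


def summarize_predictions(predictions: List[Dict], digits: int = 3) -> List[Tuple[str, float]]:
    """Reduce fill-mask pipeline output to (token, score) pairs, best score first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(float(p["score"]), digits)) for p in ranked]


def predict_masked(text: str, model_name: str = MODEL_NAME, top_k: int = 5) -> List[Tuple[str, float]]:
    """Download the model (needs network access) and fill the single [MASK] token in `text`."""
    # Imported lazily so summarize_predictions works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_name)
    return summarize_predictions(fill_mask(text, top_k=top_k))
```

Calling `predict_masked("للوقايه من عدم انتشار [MASK]")` with the widget text should return up to five candidate tokens with their scores. The sketch passes the raw text to the model; whether the output of `ArabertPreprocessor` should be used instead is not stated in the README.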
+ # Contacts
+ **Hadj Ameur**: [GitHub](https://github.com/MohamedHadjAmeur) | <mohamedhadjameur@gmail.com> | <mhadjameur@cerist.dz>