Commit e71dd13 by Kamel
Parent: 331e57f

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -1,3 +1,18 @@
+ ---
+ language: ar
+ datasets:
+ - wikipedia
+ - OSIAN
+ - 1.5B Arabic Corpus
+ - OSCAR Arabic Unshuffled
+ widget:
+ - text: " جاب ليا [MASK] ."
+ ---
+
+
+
+
+
  **DarijaBERT** is the first BERT model for the Moroccan Arabic dialect called “Darija”. It is based on the same architecture as BERT-base, but without the Next Sentence Prediction (NSP) objective. This model was trained on a total of ~3 Million sequences of Darija dialect representing 691MB of text or a total of ~100M tokens.
 
  The model was trained on a dataset issued from three different sources:
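The README says DarijaBERT keeps the BERT-base architecture but drops the Next Sentence Prediction objective. That leaves masked language modeling (MLM) as the pre-training task, and the `[MASK]` token in the widget prompt marks the slot such a model is asked to fill. The sketch below shows BERT-style MLM input corruption. It assumes the recipe from the original BERT paper: select about 15% of positions, then replace 80% of those with `[MASK]`, 10% with a random vocabulary token, and leave 10% unchanged. The README does not state DarijaBERT's exact masking settings. The function `mlm_mask`, its parameters, and the toy vocabulary are hypothetical.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"


def mlm_mask(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Corrupt a token sequence for masked language modeling (hypothetical helper).

    Returns (corrupted, labels): labels[i] is the original token at positions
    selected for prediction and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: input kept as-is, no prediction target
        labels[i] = tok  # the model is trained to recover this token
        # Assumed split from the original BERT recipe (not confirmed for DarijaBERT):
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN  # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place
    return corrupted, labels


if __name__ == "__main__":
    # Toy example built from the words of the widget prompt; the vocab is illustrative only.
    words = ["جاب", "ليا", "."]
    corrupted, labels = mlm_mask(words, vocab=words, mask_prob=0.5, rng=random.Random(0))
    print(corrupted, labels)
```

A real training pipeline would typically operate on the token IDs produced by the model's own tokenizer rather than on whitespace-split words. The loss would then be computed only at positions whose label is not `None`.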