Ahmed Abdelali committed on
Commit
78f607a
1 Parent(s): 32bf566

Update config/readme

Files changed (2):
  1. README.md +22 -9
  2. config.json +3 -0
README.md CHANGED
@@ -1,15 +1,18 @@
 ---
 language: ar
 tags:
+- pytorch
 - tf
 - qarib
-
-license: apache-2.0
+- qarib60_1790k
 datasets:
-- Arabic GigaWord
-- Abulkhair Arabic Corpus
-- opus
-- Twitter data
+- arabic_billion_words
+- open_subtitles
+- twitter
+metrics:
+- f1
+widget:
+- text: " شو عندكم يا [MASK] ."
 ---
 
 # QARiB: QCRI Arabic and Dialectal BERT
@@ -27,11 +30,11 @@ For Tweets, the data was collected using twitter API and using language filter.
 ## Training QARiB
 The training of the model has been performed using Google’s original Tensorflow code on Google Cloud TPU v2.
 We used a Google Cloud Storage bucket, for persistent storage of training data and models.
-See more details in [Training QARiB](../Training_QARiB.md)
+See more details in [Training QARiB](https://github.com/qcri/QARIB/Training_QARiB.md)
 
 ## Using QARiB
 
-You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. For more details, see [Using QARiB](../Using_QARiB.md)
+You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. For more details, see [Using QARiB](https://github.com/qcri/QARIB/Using_QARiB.md)
 
 ### How to use
 You can use this model directly with a pipeline for masked language modeling:
@@ -88,10 +91,20 @@ The results obtained from QARiB models outperforms multilingual BERT/AraBERT/Ara
 
 
 ## Model Weights and Vocab Download
-TBD
+
+From Huggingface site: https://huggingface.co/qarib/qarib/bert-base-qarib60_1970k
 
 ## Contacts
 
 Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish and Younes Samih
 
+## Reference
+```
+@article{abdelali2020qarib,
+title={QARiB: QCRI Arabic and Dialectal BERT},
+author={Ahmed, Abdelali and Sabit, Hassan and Hamdy, Mubarak and Kareem, Darwish and Younes, Samih},
+link={https://github.com/qcri/QARIB},
+year={2020}
+}
+```
 
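The README hunk above cuts off right before the usage snippet it introduces, so here is a minimal sketch of the fill-mask pipeline call the new text describes. The model id `qarib/bert-base-qarib60_1790k` is an assumption inferred from the `qarib60_1790k` tag added in the front matter (note the download URL in the same diff spells it `qarib60_1970k`); the prompt reuses the widget text added above.

```python
# Minimal sketch of the masked-language-modeling usage the README describes.
# Assumption: the Hub model id is "qarib/bert-base-qarib60_1790k", inferred
# from the "qarib60_1790k" tag added in the YAML front matter.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="qarib/bert-base-qarib60_1790k")

# The prompt is the widget example added in the front matter.
for prediction in fill_mask("شو عندكم يا [MASK] ."):
    print(prediction["token_str"], round(prediction["score"], 4))
```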
config.json CHANGED
@@ -1,4 +1,7 @@
 {
+  "architectures": [
+    "BertForMaskedLM"
+  ],
   "model_type": "bert",
   "attention_probs_dropout_prob": 0.1,
   "directionality": "bidi",