---
language: ar
tags:
- qarib
- tf
- pytorch
license: apache-2.0
datasets:
- Arabic GigaWord
- Abulkhair Arabic Corpus
- opus
- Twitter data
---

# QARiB: QCRI Arabic and Dialectal BERT

## About QARiB
The QCRI Arabic and Dialectal BERT (QARiB) model was trained on a collection of ~420 million tweets and ~180 million sentences of text.
For tweets, the data was collected using the Twitter API with the language filter `lang:ar`. For text data, it was a combination of [Arabic GigaWord](url), [Abulkhair Arabic Corpus]() and [OPUS](http://opus.nlpl.eu/).

### bert-base-qarib60_860k

## Training QARiB
The training of the model was performed using Google's original TensorFlow code on a Google Cloud TPU v2.
We used a Google Cloud Storage bucket for persistent storage of training data and models.
See more details in [Training QARiB](https://github.com/qcri/QARiB/blob/main/Training_QARiB.md).

## Using QARiB

You can use the raw model for either masked language modeling or next sentence prediction, but it is mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. For more details, see [Using QARiB](https://github.com/qcri/QARiB/blob/main/Using_QARiB.md).
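
As an illustration, here is a minimal fine-tuning sketch using the Hugging Face Transformers Auto classes; the `qarib/bert-base-qarib60_860k` hub ID, the binary-classification task, and the toy batch are assumptions for the example, not part of this card:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed hub ID based on the checkpoint name above; substitute the real one.
model_id = "qarib/bert-base-qarib60_860k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# One illustrative training step on a toy Arabic example ("a test sentence").
batch = tokenizer(["جملة تجريبية"], padding=True, return_tensors="pt")
loss = model(**batch, labels=torch.tensor([1])).loss
loss.backward()
```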

### How to use
You can use this model directly with a pipeline for masked language modeling:
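
A minimal sketch, assuming the checkpoint is published on the model hub as `qarib/bert-base-qarib60_860k` (the namespace is an assumption; substitute the actual repository ID), with an illustrative Arabic prompt:

```python
from transformers import pipeline

# Assumed hub ID; replace with the actual repository name.
fill_mask = pipeline("fill-mask", model="qarib/bert-base-qarib60_860k")

# Rank candidates for the masked token in an Arabic sentence
# ("what do you all have, oh [MASK]").
for prediction in fill_mask("شو عندكم يا [MASK]"):
    print(prediction["sequence"], prediction["score"])
```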
|