qarib committed
Commit 37a9531 (parent: 53cbdef)

Update links for training and using the models.

Files changed (1):
  1. README.md +7 -3
README.md CHANGED
```diff
@@ -2,6 +2,9 @@
 language: ar
 tags:
 - qarib
+- tf
+- pytorch
+
 
 license: apache-2.0
 datasets:
@@ -9,13 +12,14 @@ datasets:
 - Abulkhair Arabic Corpus
 - opus
 - Twitter data
+
 ---
 
 # QARiB: QCRI Arabic and Dialectal BERT
 
 ## About QARiB
 QCRI Arabic and Dialectal BERT (QARiB) model, was trained on a collection of ~ 420 Million tweets and ~ 180 Million sentences of text.
-For Tweets, the data was collected using twitter API and using language filter. `lang:ar`. For Text data, it was a combination from
+For tweets, the data was collected using twitter API and using language filter. `lang:ar`. For text data, it was a combination from
 [Arabic GigaWord](url), [Abulkhair Arabic Corpus]() and [OPUS](http://opus.nlpl.eu/).
 
 ### bert-base-qarib60_860k
@@ -26,11 +30,11 @@ For Tweets, the data was collected using twitter API and using language filter.
 ## Training QARiB
 The training of the model has been performed using Google’s original Tensorflow code on Google Cloud TPU v2.
 We used a Google Cloud Storage bucket, for persistent storage of training data and models.
-See more details in [Training QARiB](../Training_QARiB.md)
+See more details in [Training QARiB](https://github.com/qcri/QARiB/blob/main/Training_QARiB.md)
 
 ## Using QARiB
 
-You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. For more details, see [Using QARiB](../Using_QARiB.md)
+You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. For more details, see [Using QARiB](https://github.com/qcri/QARiB/blob/main/Using_QARiB.md)
 
 ### How to use
 You can use this model directly with a pipeline for masked language modeling:
```
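The diff ends just before the README's usage snippet. For reference, a minimal sketch of the masked-language-modeling pipeline that last line describes; the hub id `qarib/bert-base-qarib60_860k` and the Arabic example sentence are assumptions, not taken from the commit:

```python
from transformers import pipeline

# Assumed hub id for the checkpoint the README describes (bert-base-qarib60_860k).
fill_mask = pipeline("fill-mask", model="qarib/bert-base-qarib60_860k")

# BERT-style fill-mask uses the [MASK] token; the sentence is illustrative.
for prediction in fill_mask("ذهبت إلى [MASK]"):
    print(prediction["token_str"], round(prediction["score"], 4))
```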
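Similarly, for the raw-model route mentioned under "Using QARiB" (the new `tf`/`pytorch` tags imply both backends are available), a hedged PyTorch sketch of loading the checkpoint directly for masked-LM scoring or further fine-tuning, under the same assumed hub id:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "qarib/bert-base-qarib60_860k"  # assumed hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Score a sentence; for a downstream task you would typically load
# AutoModelForSequenceClassification instead and fine-tune on labeled data.
inputs = tokenizer("مرحبا بالعالم", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (batch_size, sequence_length, vocab_size)
```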