---
language: ar
tags:
- qarib
- tf
- pytorch
license: apache-2.0
datasets:
- Arabic GigaWord
- Abulkhair Arabic Corpus
- opus
- Twitter data
---

# QARiB: QCRI Arabic and Dialectal BERT

## About QARiB
The QCRI Arabic and Dialectal BERT (QARiB) model was trained on a collection of ~420 million tweets and ~180 million sentences of text.
For tweets, the data was collected using the Twitter API with the language filter `lang:ar`. For text data, it was a combination of [Arabic GigaWord](url), [Abulkhair Arabic Corpus]() and [OPUS](http://opus.nlpl.eu/).

### bert-base-qarib60_860k

## Training QARiB
The training of the model was performed using Google's original TensorFlow code on a Google Cloud TPU v2.
We used a Google Cloud Storage bucket for persistent storage of training data and models.
See more details in [Training QARiB](https://github.com/qcri/QARiB/blob/main/Training_QARiB.md).

## Using QARiB

You can use the raw model for either masked language modeling or next sentence prediction, but it is mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. For more details, see [Using QARiB](https://github.com/qcri/QARiB/blob/main/Using_QARiB.md).
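
As an illustration, here is a minimal fine-tuning sketch using the Hugging Face Transformers Auto classes; the `qarib/bert-base-qarib60_860k` hub ID, the binary-classification task, and the toy batch are assumptions for the example, not part of this card:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed hub ID based on the checkpoint name above; substitute the real one.
model_id = "qarib/bert-base-qarib60_860k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# One illustrative training step on a toy Arabic example ("a test sentence").
batch = tokenizer(["جملة تجريبية"], padding=True, return_tensors="pt")
loss = model(**batch, labels=torch.tensor([1])).loss
loss.backward()
```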

### How to use
You can use this model directly with a pipeline for masked language modeling:
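
A minimal sketch, assuming the checkpoint is published on the model hub as `qarib/bert-base-qarib60_860k` (the namespace is an assumption; substitute the actual repository ID), with an illustrative Arabic prompt:

```python
from transformers import pipeline

# Assumed hub ID; replace with the actual repository name.
fill_mask = pipeline("fill-mask", model="qarib/bert-base-qarib60_860k")

# Rank candidates for the masked token in an Arabic sentence
# ("what do you all have, oh [MASK]").
for prediction in fill_mask("شو عندكم يا [MASK]"):
    print(prediction["sequence"], prediction["score"])
```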
|