go-inoue committed
Commit 109e4b7
1 Parent(s): 86a3802

Update model name

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -25,18 +25,18 @@ We release pre-trained language models for Modern Standard Arabic (MSA), dialect
 We also provide additional models that are pre-trained on a scaled-down set of the MSA variant (half, quarter, eighth, and sixteenth).
 The details are described in the paper *"[The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models](https://arxiv.org/abs/2103.06678)."*
 
-This model card describes **CAMeLBERT-Mix** (`bert-base-camelbert-mix`), a model pre-trained on a mixture of these variants: MSA, DA, and CA.
+This model card describes **CAMeLBERT-Mix** (`bert-base-arabic-camelbert-mix`), a model pre-trained on a mixture of these variants: MSA, DA, and CA.
 
 ||Model|Variant|Size|#Word|
 |-|-|:-:|-:|-:|
-|✔|`bert-base-camelbert-mix`|CA,DA,MSA|167GB|17.3B|
-||`bert-base-camelbert-ca`|CA|6GB|847M|
-||`bert-base-camelbert-da`|DA|54GB|5.8B|
-||`bert-base-camelbert-msa`|MSA|107GB|12.6B|
-||`bert-base-camelbert-msa-half`|MSA|53GB|6.3B|
-||`bert-base-camelbert-msa-quarter`|MSA|27GB|3.1B|
-||`bert-base-camelbert-msa-eighth`|MSA|14GB|1.6B|
-||`bert-base-camelbert-msa-sixteenth`|MSA|6GB|746M|
+|✔|`bert-base-arabic-camelbert-mix`|CA,DA,MSA|167GB|17.3B|
+||`bert-base-arabic-camelbert-ca`|CA|6GB|847M|
+||`bert-base-arabic-camelbert-da`|DA|54GB|5.8B|
+||`bert-base-arabic-camelbert-msa`|MSA|107GB|12.6B|
+||`bert-base-arabic-camelbert-msa-half`|MSA|53GB|6.3B|
+||`bert-base-arabic-camelbert-msa-quarter`|MSA|27GB|3.1B|
+||`bert-base-arabic-camelbert-msa-eighth`|MSA|14GB|1.6B|
+||`bert-base-arabic-camelbert-msa-sixteenth`|MSA|6GB|746M|
 
 ## Intended uses
 You can use the released model for either masked language modeling or next sentence prediction.
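A minimal sketch of the next sentence prediction use, assuming the released checkpoint retains BERT's NSP head from pre-training (the Arabic sentences are reused from the card's own examples):

```python
import torch
from transformers import AutoTokenizer, BertForNextSentencePrediction

tokenizer = AutoTokenizer.from_pretrained('CAMeL-Lab/bert-base-arabic-camelbert-mix')
# Assumption: the released weights include the NSP head used in pre-training.
model = BertForNextSentencePrediction.from_pretrained('CAMeL-Lab/bert-base-arabic-camelbert-mix')

encoding = tokenizer("مرحبا يا عالم.", "الهدف من الحياة هو النجاح.", return_tensors='pt')
with torch.no_grad():
    logits = model(**encoding).logits
# Index 0 scores "the second sentence follows the first"; index 1 scores "random".
print(torch.softmax(logits, dim=-1))
```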
@@ -47,7 +47,7 @@ We release our fine-tuning code [here](https://github.com/CAMeL-Lab/CAMeLBERT).
 You can use this model directly with a pipeline for masked language modeling:
 ```python
 >>> from transformers import pipeline
->>> unmasker = pipeline('fill-mask', model='CAMeL-Lab/bert-base-camelbert-mix')
+>>> unmasker = pipeline('fill-mask', model='CAMeL-Lab/bert-base-arabic-camelbert-mix')
 >>> unmasker("الهدف من الحياة هو [MASK] .")
 [{'sequence': '[CLS] الهدف من الحياة هو النجاح. [SEP]',
 'score': 0.10861027985811234,
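The pipeline returns one dict per candidate; a small follow-on sketch keeps just the predicted strings and scores (the `top_k` argument is an assumption — older transformers releases spelled it `topk`):

```python
# Follow-on to the pipeline call above: print only the top predictions.
predictions = unmasker("الهدف من الحياة هو [MASK] .", top_k=5)  # top_k assumed
for p in predictions:
    print(p['token_str'], round(p['score'], 4))
```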
@@ -76,8 +76,8 @@ You can use this model directly with a pipeline for masked language modeling:
 Here is how to use this model to get the features of a given text in PyTorch:
 ```python
 from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained('CAMeL-Lab/bert-base-camelbert-mix')
-model = AutoModel.from_pretrained('CAMeL-Lab/bert-base-camelbert-mix')
+tokenizer = AutoTokenizer.from_pretrained('CAMeL-Lab/bert-base-arabic-camelbert-mix')
+model = AutoModel.from_pretrained('CAMeL-Lab/bert-base-arabic-camelbert-mix')
 text = "مرحبا يا عالم."
 encoded_input = tokenizer(text, return_tensors='pt')
 output = model(**encoded_input)
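The snippet stops at the raw model output; one common next step, mean-pooling the token vectors into a single sentence vector, is sketched below as an assumed post-processing choice, not something the card prescribes:

```python
import torch

# Assumes `output` and `encoded_input` from the PyTorch snippet above.
# Average token embeddings, weighted by the attention mask, to get one
# fixed-size vector per sentence (a common, but assumed, pooling choice).
last_hidden = output.last_hidden_state                # (1, seq_len, 768)
mask = encoded_input['attention_mask'].unsqueeze(-1)  # (1, seq_len, 1)
sentence_vector = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_vector.shape)                          # torch.Size([1, 768])
```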
@@ -86,8 +86,8 @@ output = model(**encoded_input)
 and in TensorFlow:
 ```python
 from transformers import AutoTokenizer, TFAutoModel
-tokenizer = AutoTokenizer.from_pretrained('CAMeL-Lab/bert-base-camelbert-mix')
-model = TFAutoModel.from_pretrained('CAMeL-Lab/bert-base-camelbert-mix')
+tokenizer = AutoTokenizer.from_pretrained('CAMeL-Lab/bert-base-arabic-camelbert-mix')
+model = TFAutoModel.from_pretrained('CAMeL-Lab/bert-base-arabic-camelbert-mix')
 text = "مرحبا يا عالم."
 encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
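The same assumed pooling step in TensorFlow:

```python
import tensorflow as tf

# Assumes `output` and `encoded_input` from the TensorFlow snippet above.
mask = tf.cast(encoded_input['attention_mask'], tf.float32)[:, :, tf.newaxis]
summed = tf.reduce_sum(output.last_hidden_state * mask, axis=1)
sentence_vector = summed / tf.reduce_sum(mask, axis=1)
print(sentence_vector.shape)  # (1, 768)
```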
 