Go Inoue committed on
Commit 9706a73
1 Parent(s): fd39d7b
Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -3,10 +3,10 @@ language:
 - ar
 license: apache-2.0
 widget:
- - text: "الهدف من الحياة هو [MASK] ."
+ - text: "الهدف من الحياة هو [MASK] ."
 ---
 
- # bert-base-camelbert-msa
+ # CAMeLBERT-DA
 
 ## Model description
 
@@ -18,7 +18,7 @@ We release eight models with different sizes and variants as follows:
 |-|-|:-:|-:|-:|
 ||`bert-base-camelbert-mix`|CA,DA,MSA|167GB|17.3B|
 ||`bert-base-camelbert-ca`|CA|6GB|847M|
- |✔|`bert-base-camelbert-da`|DA|54GB|5.8B|
+ |✔|`bert-base-camelbert-da`|DA|54GB|5.8B|
 ||`bert-base-camelbert-msa`|MSA|107GB|12.6B|
 ||`bert-base-camelbert-msa-half`|MSA|53GB|6.3B|
 ||`bert-base-camelbert-msa-quarter`|MSA|27GB|3.1B|
@@ -37,27 +37,27 @@ You can use this model directly with a pipeline for masked language modeling:
 ```python
 >>> from transformers import pipeline
 >>> unmasker = pipeline('fill-mask', model='CAMeL-Lab/bert-base-camelbert-da')
- >>> unmasker("الهدف من الحياة هو [MASK] .")
- [{'sequence': '[CLS] الهدف من الحياة هو.. [SEP]',
+ >>> unmasker("الهدف من الحياة هو [MASK] .")
+ [{'sequence': '[CLS] الهدف من الحياة هو.. [SEP]',
  'score': 0.062508225440979,
  'token': 18,
  'token_str': '.'},
- {'sequence': '[CLS] الهدف من الحياة هو الموت. [SEP]',
+ {'sequence': '[CLS] الهدف من الحياة هو الموت. [SEP]',
  'score': 0.033172328025102615,
  'token': 4295,
- 'token_str': 'الموت'},
- {'sequence': '[CLS] الهدف من الحياة هو الحياة. [SEP]',
+ 'token_str': 'الموت'},
+ {'sequence': '[CLS] الهدف من الحياة هو الحياة. [SEP]',
  'score': 0.029575437307357788,
  'token': 3696,
- 'token_str': 'الحياة'},
- {'sequence': '[CLS] الهدف من الحياة هو الرحيل. [SEP]',
+ 'token_str': 'الحياة'},
+ {'sequence': '[CLS] الهدف من الحياة هو الرحيل. [SEP]',
  'score': 0.02724040113389492,
  'token': 11449,
- 'token_str': 'الرحيل'},
- {'sequence': '[CLS] الهدف من الحياة هو الحب. [SEP]',
+ 'token_str': 'الرحيل'},
+ {'sequence': '[CLS] الهدف من الحياة هو الحب. [SEP]',
  'score': 0.01564178802073002,
  'token': 3088,
- 'token_str': 'الحب'}]
+ 'token_str': 'الحب'}]
 ```
 
 Here is how to use this model to get the features of a given text in PyTorch:
@@ -65,7 +65,7 @@ Here is how to use this model to get the features of a given text in PyTorch:
 from transformers import AutoTokenizer, AutoModel
 tokenizer = AutoTokenizer.from_pretrained('CAMeL-Lab/bert-base-camelbert-da')
 model = AutoModel.from_pretrained('CAMeL-Lab/bert-base-camelbert-da')
- text = "مرحبا يا عالم."
+ text = "مرحبا يا عالم."
 encoded_input = tokenizer(text, return_tensors='pt')
 output = model(**encoded_input)
 ```
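The PyTorch snippet in the card stops at `output = model(**encoded_input)`. As a minimal sketch (not part of the card being diffed, and assuming the standard `transformers` output attributes and the model id shown above), the contextual features can be read like this:

```python
# Sketch: reading features from the PyTorch example above (assumes the standard
# `transformers` output object; model id taken from the card's own snippet).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('CAMeL-Lab/bert-base-camelbert-da')
model = AutoModel.from_pretrained('CAMeL-Lab/bert-base-camelbert-da')

encoded_input = tokenizer("مرحبا يا عالم.", return_tensors='pt')
output = model(**encoded_input)

# One contextual vector per input token; hidden size is 768 for a BERT-base model.
token_embeddings = output.last_hidden_state        # shape: (1, seq_len, 768)
sentence_embedding = token_embeddings.mean(dim=1)  # naive mean pooling over tokens
```

Mean pooling is only one option; for BERT-style models, `output.pooler_output` is another common sentence-level representation.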
@@ -75,7 +75,7 @@ and in TensorFlow:
 from transformers import AutoTokenizer, TFAutoModel
 tokenizer = AutoTokenizer.from_pretrained('CAMeL-Lab/bert-base-camelbert-da')
 model = TFAutoModel.from_pretrained('CAMeL-Lab/bert-base-camelbert-da')
- text = "مرحبا يا عالم."
+ text = "مرحبا يا عالم."
 encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
 ```
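The TensorFlow variant can be read the same way; a short sketch under the same assumptions (standard `transformers` output attributes, model id from the card):

```python
# Sketch: reading features from the TensorFlow example above (same assumptions as
# the PyTorch sketch; `last_hidden_state` is a tf.Tensor here).
from transformers import AutoTokenizer, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained('CAMeL-Lab/bert-base-camelbert-da')
model = TFAutoModel.from_pretrained('CAMeL-Lab/bert-base-camelbert-da')

encoded_input = tokenizer("مرحبا يا عالم.", return_tensors='tf')
output = model(encoded_input)

# Convert to NumPy for downstream use; shape is (1, seq_len, hidden_size).
token_embeddings = output.last_hidden_state.numpy()
```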