File size: 1,711 Bytes
742319b f490665 742319b d29c8d4 f490665 742319b 2667b61 e81dea0 048a511 6e021b4 481d3f4 1a6a62a 481d3f4 d29c8d4 481d3f4 1a6a62a 481d3f4 6e021b4 048a511 6e021b4 9af7991 d85da15 09f28e7 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
---
language: ar
tags: Fill-Mask
datasets: OSCAR
widget:
- text: " السلام عليكم ورحمة[MASK] وبركاتة"
- text: " اهلا وسهلا بكم في [MASK] من سيربح المليون "
---
# Arabic BERT Model
**AraBERTMo** is an Arabic pre-trained language model based on [Google's BERT architechture](https://github.com/google-research/bert).
AraBERTMo_base uses the same BERT-Base config.
AraBERTMo_base now comes in 10 new variants
All models are available on the `HuggingFace` model page under the [Ebtihal](https://huggingface.co/Ebtihal/) name.
Checkpoints are available in PyTorch formats.
## Pretraining Corpus
`AraBertMo_base_V1' model was pre-trained on ~3 million words:
- [OSCAR](https://traces1.inria.fr/oscar/) - Arabic version "unshuffled_deduplicated_ar".
## Training results
this model achieves the following results:
| Task | Num examples | Num Epochs | Batch Size | steps | Wall time | training loss|
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|
| Fill-Mask| 10010| 1 | 64 | 157 | 2m 2s | 9.0183 |
## Load Pretrained Model
You can use this model by installing `torch` or `tensorflow` and Huggingface library `transformers`. And you can use it directly by initializing it like this:
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("Ebtihal/AraBertMo_base_V1")
model = AutoModelForMaskedLM.from_pretrained("Ebtihal/AraBertMo_base_V1")
```
## This model was built for master's degree research in an organization:
- [University of kufa](https://uokufa.edu.iq/).
- [Faculty of Computer Science and Mathematics](https://mathcomp.uokufa.edu.iq/).
- **Department of Computer Science**
|