---
language: ar
tags:
- Fill-Mask
datasets:
- OSCAR
widget:
- text: " السلام عليكم ورحمة[MASK] وبركاتة"
- text: " اهلا وسهلا بكم في [MASK] من سيربح المليون "
---
# Arabic BERT Model
**AraBERTMo** is an Arabic pre-trained language model based on [Google's BERT architecture](https://github.com/google-research/bert).
AraBERTMo_base uses the same BERT-Base configuration.
AraBERTMo_base now comes in 10 new variants.
All models are available on the `HuggingFace` model page under the [Ebtihal](https://huggingface.co/Ebtihal/) name.
Checkpoints are available in PyTorch format.

## Pretraining Corpus
The `AraBertMo_base_V4` model was pre-trained on ~3 million words from:
- [OSCAR](https://traces1.inria.fr/oscar/) - Arabic version "unshuffled_deduplicated_ar" (see the loading sketch below).
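
For exploration, this corpus is also hosted on the Hugging Face Hub and can be streamed with the `datasets` library. The snippet below is a minimal sketch, not the exact corpus-preparation pipeline used to pre-train this model; depending on your `datasets` version you may also need to pass `trust_remote_code=True`.

```python
# Illustrative sketch: stream the Arabic OSCAR configuration from the Hub.
# Not the exact pipeline used to pre-train AraBERTMo.
from datasets import load_dataset

dataset = load_dataset("oscar", "unshuffled_deduplicated_ar",
                       split="train", streaming=True)

# Peek at a few documents without downloading the whole corpus.
for i, example in enumerate(dataset):
    print(example["text"][:80])
    if i == 2:
        break
```
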
## Training results
This model achieves the following results:

| Task | Num examples | Num epochs | Batch size | Steps | Wall time | Training loss |
|:----:|:------------:|:----------:|:----------:|:-----:|:---------:|:-------------:|
| Fill-Mask | 40032 | 4 | 64 | 2500 | 5h 10m 20s | 7.6544 |

## Load Pretrained Model
You can use this model after installing `torch` or `tensorflow` and the Hugging Face `transformers` library. You can then load it directly like this:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Ebtihal/AraBertMo_base_V4")
model = AutoModelForMaskedLM.from_pretrained("Ebtihal/AraBertMo_base_V4")
```

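As a quick sanity check, the checkpoint can also be exercised through the `fill-mask` pipeline; the prompt below is one of the widget examples above, and the exact predictions will depend on this checkpoint:

```python
from transformers import pipeline

# Fill-mask pipeline backed by this checkpoint; [MASK] is BERT's mask token.
fill_mask = pipeline("fill-mask", model="Ebtihal/AraBertMo_base_V4")

# Prompt taken from the widget examples above.
for prediction in fill_mask("السلام عليكم ورحمة[MASK] وبركاتة"):
    print(prediction["token_str"], round(prediction["score"], 4))
```
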
## This model was built for master's degree research at:
- [University of Kufa](https://uokufa.edu.iq/).
- [Faculty of Computer Science and Mathematics](https://mathcomp.uokufa.edu.iq/).
- **Department of Computer Science**