riotu-lab committed on
Commit
d1f5d67
1 Parent(s): 502ebdd

Create README.md

Files changed (1)
  1. README.md +59 -0
README.md ADDED
@@ -0,0 +1,59 @@
+ ---
+ license: mit
+ language:
+ - ar
+ pipeline_tag: text-generation
+ tags:
+ - arabic
+ - text-generation
+ ---
+ Model Description
+ Model Name: ArabicGPT-S
+ Architecture: GPT-2
+ Layers: 12
+ Model Size: 134M parameters
+ Context Window Size: 768
+
+ ArabianGPT is a custom-trained version of the GPT-2 base model, specifically tailored for the Arabic language. It is designed to understand and generate Arabic text, making it suitable for various natural language processing tasks in Arabic.
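The figures listed in the model description can be cross-checked with a quick parameter count. The sketch below assumes the standard GPT-2 block layout, a hidden size of 768 (the GPT-2 base value), a vocabulary of exactly 64,000 entries, and an output projection tied to the token embeddings. None of these are stated in this README, so treat the result as an estimate. Under these assumptions the count comes out at roughly 134.8M, which is in line with the stated model size.

```python
from dataclasses import dataclass


@dataclass
class GPT2Shape:
    """Shape of a GPT-2 style decoder. Fields marked 'assumed' are not in the README."""
    n_layer: int = 12          # "Layers" from the README
    n_positions: int = 768     # "Context Window Size" from the README
    n_embd: int = 768          # assumed: GPT-2 base hidden size
    vocab_size: int = 64_000   # assumed: "64K" read as 64,000


def count_parameters(shape: GPT2Shape) -> int:
    """Estimate the number of trainable parameters for the given shape."""
    d = shape.n_embd
    token_embeddings = shape.vocab_size * d
    position_embeddings = shape.n_positions * d
    # Per block: attention weights (4*d*d) + MLP weights (8*d*d),
    # plus 9*d bias terms and two LayerNorms (4*d).
    per_block = 12 * d * d + 13 * d
    final_layer_norm = 2 * d
    # The output projection is assumed tied to the token embeddings,
    # so it contributes no extra parameters.
    return (
        token_embeddings
        + position_embeddings
        + shape.n_layer * per_block
        + final_layer_norm
    )


total = count_parameters(GPT2Shape())
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Changing `vocab_size` to 65,536 or `n_positions` to 1,024 shifts the total by less than 2M parameters, so the estimate is not very sensitive to those two assumptions.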
+
+ Training
+ Dataset: Abu Elkhiar Corpus
+ Size: 15.5 GB
+ Number of Words: 237,814,541
+ Number of Tokens: 1,752,421,071
+ Epochs: 5.87
+ Loss: 3.97
+
+ The model was trained on the Abu Elkhiar dataset, a comprehensive Arabic text corpus encompassing a wide range of topics. The training process focused on adapting the model to understand the nuances and complexities of the Arabic language.
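Two derived quantities follow from the training figures: the total number of tokens processed and the perplexity implied by the final loss. This is a small sketch, assuming the reported loss is the mean per-token cross-entropy in nats and that "epochs" counts passes over the full token count. The README does not confirm either assumption.

```python
import math

NUM_TOKENS = 1_752_421_071  # "Number of Tokens" from the README
EPOCHS = 5.87               # "Epochs" from the README
FINAL_LOSS = 3.97           # "Loss" from the README; assumed cross-entropy in nats


def tokens_seen(num_tokens: int, epochs: float) -> float:
    """Total tokens processed over the whole training run."""
    return num_tokens * epochs


def perplexity(loss_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(loss_nats)


print(f"tokens seen: {tokens_seen(NUM_TOKENS, EPOCHS) / 1e9:.2f}B")
print(f"perplexity:  {perplexity(FINAL_LOSS):.1f}")
```

Under these assumptions the run processed a little over 10 billion tokens and ended at a perplexity of about 53.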
+
+ Tokenizer
+ Type: Custom-trained SentencePiece tokenizer
+ Vocabulary Size: 64K
+
+ We employed AraNizer, a custom-trained tokenizer based on the SentencePiece model, with a vocabulary size of 64K. This choice was made to optimize the model's performance for the specific characteristics of the Arabic language.
+
+ Usage
+ ArabianGPT can be used for text generation.
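A minimal text-generation sketch using the Hugging Face `transformers` pipeline is shown below. It assumes the checkpoint and its tokenizer load through the standard `pipeline` API, which this README does not confirm. The `generate` helper, the sampling settings, and the repository id in the example call are illustrative placeholders, not values taken from the model card.

```python
from typing import Any, Dict

# Illustrative sampling settings (assumptions, not recommended values).
GENERATION_KWARGS: Dict[str, Any] = {
    "max_new_tokens": 50,
    "do_sample": True,
    "top_p": 0.95,
}


def generate(prompt: str, model_id: str) -> str:
    """Return `prompt` plus a sampled continuation from the model at `model_id`."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, **GENERATION_KWARGS)
    # The pipeline returns a list with one dict per generated sequence.
    return outputs[0]["generated_text"]


# Example call; the repository id is a placeholder to be replaced:
# print(generate("أعلنت وزارة الصحة", model_id="<org>/<model-name>"))
```

Since the stated context window is 768 tokens, the prompt length plus `max_new_tokens` should stay below that limit.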
+
+ Limitations
+ As with any language model, ArabicGPT may have limitations in understanding context or generating text in certain scenarios. Users should be aware of these limitations and use the model accordingly.
+
+ Ethical Considerations
+ We emphasize responsible usage of ArabianGPT. Users should ensure that the generated text is used ethically and does not propagate misinformation or harmful content.
+
+ Citation
+ If you use ArabianGPT in your research or application, please cite it as follows:
+
+ @misc{ArabianGPT2023,
+ title={ArabianGPT: A GPT-2 Based Language Model for Arabic},
+ author={Najar, Omar and Sibaee, Serry and Ghouti, Lahouari and Koubaa, Anis},
+ affiliation={Prince Sultan University, Riyadh, Saudi Arabia},
+ year={2023},
+ }
+
+ Acknowledgments
+ We thank Prince Sultan University, especially the Robotics and Internet of Things Lab, for their support.
+
+ Contact
+ For inquiries regarding ArabicGPT-S, please contact onajar@psu.edu.sa