riotu-lab committed on
Commit b253885
1 Parent(s): d1f5d67

Update README.md

Files changed (1): README.md (+38 -23)

README.md CHANGED
@@ -7,53 +7,68 @@ tags:
 - 'arabic '
 - text-generation
 ---
- Model Description
- Model Name: ArabicGPT-S
 Architecture: GPT-2
 Layers: 12
 Model Size: 134M
 Context Window Size: 768
-
 ArabianGPT is a custom-trained version of the GPT-2 base model, specifically tailored for the Arabic language. It is designed to understand and generate Arabic text, making it suitable for various natural language processing tasks in Arabic.

- Training
 Dataset: Abu Elkhiar Corpus
 Size: 15.5 GB
 Number of Words: 237,814,541
 Number of Tokens: 1,752,421,071
 Epochs: 5.87
 Loss: 3.97
-
 The model was trained on the Abu Elkhiar dataset, a comprehensive Arabic text corpus encompassing a wide range of topics. The training process focused on adapting the model to understand the nuances and complexities of the Arabic language.

- Tokenizer:
- Type: Custom trained SentencePiece tokenizer
 Vocabulary Size: 64K

- We employed AraNizer, a custom trained tokenizer based on the SentencePiece model, with a vocabulary size of 64. This choice was made to optimize the model's performance for the specific characteristics of the Arabic language.

- Usage
- ArabianGPT can be used for text generation

- Limitations
- As with any language model, ArabicGPT may have limitations in understanding context or generating text in certain scenarios. Users should be aware of these limitations and use the model accordingly.

- Ethical Considerations
 We emphasize responsible usage of ArabianGPT. Users should ensure that the generated text is used ethically and does not propagate misinformation or harmful content.

- Citation
 If you use ArabianGPT in your research or application, please cite it as follows:

 @misc{ArabianGPT, 2023,
- title={ArabianGPT: A GPT-2 Based Language Model for Arabic},
- author={Najar, Omar and Sibaee, Serry and Ghouti, Lahouari and Koubaa, Anis},
- affiliation={Prince Sultan University, Riyadh, Saudi Arabia},
- year={2023},
 }

- Acknowledgments
- We thank Prince Sultan University espically Robotoics and Internet of Things Lab for suuport
-
- Contact
- For inquiries regarding ArabicGPT-S, please contact onajar@psu.edu.sa
 
 - 'arabic '
 - text-generation
 ---
+ # Model Description
+
+ Model Name: ArabianGPT
 Architecture: GPT-2
 Layers: 12
 Model Size: 134M
 Context Window Size: 768
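The figures above are enough to sanity-check the stated model size. A back-of-the-envelope count, assuming the standard GPT-2 base shapes (hidden size 768, learned positional embeddings, tied input/output embeddings, biases omitted — none of these are stated explicitly in the card) and the 64K vocabulary from the tokenizer section:

```python
# Approximate GPT-2 parameter count from the card's specs.
# Assumption (not stated in the card): hidden size 768, as in GPT-2 base.
d_model = 768        # assumed GPT-2 base hidden size
n_layers = 12        # "Layers: 12"
n_positions = 768    # "Context Window Size: 768"
vocab_size = 64_000  # "Vocabulary Size: 64K" (tokenizer section)

token_emb = vocab_size * d_model   # token embedding (tied with output head)
pos_emb = n_positions * d_model    # learned positional embedding
per_layer = 12 * d_model ** 2      # 4*d^2 (attention) + 8*d^2 (MLP) per block

total = token_emb + pos_emb + n_layers * per_layer
print(f"{total / 1e6:.1f}M")  # 134.7M — close to the card's "Model Size: 134M"
```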
 
 ArabianGPT is a custom-trained version of the GPT-2 base model, specifically tailored for the Arabic language. It is designed to understand and generate Arabic text, making it suitable for various natural language processing tasks in Arabic.

+ # Training
 Dataset: Abu Elkhiar Corpus
 Size: 15.5 GB
 Number of Words: 237,814,541
 Number of Tokens: 1,752,421,071
+ Number of Parameters: 134M
 Epochs: 5.87
 Loss: 3.97

 The model was trained on the Abu Elkhiar dataset, a comprehensive Arabic text corpus encompassing a wide range of topics. The training process focused on adapting the model to understand the nuances and complexities of the Arabic language.
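A few derived figures follow directly from the statistics above (plain arithmetic; the perplexity figure assumes the reported loss is the usual per-token cross-entropy in nats, which the card does not state explicitly):

```python
import math

words = 237_814_541     # "Number of Words"
tokens = 1_752_421_071  # "Number of Tokens"
epochs = 5.87           # "Epochs"
loss = 3.97             # "Loss" (assumed: per-token cross-entropy, nats)

print(f"tokens per word: {tokens / words:.2f}")                         # 7.37
print(f"tokens processed over training: {tokens * epochs / 1e9:.1f}B")  # 10.3B
print(f"implied training perplexity: {math.exp(loss):.1f}")             # 53.0
```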
 
+ # Tokenizer
+ Type: Custom-trained SentencePiece tokenizer
 Vocabulary Size: 64K
+ We employed AraNizer, a custom-trained tokenizer based on the SentencePiece model, with a vocabulary size of 64K. This choice was made to optimize the model's performance for the specific characteristics of the Arabic language.
+
+ More info about AraNizer can be found here: [Link](https://github.com/omarnj-lab/aranizer/tree/main)
+
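One consequence of the 64K vocabulary worth noting: at GPT-2 base width (hidden size 768 — an assumption, since the card states only the context window), the embedding table alone accounts for over a third of the model's 134M parameters:

```python
d_model = 768         # assumed GPT-2 base hidden size
vocab_size = 64_000   # "Vocabulary Size: 64K"
total_params = 134e6  # "Model Size: 134M"

emb_params = vocab_size * d_model
print(f"embedding table: {emb_params / 1e6:.1f}M weights "
      f"({emb_params / total_params:.0%} of the model)")  # 49.2M, 37%
```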
+ # Usage
+ ArabianGPT can be used for text generation tasks in Arabic.

+ ### How to use

+ Here is how to use the model to generate Arabic text with the Transformers text-generation pipeline:

+ ```python
+ from transformers import pipeline
+
+ pipe = pipeline("text-generation", model="riotu-lab/Ghazal", max_new_tokens=512)
+
+ text = ''
+
+ print(pipe(text))
+ ```

+ # Limitations
+ As with any language model, ArabianGPT may have limitations in understanding context or generating text in certain scenarios. Users should be aware of these limitations and use the model accordingly.
+
+ # Ethical Considerations
 We emphasize responsible usage of ArabianGPT. Users should ensure that the generated text is used ethically and does not propagate misinformation or harmful content.
 
+ # Citation
 If you use ArabianGPT in your research or application, please cite it as follows:

 @misc{ArabianGPT2023,
+ title={ArabianGPT: A GPT-2 Based Language Model for Arabic},
+ author={Najar, Omar and Sibaee, Serry and Ghouti, Lahouari and Koubaa, Anis},
+ affiliation={Prince Sultan University, Riyadh, Saudi Arabia},
+ year={2023},
 }
 
+ # Acknowledgments
+ We thank Prince Sultan University, especially the Robotics and Internet of Things Lab, for their support.

+ # Contact
+ For inquiries regarding ArabianGPT, please contact onajar@psu.edu.sa.