abhaskumarsinha committed on
Commit 31a038a
1 Parent(s): 2eede07

Create README.md

Files changed (1)
  1. README.md +61 -9
README.md CHANGED
@@ -1,13 +1,65 @@
  ---
- title: MinimalGPT-Felis Catus
- emoji: 🏢
- colorFrom: green
- colorTo: yellow
- sdk: gradio
- sdk_version: 3.34.0
- app_file: app.py
- pinned: false
  license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  license: mit
+ title: 'MinimalGPT: Felis Catus'
+ sdk: gradio
+ emoji: 😻
+ colorFrom: gray
+ colorTo: blue
+ pinned: true
  ---

+ # MinimalGPT: Felis Catus
+
+ [[`MinimalGPT`](https://github.com/abhaskumarsinha/MinimalGPT)] [[`Project Gutenberg Dataset`](https://www.kaggle.com/datasets/shubchat/1002-short-stories-from-project-guttenberg)]
+
+ This Hugging Face Space is an illustrative application of the GitHub repository [MinimalGPT](https://github.com/abhaskumarsinha/MinimalGPT), a project that departs from conventional GPT models, which are scaled up and trained on high-performance computing systems and clusters. The primary objective of MinimalGPT was to explore how far a GPT model can be minimized in size.
+
+ Within this Space, we introduce a small GPT model named [Felis Catus](https://en.wikipedia.org/wiki/Cat) (stray cat), with only about 15 million parameters. What distinguishes this model is its training process: it was trained on a standard home computer CPU (an AMD Ryzen 5) without any GPU acceleration. The training run took roughly 15 minutes, using a dataset of only ~150,000 tokens of text.
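As a rough illustration of what "training on ~150,000 tokens" involves, the sketch below shows one plausible way such a corpus could be cut into fixed-width next-word training pairs matching the 10-token input reported in the model summary further down. The helper name `make_dataset` and the `word_to_id` lookup are hypothetical; the actual preprocessing lives in the MinimalGPT repository.

```python
# Hypothetical preprocessing sketch (not the repository's actual pipeline):
# slide a 10-token window over the corpus and predict the word that follows it.
import numpy as np

def make_dataset(text: str, word_to_id: dict, window: int = 10):
    ids = [word_to_id[w] for w in text.lower().split() if w in word_to_id]
    contexts = [ids[i:i + window] for i in range(len(ids) - window)]
    targets = [ids[i + window] for i in range(len(ids) - window)]
    return np.array(contexts), np.array(targets)

# A ~150,000-token corpus yields roughly 150,000 - 10 such pairs.
```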
+
+ At present, the Felis Catus model can generate a short story excerpt of 70 tokens from an input of just 5 to 7 words. The model's dictionary contains a modest 12,000 words. We are currently working on scaling the model further in a forthcoming project.
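To make the input/output behaviour concrete, here is a minimal, hypothetical greedy-decoding loop. It assumes a trained next-word `model` (for instance, the one sketched under "Model Specifications" below) together with `word_to_id`/`id_to_word` lookups; none of these names are part of this Space's published code.

```python
# Hypothetical generation loop: feed the last 10 tokens, predict one word,
# append it, and repeat until ~70 new tokens have been produced.
import numpy as np

def generate(model, word_to_id, id_to_word, prompt, n_tokens=70, window=10):
    tokens = prompt.lower().split()                # naive whitespace tokenizer
    for _ in range(n_tokens):
        ids = [word_to_id.get(w, 0) for w in tokens[-window:]]
        ids = [0] * (window - len(ids)) + ids      # left-pad short prompts
        probs = model.predict(np.array([ids]), verbose=0)[0]
        next_id = int(np.argmax(probs))            # greedy; sampling also works
        tokens.append(id_to_word.get(next_id, "<unk>"))
    return " ".join(tokens)

# generate(model, word_to_id, id_to_word, "once upon a time there was")
```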
+
+ ## Model Specifications
+
+ ```
+ Model: "model"
+ _________________________________________________________________
+  Layer (type)                Output Shape              Param #
+ =================================================================
+  input_1 (InputLayer)        [(None, 10)]              0
+
+  embedding (Embedding)       (None, 10, 128)           1597184
+
+  positional_embedding (Posit (None, 10, 128)           0
+  ionalEmbedding)
+
+  decoder (Decoder)           (None, 10, 128)           71208
+
+  flatten (Flatten)           (None, 1280)              0
+
+  dense (Dense)               (None, 12479)             15985599
+
+  tf.nn.softmax (TFOpLambda)  (None, 12479)             0
+
+ =================================================================
+ Total params: 17,653,991
+ Trainable params: 17,653,991
+ Non-trainable params: 0
+ _________________________________________________________________
+ ```
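For readers who want to see how such a stack could be wired together, the following Keras sketch reproduces the layer sequence above (embedding, positional embedding, one decoder block, flatten, dense, softmax) under stated assumptions: `SimplePositionalEmbedding` and `decoder_block` are simplified stand-ins, so the decoder's parameter count will not match the 71,208 reported in the summary. The authoritative implementation is in the MinimalGPT repository.

```python
# Approximate reconstruction of the summary above (TensorFlow 2.x / Keras 2).
# Vocabulary sizes are inferred from the parameter counts:
# 1,597,184 / 128 = 12,478 embedding rows and a 12,479-way output layer.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN, D_MODEL, N_HEADS = 10, 128, 8   # gpt_input, d_model, h
VOCAB_IN, VOCAB_OUT = 12478, 12479

class SimplePositionalEmbedding(layers.Layer):
    """Adds fixed sinusoidal position encodings (no trainable weights)."""
    def call(self, x):
        pos = np.arange(SEQ_LEN)[:, None]
        i = np.arange(D_MODEL)[None, :]
        angle = pos / np.power(10000.0, (2 * (i // 2)) / D_MODEL)
        enc = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))
        return x + tf.constant(enc, dtype=x.dtype)

def decoder_block(x):
    """One causally masked self-attention block plus a small feed-forward layer."""
    attn = layers.MultiHeadAttention(
        num_heads=N_HEADS, key_dim=D_MODEL // N_HEADS)(x, x, use_causal_mask=True)
    x = layers.LayerNormalization()(x + attn)
    ff = layers.Dense(D_MODEL, activation="relu")(x)
    return layers.LayerNormalization()(x + ff)

inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
h = layers.Embedding(VOCAB_IN, D_MODEL)(inputs)
h = SimplePositionalEmbedding()(h)
h = decoder_block(h)                               # decoder_stacks: 1
h = layers.Flatten()(h)
probs = tf.nn.softmax(layers.Dense(VOCAB_OUT)(h))  # the TFOpLambda layer in the summary
model = tf.keras.Model(inputs, probs)
model.summary()
```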
+
+ ## Hyperparameters
+
+ ```
+ gpt_input: 10        [Max input size, d_k]
+ d_model: 128         [Embedding size, d_model]
+ h: 8                 [Number of multiheads, h]
+ decoder_stacks: 1    [Number of decoder stacks, stack]
+ GPT_attention: True  [Attention Layer implementation type - OpenAI style]
+ ```
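As a quick sanity check, the same hyperparameters can be written out as a plain configuration dictionary. The key names simply mirror the listing above; the actual MinimalGPT entry point may expose them under different names.

```python
# Hypothetical configuration mirroring the listing above (key names are assumptions).
config = {
    "gpt_input": 10,        # context window fed to the model
    "d_model": 128,         # embedding / residual width
    "h": 8,                 # attention heads -> per-head width of 128 // 8 = 16
    "decoder_stacks": 1,    # a single decoder block
    "GPT_attention": True,  # OpenAI-style attention implementation
}
assert config["d_model"] % config["h"] == 0, "heads must evenly divide d_model"
```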
+
+ ## References
+ 1. Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
+ 2. Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
+ 3. Project Gutenberg. (n.d.). Retrieved April 20, 2023, from www.gutenberg.org.
+ 4. Abadi, Martín, et al. "TensorFlow: Large-scale machine learning on heterogeneous systems." Software available from tensorflow.org (2015). URL https://www.tensorflow.org.