abhaskumarsinha committed on
Commit 31a038a
1 Parent(s): 2eede07

Create README.md

Files changed (1)
  1. README.md +61 -9
README.md CHANGED
@@ -1,13 +1,65 @@
  ---
- title: MinimalGPT-Felis Catus
- emoji: 🏢
- colorFrom: green
- colorTo: yellow
- sdk: gradio
- sdk_version: 3.34.0
- app_file: app.py
- pinned: false
  license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  license: mit
+ title: 'MinimalGPT: Felis Catus'
+ sdk: gradio
+ emoji: 😻
+ colorFrom: gray
+ colorTo: blue
+ pinned: true
  ---

+ # MinimalGPT: Felis Catus
+
+ [[`MinimalGPT`](https://github.com/abhaskumarsinha/MinimalGPT)] [[`Project Gutenberg Dataset`](https://www.kaggle.com/datasets/shubchat/1002-short-stories-from-project-guttenberg)]
+
+ This Hugging Face Space is an illustrative application of the GitHub repository [MinimalGPT](https://github.com/abhaskumarsinha/MinimalGPT), a project that departs from conventional GPT models, which are scaled up and trained on high-performance computing systems and clusters. The primary objective of MinimalGPT was to explore how far a GPT model can be minimized in size.
+
+ Within this Space, we introduce a small GPT model named [Felis Catus](https://en.wikipedia.org/wiki/Cat) (stray cat), with only about 15 million parameters. What distinguishes this model is its training process: it was trained on a standard home computer CPU (an AMD Ryzen 5) without any GPU acceleration. The training run took roughly 15 minutes, using a dataset of only ~150,000 tokens of text.
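As a rough illustration of what "training on ~150,000 tokens" involves, the sketch below shows one plausible way such a corpus could be cut into fixed-width next-word training pairs matching the 10-token input reported in the model summary further down. The helper name `make_dataset` and the `word_to_id` lookup are hypothetical; the actual preprocessing lives in the MinimalGPT repository.

```python
# Hypothetical preprocessing sketch (not the repository's actual pipeline):
# slide a 10-token window over the corpus and predict the word that follows it.
import numpy as np

def make_dataset(text: str, word_to_id: dict, window: int = 10):
    ids = [word_to_id[w] for w in text.lower().split() if w in word_to_id]
    contexts = [ids[i:i + window] for i in range(len(ids) - window)]
    targets = [ids[i + window] for i in range(len(ids) - window)]
    return np.array(contexts), np.array(targets)

# A ~150,000-token corpus yields roughly 150,000 - 10 such pairs.
```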
+
+ At present, the Felis Catus model can generate a short story excerpt of 70 tokens from an input of just 5 to 7 words. The model's dictionary contains a modest 12,000 words. We are currently working on scaling the model further in a forthcoming project.
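To make the input/output behaviour concrete, here is a minimal, hypothetical greedy-decoding loop. It assumes a trained next-word `model` (for instance, the one sketched under "Model Specifications" below) together with `word_to_id`/`id_to_word` lookups; none of these names are part of this Space's published code.

```python
# Hypothetical generation loop: feed the last 10 tokens, predict one word,
# append it, and repeat until ~70 new tokens have been produced.
import numpy as np

def generate(model, word_to_id, id_to_word, prompt, n_tokens=70, window=10):
    tokens = prompt.lower().split()                # naive whitespace tokenizer
    for _ in range(n_tokens):
        ids = [word_to_id.get(w, 0) for w in tokens[-window:]]
        ids = [0] * (window - len(ids)) + ids      # left-pad short prompts
        probs = model.predict(np.array([ids]), verbose=0)[0]
        next_id = int(np.argmax(probs))            # greedy; sampling also works
        tokens.append(id_to_word.get(next_id, "<unk>"))
    return " ".join(tokens)

# generate(model, word_to_id, id_to_word, "once upon a time there was")
```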
+
+ ## Model Specifications
+
+ ```
+ Model: "model"
+ _________________________________________________________________
+  Layer (type)                Output Shape              Param #
+ =================================================================
+  input_1 (InputLayer)        [(None, 10)]              0
+
+  embedding (Embedding)       (None, 10, 128)           1597184
+
+  positional_embedding (Posit (None, 10, 128)           0
+  ionalEmbedding)
+
+  decoder (Decoder)           (None, 10, 128)           71208
+
+  flatten (Flatten)           (None, 1280)              0
+
+  dense (Dense)               (None, 12479)             15985599
+
+  tf.nn.softmax (TFOpLambda)  (None, 12479)             0
+
+ =================================================================
+ Total params: 17,653,991
+ Trainable params: 17,653,991
+ Non-trainable params: 0
+ _________________________________________________________________
+ ```
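For readers who want to see how such a stack could be wired together, the following Keras sketch reproduces the layer sequence above (embedding, positional embedding, one decoder block, flatten, dense, softmax) under stated assumptions: `SimplePositionalEmbedding` and `decoder_block` are simplified stand-ins, so the decoder's parameter count will not match the 71,208 reported in the summary. The authoritative implementation is in the MinimalGPT repository.

```python
# Approximate reconstruction of the summary above (TensorFlow 2.x / Keras 2).
# Vocabulary sizes are inferred from the parameter counts:
# 1,597,184 / 128 = 12,478 embedding rows and a 12,479-way output layer.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN, D_MODEL, N_HEADS = 10, 128, 8   # gpt_input, d_model, h
VOCAB_IN, VOCAB_OUT = 12478, 12479

class SimplePositionalEmbedding(layers.Layer):
    """Adds fixed sinusoidal position encodings (no trainable weights)."""
    def call(self, x):
        pos = np.arange(SEQ_LEN)[:, None]
        i = np.arange(D_MODEL)[None, :]
        angle = pos / np.power(10000.0, (2 * (i // 2)) / D_MODEL)
        enc = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))
        return x + tf.constant(enc, dtype=x.dtype)

def decoder_block(x):
    """One causally masked self-attention block plus a small feed-forward layer."""
    attn = layers.MultiHeadAttention(
        num_heads=N_HEADS, key_dim=D_MODEL // N_HEADS)(x, x, use_causal_mask=True)
    x = layers.LayerNormalization()(x + attn)
    ff = layers.Dense(D_MODEL, activation="relu")(x)
    return layers.LayerNormalization()(x + ff)

inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
h = layers.Embedding(VOCAB_IN, D_MODEL)(inputs)
h = SimplePositionalEmbedding()(h)
h = decoder_block(h)                               # decoder_stacks: 1
h = layers.Flatten()(h)
probs = tf.nn.softmax(layers.Dense(VOCAB_OUT)(h))  # the TFOpLambda layer in the summary
model = tf.keras.Model(inputs, probs)
model.summary()
```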
+
+ ## Hyperparameters
+
+ ```
+ gpt_input: 10        [Max input size, d_k]
+ d_model: 128         [Embedding size, d_model]
+ h: 8                 [Number of multiheads, h]
+ decoder_stacks: 1    [Number of decoder stacks, stack]
+ GPT_attention: True  [Attention Layer implementation type - OpenAI style]
+ ```
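As a quick sanity check, the same hyperparameters can be written out as a plain configuration dictionary. The key names simply mirror the listing above; the actual MinimalGPT entry point may expose them under different names.

```python
# Hypothetical configuration mirroring the listing above (key names are assumptions).
config = {
    "gpt_input": 10,        # context window fed to the model
    "d_model": 128,         # embedding / residual width
    "h": 8,                 # attention heads -> per-head width of 128 // 8 = 16
    "decoder_stacks": 1,    # a single decoder block
    "GPT_attention": True,  # OpenAI-style attention implementation
}
assert config["d_model"] % config["h"] == 0, "heads must evenly divide d_model"
```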
+
+ ## References
+ 1. Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
+ 2. Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
+ 3. Project Gutenberg. (n.d.). Retrieved April 20, 2023, from www.gutenberg.org.
+ 4. Abadi, Martín, et al. "TensorFlow: Large-scale machine learning on heterogeneous systems." Software available from tensorflow.org (2015). URL https://www.tensorflow.org.