---
license: mit
title: 'MinimalGPT: Felis Catus'
sdk: gradio
emoji: 😻
colorFrom: gray
colorTo: blue
pinned: true
---

# MinimalGPT: Felis Catus

[[`MinimalGPT`](https://github.com/abhaskumarsinha/MinimalGPT)] [[`Project Gutenberg Dataset`](https://www.kaggle.com/datasets/shubchat/1002-short-stories-from-project-guttenberg)]

This HuggingFace Space is an illustrative application of the GitHub repository [MinimalGPT](https://github.com/abhaskumarsinha/MinimalGPT), which departs from conventional GPT models that are scaled up and trained on high-performance computing systems and clusters. The primary objective of the MinimalGPT project was to explore how far a GPT model can be shrunk.

In this Space we introduce a diminutive GPT model named [Felis Catus](https://en.wikipedia.org/wiki/Cat) (stray cat), with roughly 17.6 million parameters (see the model summary below). What distinguishes this model is its training process: it was trained on a standard home-computer CPU (an AMD Ryzen 5) without any GPU acceleration. Remarkably, the training run lasted a mere 15 minutes on a dataset of only ~150,000 tokens of text.

At present, the Felis Catus model can generate a concise story excerpt of 70 tokens from an input of just 5 to 7 words. Its vocabulary comprises roughly 12,500 words. We are currently working on scaling the model further in a forthcoming project.

## Model Specifications

```
Model: "model"
_________________________________________________________________
 Layer (type)                 Output Shape              Param #
=================================================================
 input_1 (InputLayer)         [(None, 10)]              0
 embedding (Embedding)        (None, 10, 128)           1597184
 positional_embedding (Posit  (None, 10, 128)           0
 ionalEmbedding)
 decoder (Decoder)            (None, 10, 128)           71208
 flatten (Flatten)            (None, 1280)              0
 dense (Dense)                (None, 12479)             15985599
 tf.nn.softmax (TFOpLambda)   (None, 12479)             0
=================================================================
Total params: 17,653,991
Trainable params: 17,653,991
Non-trainable params: 0
_________________________________________________________________
```

## Hyperparameters

```
gpt_input: 10          [Max input size, d_k]
d_model: 128           [Embedding size, d_model]
h: 8                   [Number of multiheads, h]
decoder_stacks: 1      [Number of decoder stacks, stack]
GPT_attention: True    [Attention Layer implementation type - OpenAI style]
```

## References

1. Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
2. Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
3. Project Gutenberg. (n.d.). Retrieved April 20, 2023, from www.gutenberg.org.
4. Abadi, Martín, et al. "TensorFlow: Large-scale machine learning on heterogeneous systems." Software available from tensorflow.org (2015). URL https://www.tensorflow.org.
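
## Appendix: Architecture Sketch (Unofficial)

The following is a minimal, unofficial sketch of how a model with the shape reported above could be assembled from standard Keras layers. It is not the code from the MinimalGPT repository: the repository defines its own `PositionalEmbedding` and `Decoder` layers, whereas this sketch assumes a fixed sinusoidal positional encoding and a single post-norm decoder block built from Keras' `MultiHeadAttention` with a ReLU feed-forward layer, so the decoder's parameter count will not match the summary exactly.

```python
# Minimal sketch, NOT the MinimalGPT repository's implementation.
# Assumptions: sinusoidal positional encoding, one post-norm decoder block
# built from Keras MultiHeadAttention, ReLU feed-forward of width 4 * d_model.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

GPT_INPUT = 10      # gpt_input: maximum input length
D_MODEL = 128       # d_model: embedding size
NUM_HEADS = 8       # h: number of attention heads
VOCAB_SIZE = 12479  # output vocabulary size, taken from the model summary


class PositionalEmbedding(layers.Layer):
    """Adds fixed sinusoidal position encodings to the token embeddings."""

    def __init__(self, length, depth, **kwargs):
        super().__init__(**kwargs)
        positions = np.arange(length)[:, np.newaxis]          # (length, 1)
        dims = np.arange(depth)[np.newaxis, :]                # (1, depth)
        rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / np.float32(depth))
        angles = positions * rates
        encoding = np.zeros((length, depth), dtype=np.float32)
        encoding[:, 0::2] = np.sin(angles[:, 0::2])
        encoding[:, 1::2] = np.cos(angles[:, 1::2])
        self.encoding = tf.constant(encoding)

    def call(self, x):
        return x + self.encoding


# Assemble: embedding -> positional embedding -> decoder block -> flatten -> softmax head
tokens = layers.Input(shape=(GPT_INPUT,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, D_MODEL)(tokens)
x = PositionalEmbedding(GPT_INPUT, D_MODEL)(x)

# One decoder stack: causal self-attention + feed-forward, each with a residual connection
attn = layers.MultiHeadAttention(num_heads=NUM_HEADS, key_dim=D_MODEL // NUM_HEADS)
x = layers.LayerNormalization()(x + attn(x, x, use_causal_mask=True))
hidden = layers.Dense(4 * D_MODEL, activation="relu")(x)
x = layers.LayerNormalization()(x + layers.Dense(D_MODEL)(hidden))

# Flatten the whole (10, 128) sequence and predict the next token over the vocabulary
x = layers.Flatten()(x)
probs = layers.Softmax()(layers.Dense(VOCAB_SIZE)(x))

model = tf.keras.Model(tokens, probs)
model.summary()
```

Note that flattening the 10 × 128 sequence before the final `Dense` layer is what makes the output head so large: 1,281 × 12,479 ≈ 16.0 million of the ~17.7 million total parameters.

As a usage illustration, a greedy sliding-window decoding loop of the kind needed to produce a ~70-token continuation might look like the snippet below (reusing the constants above); the tokenizer, the padding id of 0, and greedy `argmax` sampling are assumptions, not the Space's actual decoding code.

```python
# Usage sketch: greedy decoding over a sliding 10-token window.
# The tokenizer, the padding id (0), and greedy argmax sampling are assumptions.
def generate(model, prompt_ids, steps=70):
    ids = list(prompt_ids)                                   # token ids of the prompt
    for _ in range(steps):
        window = ids[-GPT_INPUT:]
        window = [0] * (GPT_INPUT - len(window)) + window    # left-pad short prompts
        probs = model.predict(np.array([window]), verbose=0)[0]
        ids.append(int(np.argmax(probs)))                    # pick the most likely next token
    return ids
```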