---
license: mit
title: 'MinimalGPT: Felis Catus'
sdk: gradio
emoji: 😻
colorFrom: gray
colorTo: blue
pinned: true
---

# MinimalGPT: Felis Catus
[MinimalGPT] [Project Gutenberg Dataset]
This Hugging Face Space is an illustrative application of the GitHub repository MinimalGPT, which departs from conventional GPT models that are scaled up and trained on high-performance computing systems and clusters. The primary objective of the MinimalGPT project is to explore how far a GPT model can be minimized in size.
Within this Space, we introduce a small GPT model named Felis Catus (stray cat), with roughly 17.7 million parameters (see the model summary below). What distinguishes this model is its training process: it was trained on a standard home-computer CPU (an AMD Ryzen 5) without any GPU acceleration, and training took only about 15 minutes on a dataset of roughly 150,000 tokens of text.
At present, given an input of just 5 to 7 words, Felis Catus can generate a short story excerpt of 70 tokens. The model's vocabulary contains roughly 12,000 words. We are currently working on scaling the model further in a forthcoming project.
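As a rough sketch of how a Gradio Space can expose such a model, the snippet below wires a text-in/text-out interface around a placeholder `generate_story` function. The function body and its signature are assumptions for illustration; the actual inference code lives in the MinimalGPT repository.

```python
import gradio as gr

def generate_story(prompt: str) -> str:
    # Hypothetical placeholder: the real Space would call the trained
    # Felis Catus model here and extend the prompt by 70 tokens.
    return prompt + " ..."

demo = gr.Interface(
    fn=generate_story,
    inputs=gr.Textbox(label="Prompt (5 to 7 words)"),
    outputs=gr.Textbox(label="Generated excerpt (70 tokens)"),
    title="MinimalGPT: Felis Catus",
)

if __name__ == "__main__":
    demo.launch()
```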
## Model Specifications
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 10)] 0
embedding (Embedding) (None, 10, 128) 1597184
positional_embedding (Posit (None, 10, 128) 0
ionalEmbedding)
decoder (Decoder) (None, 10, 128) 71208
flatten (Flatten) (None, 1280) 0
dense (Dense) (None, 12479) 15985599
tf.nn.softmax (TFOpLambda) (None, 12479) 0
=================================================================
Total params: 17,653,991
Trainable params: 17,653,991
Non-trainable params: 0
_________________________________________________________________
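For readers who want to see how these shapes fit together, the sketch below rebuilds a comparable architecture with standard Keras layers. The repository's `PositionalEmbedding` and `Decoder` layers are custom; here they are approximated by a small embedding layer and a single masked multi-head-attention block, so this is an assumption-laden stand-in rather than the exact MinimalGPT implementation (note, for example, that the summary reports 0 parameters for the positional embedding, suggesting a fixed rather than learned encoding).

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 12479   # output vocabulary size, per the summary above
SEQ_LEN = 10         # gpt_input
D_MODEL = 128        # embedding size
NUM_HEADS = 8        # h

class PositionalEmbedding(layers.Layer):
    """Token embedding plus a learned position embedding.

    Stand-in for the custom layer of the same name in MinimalGPT,
    which appears to use a fixed (0-parameter) positional encoding.
    """
    def __init__(self, seq_len, vocab_size, d_model, **kwargs):
        super().__init__(**kwargs)
        self.token_emb = layers.Embedding(vocab_size, d_model)
        self.pos_emb = layers.Embedding(seq_len, d_model)
        self.seq_len = seq_len

    def call(self, x):
        positions = tf.range(start=0, limit=self.seq_len, delta=1)
        return self.token_emb(x) + self.pos_emb(positions)

inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
x = PositionalEmbedding(SEQ_LEN, VOCAB_SIZE, D_MODEL)(inputs)

# One decoder block: causally masked self-attention plus a feed-forward
# network, each with a residual connection and layer normalization.
attn_out = layers.MultiHeadAttention(
    num_heads=NUM_HEADS, key_dim=D_MODEL // NUM_HEADS
)(x, x, use_causal_mask=True)
x = layers.LayerNormalization()(x + attn_out)
ffn_out = layers.Dense(D_MODEL)(layers.Dense(4 * D_MODEL, activation="relu")(x))
x = layers.LayerNormalization()(x + ffn_out)

# Flatten the 10 x 128 window and predict a distribution over the vocabulary.
x = layers.Flatten()(x)
outputs = layers.Softmax()(layers.Dense(VOCAB_SIZE)(x))

model = tf.keras.Model(inputs, outputs)
model.summary()
```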
## Hyperparameters

- `gpt_input`: 10 (maximum input length, d_k)
- `d_model`: 128 (embedding size, d_model)
- `h`: 8 (number of attention heads, h)
- `decoder_stacks`: 1 (number of decoder stacks)
- `GPT_attention`: True (attention-layer implementation type, OpenAI style)
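To make these settings concrete, here is a minimal sketch of autoregressive generation with a fixed 10-token window: the prompt is tokenized, left-padded or truncated to `gpt_input`, and the most probable next token is appended repeatedly until 70 new tokens have been produced. The tokenizer interface (Keras-style `texts_to_sequences` / `sequences_to_texts`) and the greedy decoding strategy are assumptions for illustration, not necessarily what the repository implements.

```python
import numpy as np

def generate(model, tokenizer, prompt, num_tokens=70, gpt_input=10):
    """Greedy next-token generation with a fixed-size context window.

    Assumes a Keras-style tokenizer with texts_to_sequences /
    sequences_to_texts; the real MinimalGPT pipeline may differ.
    """
    tokens = tokenizer.texts_to_sequences([prompt])[0]
    for _ in range(num_tokens):
        # Keep the last `gpt_input` tokens and left-pad with 0 if needed.
        window = tokens[-gpt_input:]
        window = [0] * (gpt_input - len(window)) + window
        probs = model.predict(np.array([window]), verbose=0)[0]
        tokens.append(int(np.argmax(probs)))  # greedy: most probable token
    return tokenizer.sequences_to_texts([tokens])[0]
```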
## References
- Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
- Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
- Project Gutenberg. (n.d.). Retrieved 2023, from https://www.gutenberg.org.
- Abadi, Martín, et al. "TensorFlow: Large-scale machine learning on heterogeneous systems." (2015). Software available from https://www.tensorflow.org.