---
license: mit
title: 'MinimalGPT: Felis Catus'
sdk: gradio
emoji: 😻
colorFrom: gray
colorTo: blue
pinned: true
---

# MinimalGPT: Felis Catus
[MinimalGPT] [Project Gutenberg Dataset]
This Hugging Face Space is an illustrative application of the GitHub repository MinimalGPT, which departs from conventional GPT models that are scaled up and trained on high-performance computing systems and clusters. The primary objective of the MinimalGPT project is to explore how far a GPT model can be minimized in size.
Within this Space, we introduce a small GPT model named Felis Catus (stray cat), with roughly 17.7 million parameters (see the model summary below). What distinguishes this model is its training process: it was trained on a standard home-computer CPU (an AMD Ryzen 5) without any GPU acceleration, and training took only about 15 minutes on a dataset of roughly 150,000 tokens of text.
At present, given an input of just 5 to 7 words, Felis Catus can generate a short story excerpt of 70 tokens. The model's vocabulary contains roughly 12,000 words. We are currently working on scaling the model further in a forthcoming project.
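As a rough sketch of how a Gradio Space can expose such a model, the snippet below wires a text-in/text-out interface around a placeholder `generate_story` function. The function body and its signature are assumptions for illustration; the actual inference code lives in the MinimalGPT repository.

```python
import gradio as gr

def generate_story(prompt: str) -> str:
    # Hypothetical placeholder: the real Space would call the trained
    # Felis Catus model here and extend the prompt by 70 tokens.
    return prompt + " ..."

demo = gr.Interface(
    fn=generate_story,
    inputs=gr.Textbox(label="Prompt (5 to 7 words)"),
    outputs=gr.Textbox(label="Generated excerpt (70 tokens)"),
    title="MinimalGPT: Felis Catus",
)

if __name__ == "__main__":
    demo.launch()
```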
## Model Specifications
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 10)] 0
embedding (Embedding) (None, 10, 128) 1597184
positional_embedding (Posit (None, 10, 128) 0
ionalEmbedding)
decoder (Decoder) (None, 10, 128) 71208
flatten (Flatten) (None, 1280) 0
dense (Dense) (None, 12479) 15985599
tf.nn.softmax (TFOpLambda) (None, 12479) 0
=================================================================
Total params: 17,653,991
Trainable params: 17,653,991
Non-trainable params: 0
_________________________________________________________________
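For readers who want to see how these shapes fit together, the sketch below rebuilds a comparable architecture with standard Keras layers. The repository's `PositionalEmbedding` and `Decoder` layers are custom; here they are approximated by a small embedding layer and a single masked multi-head-attention block, so this is an assumption-laden stand-in rather than the exact MinimalGPT implementation (note, for example, that the summary reports 0 parameters for the positional embedding, suggesting a fixed rather than learned encoding).

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 12479   # output vocabulary size, per the summary above
SEQ_LEN = 10         # gpt_input
D_MODEL = 128        # embedding size
NUM_HEADS = 8        # h

class PositionalEmbedding(layers.Layer):
    """Token embedding plus a learned position embedding.

    Stand-in for the custom layer of the same name in MinimalGPT,
    which appears to use a fixed (0-parameter) positional encoding.
    """
    def __init__(self, seq_len, vocab_size, d_model, **kwargs):
        super().__init__(**kwargs)
        self.token_emb = layers.Embedding(vocab_size, d_model)
        self.pos_emb = layers.Embedding(seq_len, d_model)
        self.seq_len = seq_len

    def call(self, x):
        positions = tf.range(start=0, limit=self.seq_len, delta=1)
        return self.token_emb(x) + self.pos_emb(positions)

inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
x = PositionalEmbedding(SEQ_LEN, VOCAB_SIZE, D_MODEL)(inputs)

# One decoder block: causally masked self-attention plus a feed-forward
# network, each with a residual connection and layer normalization.
attn_out = layers.MultiHeadAttention(
    num_heads=NUM_HEADS, key_dim=D_MODEL // NUM_HEADS
)(x, x, use_causal_mask=True)
x = layers.LayerNormalization()(x + attn_out)
ffn_out = layers.Dense(D_MODEL)(layers.Dense(4 * D_MODEL, activation="relu")(x))
x = layers.LayerNormalization()(x + ffn_out)

# Flatten the 10 x 128 window and predict a distribution over the vocabulary.
x = layers.Flatten()(x)
outputs = layers.Softmax()(layers.Dense(VOCAB_SIZE)(x))

model = tf.keras.Model(inputs, outputs)
model.summary()
```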
## Hyperparameters

- `gpt_input`: 10 (maximum input length, d_k)
- `d_model`: 128 (embedding size, d_model)
- `h`: 8 (number of attention heads, h)
- `decoder_stacks`: 1 (number of decoder stacks)
- `GPT_attention`: True (attention-layer implementation type, OpenAI style)
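To make these settings concrete, here is a minimal sketch of autoregressive generation with a fixed 10-token window: the prompt is tokenized, left-padded or truncated to `gpt_input`, and the most probable next token is appended repeatedly until 70 new tokens have been produced. The tokenizer interface (Keras-style `texts_to_sequences` / `sequences_to_texts`) and the greedy decoding strategy are assumptions for illustration, not necessarily what the repository implements.

```python
import numpy as np

def generate(model, tokenizer, prompt, num_tokens=70, gpt_input=10):
    """Greedy next-token generation with a fixed-size context window.

    Assumes a Keras-style tokenizer with texts_to_sequences /
    sequences_to_texts; the real MinimalGPT pipeline may differ.
    """
    tokens = tokenizer.texts_to_sequences([prompt])[0]
    for _ in range(num_tokens):
        # Keep the last `gpt_input` tokens and left-pad with 0 if needed.
        window = tokens[-gpt_input:]
        window = [0] * (gpt_input - len(window)) + window
        probs = model.predict(np.array([window]), verbose=0)[0]
        tokens.append(int(np.argmax(probs)))  # greedy: most probable token
    return tokenizer.sequences_to_texts([tokens])[0]
```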
## References
- Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
- Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
- Project Gutenberg. (n.d.). Retrieved 2023, from https://www.gutenberg.org.
- Abadi, Martín, et al. "TensorFlow: Large-scale machine learning on heterogeneous systems." (2015). Software available from https://www.tensorflow.org.