---
license: mit
title: 'MinimalGPT: Felis Catus'
sdk: gradio
emoji: 😻
colorFrom: gray
colorTo: blue
pinned: true
---
# MinimalGPT: Felis Catus
[[`MinimalGPT`](https://github.com/abhaskumarsinha/MinimalGPT)] [[`Project Gutenberg Dataset`](https://www.kaggle.com/datasets/shubchat/1002-short-stories-from-project-guttenberg)]
This Hugging Face Space is an illustrative application of the GitHub repository [MinimalGPT](https://github.com/abhaskumarsinha/MinimalGPT). In contrast to conventional GPT models, which are scaled up and trained on high-performance computing systems and clusters, the primary objective of the MinimalGPT project is to explore how far a GPT model can be reduced in size.
In this Space we introduce a tiny GPT model named [Felis Catus](https://en.wikipedia.org/wiki/Cat) (the domestic cat), with approximately 17.6 million parameters (see the model summary below). What distinguishes this model is its training process: it was trained on a standard home computer CPU (an AMD Ryzen 5) without any GPU acceleration, and training took only about 15 minutes on a dataset of roughly 150,000 tokens of text.
At present, Felis Catus can generate a short story excerpt of 70 tokens from an input of just 5 to 7 words. Its vocabulary spans roughly 12,000 words. We are currently working on scaling the model further in a forthcoming project.
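Generation of this kind amounts to repeatedly predicting the next token over a sliding 10-token context window (see the `gpt_input` hyperparameter below). The following is a minimal sketch of how such greedy decoding could be driven; `model`, `word_to_id`, and `id_to_word` are hypothetical stand-ins for illustration, and the actual MinimalGPT repository may expose a different interface.

```python
import numpy as np

# Hypothetical objects for illustration only; the MinimalGPT repository may
# expose a different API for its trained model and vocabulary:
#   model      - Keras model mapping (batch, 10) token ids -> (batch, vocab) probabilities
#   word_to_id - dict mapping words to integer token ids
#   id_to_word - dict mapping integer token ids back to words

def generate(model, word_to_id, id_to_word, prompt, n_tokens=70, context=10):
    """Greedy decoding over a sliding window of `context` token ids."""
    tokens = [word_to_id[w] for w in prompt.lower().split()]
    for _ in range(n_tokens):
        window = tokens[-context:]
        # Left-pad short prompts with 0s (an assumption about how the model
        # handles inputs shorter than the context window).
        window = [0] * (context - len(window)) + window
        probs = model.predict(np.array([window]), verbose=0)[0]
        tokens.append(int(np.argmax(probs)))  # pick the most probable next token
    return " ".join(id_to_word[t] for t in tokens)

# Example call, assuming the objects above exist:
# print(generate(model, word_to_id, id_to_word, "the cat sat on the mat"))
```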
## Model Specifications
```
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 10)] 0
embedding (Embedding) (None, 10, 128) 1597184
positional_embedding (Posit (None, 10, 128) 0
ionalEmbedding)
decoder (Decoder) (None, 10, 128) 71208
flatten (Flatten) (None, 1280) 0
dense (Dense) (None, 12479) 15985599
tf.nn.softmax (TFOpLambda) (None, 12479) 0
=================================================================
Total params: 17,653,991
Trainable params: 17,653,991
Non-trainable params: 0
_________________________________________________________________
```
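Most of the parameters in the summary sit in the embedding table and the output projection. The quick check below reproduces those two counts from the shapes shown above (the summary implies an embedding table of 12,478 rows versus a 12,479-way output layer).

```python
# Sanity check of the two largest parameter counts in the summary above.
d_model = 128        # embedding width
context = 10         # input length
vocab_embed = 12478  # embedding rows implied by 1,597,184 / 128
vocab_out = 12479    # units in the output Dense layer

embedding_params = vocab_embed * d_model                     # 1,597,184
dense_params = (context * d_model) * vocab_out + vocab_out   # 15,985,599 (weights + biases)

print(embedding_params, dense_params)
```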
## Hyperparameters
```
gpt_input: 10          [Maximum input size, d_k]
d_model: 128           [Embedding size, d_model]
h: 8                   [Number of attention heads, h]
decoder_stacks: 1      [Number of decoder stacks, stack]
GPT_attention: True    [Attention layer implementation type - OpenAI style]
```
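As a rough illustration of how these hyperparameters map onto the layer stack in the model summary, the sketch below wires up a comparable Keras model. The positional-encoding scheme and the single decoder block are assumptions standing in for the repository's custom `PositionalEmbedding` and `Decoder` layers, so the parameter count will not match the summary exactly.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_felis_catus_sketch(gpt_input=10, d_model=128, h=8, vocab_size=12479):
    """Approximate layer stack; not the actual MinimalGPT implementation."""
    inputs = layers.Input(shape=(gpt_input,), dtype="int32")   # token ids
    x = layers.Embedding(vocab_size, d_model)(inputs)          # (10, 128) embeddings

    # Fixed sinusoidal positional table (an assumption; the repository's
    # PositionalEmbedding layer may use a different scheme).
    pos = np.arange(gpt_input)[:, None]
    dim = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (dim // 2)) / d_model)
    pos_table = np.where(dim % 2, np.cos(angles), np.sin(angles)).astype("float32")
    x = x + pos_table

    # One decoder stack: causal multi-head self-attention plus a feed-forward block.
    attn = layers.MultiHeadAttention(num_heads=h, key_dim=d_model // h)(
        x, x, use_causal_mask=True)
    x = layers.LayerNormalization()(x + attn)
    ff = layers.Dense(d_model, activation="relu")(x)
    x = layers.LayerNormalization()(x + ff)

    # Flatten the whole context and project onto the vocabulary, as in the summary.
    x = layers.Flatten()(x)
    outputs = layers.Softmax()(layers.Dense(vocab_size)(x))
    return tf.keras.Model(inputs, outputs)

model = build_felis_catus_sketch()
model.summary()
```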
## References
1. Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
2. Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
3. Project Gutenberg. (n.d.). Retrieved April 20, 2023, from www.gutenberg.org.
4. Abadi, Martín, et al. "TensorFlow: Large-scale machine learning on heterogeneous systems." Software available from https://www.tensorflow.org (2015).