gpt2-2layer-1m

A randomly initialized 2-layer GPT-2 model for functional and integration testing. It produces incoherent text; that is by design.

Architecture

| Hyperparameter       | Value                |
|----------------------|----------------------|
| Architecture         | GPT-2 (decoder-only) |
| Layers               | 2                    |
| Hidden size          | 256                  |
| Attention heads      | 4                    |
| FFN inner size       | 1,024                |
| Context length       | 512                  |
| Vocabulary size      | 183 (byte-level BPE) |
| Total params         | ~1.76M               |
| Non-embedding params | ~1.58M               |
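
The two parameter totals can be rederived from the other hyperparameters. The sketch below assumes the standard GPT-2 layout, which the card does not spell out: learned token and position embeddings, biases on every linear layer, two LayerNorms per block plus a final one, and an output head tied to the token embedding. The function name `gpt2_param_counts` is made up for illustration.

```python
def gpt2_param_counts(n_layer=2, d_model=256, d_ff=1024, n_ctx=512, vocab=183):
    """Return (total, non_embedding) parameter counts for a GPT-2 style model.

    Assumes the standard GPT-2 layout with a tied output head, so the
    language-model head adds no parameters of its own.
    """
    # Token embedding plus learned position embedding.
    embeddings = vocab * d_model + n_ctx * d_model

    # Attention: fused QKV projection and output projection, both with bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # Feed-forward: up-projection and down-projection, both with bias.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Two LayerNorms per block, each with a weight and a bias vector.
    layer_norms = 2 * 2 * d_model

    block = attn + mlp + layer_norms
    final_layer_norm = 2 * d_model

    non_embedding = n_layer * block + final_layer_norm
    return non_embedding + embeddings, non_embedding


total, non_embedding = gpt2_param_counts()
print(f"total={total:,} non_embedding={non_embedding:,}")
# prints: total=1,757,952 non_embedding=1,580,032
```

Under these assumptions the results round to the ~1.76M and ~1.58M figures listed above.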

Usage

from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

# Load the tokenizer and the randomly initialized weights from the Hub.
tokenizer = PreTrainedTokenizerFast.from_pretrained("gvadhul/gpt2-2layer-1m")
model = GPT2LMHeadModel.from_pretrained("gvadhul/gpt2-2layer-1m")

# Generate 20 new tokens; the decoded text is expected to be gibberish.
inputs = tokenizer("Hello world", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0]))

Intended use

Functional and unit testing of pipelines that need a tiny causal LM. Not suitable for any real NLP task, because the weights are random.
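
Because the weights are random, a test built on this model can only check plumbing (shapes, dtypes, finite values), never text quality. The following is one possible smoke test, not a prescribed one. It assumes `torch` and `transformers` are installed and the Hub is reachable. `expected_logits_shape` and `smoke_test` are hypothetical helper names, and the default vocabulary size and context length are taken from the architecture figures in this card.

```python
VOCAB_SIZE = 183      # vocabulary size stated in this card
CONTEXT_LENGTH = 512  # context length stated in this card


def expected_logits_shape(batch_size, seq_len, vocab_size=VOCAB_SIZE):
    """Shape a causal LM head should return: (batch, sequence, vocab)."""
    if seq_len > CONTEXT_LENGTH:
        raise ValueError(f"sequence length {seq_len} exceeds {CONTEXT_LENGTH}")
    return (batch_size, seq_len, vocab_size)


def smoke_test(repo_id="gvadhul/gpt2-2layer-1m", text="Hello world"):
    """Hypothetical helper: run one forward pass and check structure only."""
    # Imported lazily so the shape helper above stays usable without torch.
    import torch
    from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

    tokenizer = PreTrainedTokenizerFast.from_pretrained(repo_id)
    model = GPT2LMHeadModel.from_pretrained(repo_id).eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    batch_size, seq_len = inputs["input_ids"].shape
    # Use the loaded config's vocabulary size rather than assuming 183.
    expected = expected_logits_shape(batch_size, seq_len, model.config.vocab_size)
    assert tuple(logits.shape) == expected
    assert torch.isfinite(logits).all()
    return tuple(logits.shape)
```

Calling `smoke_test()` from a pytest function is enough to exercise the tokenizer, model loading, and forward pass in a few seconds on CPU.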

Safetensors checkpoint: 2.15M params, tensor type F32