gpt2-2layer-1m
A randomly initialized, 2-layer GPT-2 model for functional / integration testing. It produces incoherent text; that is by design.
Architecture
| Hyperparameter | Value |
|---|---|
| Architecture | GPT-2 (decoder-only) |
| Layers | 2 |
| Hidden size | 256 |
| Attention heads | 4 |
| FFN inner size | 1024 |
| Context length | 512 |
| Vocabulary size | 183 (byte-level BPE) |
| Total params | ~1.76 M |
| Non-embedding params | ~1.58 M |
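The two parameter totals can be reproduced from the other rows of the table. The sketch below is plain arithmetic, not code from this repository; it assumes the standard GPT-2 block layout (biased linear layers, two LayerNorms per block, one final LayerNorm, learned position embeddings) and an output head tied to the token embeddings.

```python
# Hyperparameters copied from the table above.
n_layer, d_model, d_ffn = 2, 256, 1024
n_ctx, vocab = 512, 183


def linear(n_in, n_out):
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


layer_norm = 2 * d_model  # scale and shift vectors

# Fused QKV projection plus the attention output projection.
attention = linear(d_model, 3 * d_model) + linear(d_model, d_model)
# Two-layer feed-forward network.
mlp = linear(d_model, d_ffn) + linear(d_ffn, d_model)
block = attention + mlp + 2 * layer_norm

non_embedding = n_layer * block + layer_norm  # plus the final LayerNorm
embedding = vocab * d_model + n_ctx * d_model  # token + position tables
total = non_embedding + embedding  # assumes the LM head reuses the token table

print(f"non-embedding: {non_embedding:,}")  # 1,580,032
print(f"embedding:     {embedding:,}")      # 177,920
print(f"total:         {total:,}")          # 1,757,952
```

The head count does not appear in the arithmetic because splitting the hidden size across 4 heads does not change the size of the projection matrices.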
Usage
```python
from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

# Load the tokenizer and the randomly initialized weights from the Hub.
tokenizer = PreTrainedTokenizerFast.from_pretrained("gvadhul/gpt2-2layer-1m")
model = GPT2LMHeadModel.from_pretrained("gvadhul/gpt2-2layer-1m")

# Generate 20 new tokens; the output is expected to be gibberish.
inputs = tokenizer("Hello world", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0]))
```
Intended use
Functional / unit testing of pipelines that need a tiny causal LM. Not suitable for any real NLP task: the weights are random.
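As an illustration of this kind of testing, here is a minimal smoke-test sketch. To avoid network access it builds a local stand-in from `GPT2Config` using the values in the architecture table instead of downloading this checkpoint; the helper name `smoke_test` and the batch and sequence sizes are hypothetical, and the stand-in's weights will differ from the published ones.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical local stand-in mirroring the architecture table;
# it is NOT the hub checkpoint, only a model of the same shape.
config = GPT2Config(
    vocab_size=183,
    n_positions=512,
    n_embd=256,
    n_layer=2,
    n_head=4,
    n_inner=1024,
)
torch.manual_seed(0)
model = GPT2LMHeadModel(config).eval()


def smoke_test(model, batch=2, seq_len=16):
    """Run one forward pass and check the logits' shape and finiteness."""
    ids = torch.randint(0, model.config.vocab_size, (batch, seq_len))
    with torch.no_grad():
        logits = model(input_ids=ids).logits
    assert logits.shape == (batch, seq_len, model.config.vocab_size)
    assert torch.isfinite(logits).all()
    return logits


logits = smoke_test(model)
print(tuple(logits.shape))  # (2, 16, 183)
```

Because the weights are random either way, such a check exercises tensor shapes and plumbing only; it says nothing about output quality.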