GPT-2 (Trained from Scratch)
A GPT-2-style causal language model built and trained entirely from scratch in PyTorch: no pre-trained weights, no HuggingFace Trainer. Every component (multi-head attention with KV-cache, transformer blocks, weight-tying) was implemented by hand.
Model Details
| Hyperparameter | Value |
|---|---|
| Architecture | GPT-2 (decoder-only transformer) |
| Layers | 12 |
| Attention heads | 12 |
| d_model | 768 |
| FFN hidden dim | 3 072 |
| Context length | 1 024 tokens |
| Vocab size | 50 257 |
| Training steps | 150 000 |
| Tokens seen | ~9.8 B |
| Tokenizer | GPT-2 BPE (tiktoken) |
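As a sanity check on the table, the hyperparameters can be turned into a rough parameter count. The sketch below assumes the standard GPT-2 layout (learned positional embeddings, biases on every linear layer, two LayerNorms per block plus a final one), which this card does not spell out. `gpt2_param_count` is a hypothetical helper, not a function from the repo.

```python
def gpt2_param_count(n_layer=12, d_model=768, d_ff=3072, n_ctx=1024, vocab=50257):
    """Count parameters under the assumed standard GPT-2 layout."""
    tok_emb = vocab * d_model   # counted once: the LM head reuses it (weight tying)
    pos_emb = n_ctx * d_model   # learned positional embeddings (assumption)
    # Attention: fused QKV projection plus output projection, both with biases.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: expand to d_ff, then project back, both with biases.
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    norms = 2 * 2 * d_model     # two LayerNorms per block, weight and bias each
    final_norm = 2 * d_model
    return tok_emb + pos_emb + n_layer * (attn + ffn + norms) + final_norm


total = gpt2_param_count()
print(f"{total:,} parameters")   # 124,439,808

# Average tokens per optimizer step implied by the table.
tokens_per_step = 9.8e9 / 150_000
print(round(tokens_per_step))    # 65333
```

Under these assumptions the model lands at the familiar ~124M parameters. The implied ~65k tokens per step is close to 64 × 1 024 = 65 536, which would fit a batch of 64 full-length sequences, but that batch size is an inference from the table rather than something stated here.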
Usage
With 🤗 Transformers
from transformers import AutoTokenizer
from model.hf_wrapper import GPT2ForCausalLM
model = GPT2ForCausalLM.from_pretrained("saiteja718/gpt2")
tokenizer = AutoTokenizer.from_pretrained("saiteja718/gpt2")
inputs = tokenizer("The capital of France is", return_tensors="pt")
logits = model(**inputs).logits
With the interactive inference script
Clone the repo and run:
git clone https://huggingface.co/saiteja718/gpt2
cd gpt2
pip install torch transformers tiktoken
python3 gpt2_infer.py --interactive
Implementation Highlights
- Multi-head attention with a split KV-cache for efficient autoregressive decoding (prefill + decode loop)
- Weight tying between the token embedding and the LM head
- Top-k sampling with temperature for controllable text generation
- Custom training loop with gradient clipping and cosine LR schedule
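To illustrate the prefill + decode idea behind the KV-cache item above, here is a minimal single-head sketch in plain Python (not PyTorch, to keep it dependency-free). It is not the repo's implementation: `KVCache`, `attend`, `full_attention`, and `cached_attention` are hypothetical names, and "split" is read here simply as keeping keys and values in separate lists, which may not match what the repo means by it.

```python
import math
import random


def attend(query, keys, values):
    """Softmax attention of one query vector over a list of key/value vectors."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def full_attention(queries, keys, values):
    """Reference: causal attention recomputed from scratch at every position."""
    return [attend(queries[t], keys[: t + 1], values[: t + 1]) for t in range(len(queries))]


class KVCache:
    """Hypothetical cache holding keys and values in two separate lists."""

    def __init__(self):
        self.keys = []
        self.values = []

    def extend(self, keys, values):
        self.keys.extend(keys)
        self.values.extend(values)


def cached_attention(queries, keys, values, n_prompt):
    cache = KVCache()
    # Prefill: handle the whole prompt in one pass and store its keys/values.
    cache.extend(keys[:n_prompt], values[:n_prompt])
    outputs = full_attention(queries[:n_prompt], cache.keys, cache.values)
    # Decode: each new token appends one key/value pair and attends over the cache.
    for t in range(n_prompt, len(queries)):
        cache.extend([keys[t]], [values[t]])
        outputs.append(attend(queries[t], cache.keys, cache.values))
    return outputs


rng = random.Random(0)


def rand_vecs(n, d):
    return [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]


q, k, v = rand_vecs(6, 4), rand_vecs(6, 4), rand_vecs(6, 4)
ref = full_attention(q, k, v)
out = cached_attention(q, k, v, n_prompt=3)
max_diff = max(abs(a - b) for r, o in zip(ref, out) for a, b in zip(r, o))
print(max_diff < 1e-12)  # True: cached decoding matches full recomputation
```

The point of the cache is that each decode step computes attention for one new query only, instead of re-running attention for every earlier position. A real implementation would do the same with batched tensors per layer and per head.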
Example Output
Prompt: The capital of germany is
Output: The capital of germany is the country he first settled in, and soon the settlement
of the British colonies as a result of his military service...
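The continuation above is fluent but factually off, which is typical of a sampled base model. For reference, top-k sampling with temperature (listed under Implementation Highlights) can be sketched in plain Python as follows. This is an illustration under assumed defaults, not the repo's code: `sample_top_k`, `k=50`, and `temperature=1.0` are hypothetical choices.

```python
import math
import random


def sample_top_k(logits, k=50, temperature=1.0, rng=random):
    """Sample a token id from the k highest-scoring logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = min(k, len(logits))
    # Indices of the k largest logits; everything else gets zero probability.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized softmax
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 0.5, -1.0, 3.0, 0.0]
print(sample_top_k(logits, k=1))  # 3: with k=1 this reduces to greedy decoding
```

Lower `k` and lower `temperature` both trade diversity for more conservative text.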
Limitations
- Trained as a research/learning exercise; not fine-tuned on any instruction dataset
- May produce factually incorrect or incoherent text
- Context window limited to 1 024 tokens
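One practical consequence of the 1 024-token window: longer prompts have to be shortened before they reach the model. A minimal sketch, assuming you want to keep the most recent tokens and leave room for generation (`crop_to_context` and its `reserve` parameter are hypothetical, not part of the repo):

```python
MAX_CONTEXT = 1024  # context length from the Model Details table


def crop_to_context(token_ids, max_context=MAX_CONTEXT, reserve=0):
    """Keep the most recent tokens so the prompt plus `reserve` new tokens fit."""
    budget = max_context - reserve
    if budget <= 0:
        raise ValueError("reserve must be smaller than max_context")
    return token_ids[-budget:]


prompt_ids = list(range(2000))              # stand-in for real token ids
cropped = crop_to_context(prompt_ids, reserve=24)
print(len(cropped), cropped[0])             # 1000 1000
```

Dropping the oldest tokens is only one possible policy; summarizing or chunking the input are alternatives when the start of the prompt matters.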
Citation
If you use this model in your work, a shoutout is appreciated:
@misc{saiteja718-gpt2-scratch,
author = {saiteja718},
title = {GPT-2 Trained from Scratch},
year = {2025},
url = {https://huggingface.co/saiteja718/gpt2}
}