modded-GPT-1
A small, modern rebuild of GPT-1 trained on WikiText-103.
Checkpoint
- File: wikitext103-50m_final.pt
- Parameters: 41.6M
- Training: 20,000 steps, WikiText-103, Muon + torch.compile
- Validation loss: 3.2998
- Validation perplexity: 27.11
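The two validation figures are related: perplexity is the exponential of the mean per-token cross-entropy. The sketch below assumes the reported loss is a mean negative log-likelihood in nats (the PyTorch default for cross-entropy); the helper name `perplexity` is illustrative and not part of the repo.

```python
import math


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_nll)


val_loss = 3.2998  # validation loss reported for this checkpoint
val_ppl = perplexity(val_loss)
print(f"{val_ppl:.2f}")  # 27.11
```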
Local Benchmark Snapshot
These are validation-set scores produced locally by the repo's GLUE fine-tuning script, shown next to the test-set numbers reported in the GPT-1 paper. Because the splits differ, they are useful as a practical comparison, not as a leaderboard-equivalent reproduction.
| Task | GPT-1 Paper | This Checkpoint | Delta |
|---|---|---|---|
| RTE | 56.0 | 62.1 | +6.1 |
| MRPC | 82.3 | 82.1 | -0.2 |
| STS-B | 82.0 | 79.7 | -2.3 |
| SST-2 | 91.3 | 86.9 | -4.4 |
| CoLA | 45.4 | 11.8 | -33.6 |
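The Delta column is this checkpoint's score minus the GPT-1 paper score. A minimal sketch that recomputes it from the table values (the `SCORES` dictionary and `deltas` helper are illustrative names, not part of the repo):

```python
# (GPT-1 paper score, this checkpoint's score), copied from the table above.
SCORES = {
    "RTE": (56.0, 62.1),
    "MRPC": (82.3, 82.1),
    "STS-B": (82.0, 79.7),
    "SST-2": (91.3, 86.9),
    "CoLA": (45.4, 11.8),
}


def deltas(scores):
    """Return checkpoint-minus-paper differences, rounded to one decimal."""
    return {task: round(local - paper, 1) for task, (paper, local) in scores.items()}


for task, delta in deltas(SCORES).items():
    print(f"{task}: {delta:+.1f}")
```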
Loading
This is a custom PyTorch checkpoint, not a Transformers-format model. Load it with the GPT class from model.py, which is included in this model repo and in the GitHub repository.
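```python
import torch
from model import GPT  # model.py from this repo

# weights_only=True restricts unpickling to tensors and basic Python types.
ckpt = torch.load("wikitext103-50m_final.pt", map_location="cpu", weights_only=True)
model = GPT(ckpt["config"])            # rebuild the architecture from the saved config
model.load_state_dict(ckpt["model"])   # restore the trained weights
model.eval()                           # switch to inference mode
```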
The tokenizer is included as tokenizer.json.
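A minimal sketch of loading the tokenizer, assuming tokenizer.json is in the Hugging Face `tokenizers` serialization format (the filename suggests this, but the repo does not state it). The helpers `load_tokenizer` and `encode_ids` are illustrative names, not part of the repo.

```python
from tokenizers import Tokenizer


def load_tokenizer(path="tokenizer.json"):
    """Load a tokenizer saved in the Hugging Face `tokenizers` JSON format."""
    return Tokenizer.from_file(path)


def encode_ids(tokenizer, text):
    """Return the list of token ids for `text`."""
    return tokenizer.encode(text).ids


if __name__ == "__main__":
    tokenizer = load_tokenizer()  # assumes tokenizer.json is in the working directory
    print(encode_ids(tokenizer, "The quick brown fox"))
```

The resulting ids can be wrapped in a `torch.tensor` and passed to the model; the exact forward signature depends on model.py.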
Notes
- This model is much smaller than GPT-1: 41.6M params vs ~117M.
- It was trained on WikiText-103, not BooksCorpus.
- The CoLA grammar benchmark remains the clear weak spot.
- The Hugging Face token used for upload should be rotated after publishing.
