modded-GPT-1

A small, modern rebuild of GPT-1 trained on WikiText-103.

Checkpoint

File: wikitext103-50m_final.pt
Parameters: 41.6M
Training: 20,000 steps, WikiText-103, Muon + torch.compile
Validation loss: 3.2998
Validation perplexity: 27.11
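The two validation figures above agree with each other if the loss is read as mean per-token cross-entropy in nats, in which case perplexity is simply exp(loss). That reading is an assumption about how the training script reports loss; the short check below only reproduces the arithmetic.

```python
import math

# Validation loss as listed above; assumed to be mean cross-entropy per token, in nats.
val_loss = 3.2998

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(val_loss)

print(f"{perplexity:.2f}")  # 27.11
```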

Local Benchmark Snapshot

These are local validation-set scores from the repo's GLUE fine-tuning script, shown next to the test-set numbers reported in the GPT-1 paper. Because the splits differ, they are useful as a practical comparison, not as a leaderboard-equivalent reproduction.

Task	GPT-1 Paper	This Checkpoint	Delta
RTE	56.0	62.1	+6.1
MRPC	82.3	82.1	-0.2
STS-B	82.0	79.7	-2.3
SST-2	91.3	86.9	-4.4
CoLA	45.4	11.8	-33.6
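As a quick sanity check on the Delta column, the differences can be recomputed from the two score columns. This is only a sketch of the arithmetic; the `scores` dictionary is a hand-copied illustration of the table, not output from the fine-tuning script.

```python
# (GPT-1 paper score, this checkpoint's score), copied by hand from the table.
scores = {
    "RTE": (56.0, 62.1),
    "MRPC": (82.3, 82.1),
    "STS-B": (82.0, 79.7),
    "SST-2": (91.3, 86.9),
    "CoLA": (45.4, 11.8),
}

# Delta = this checkpoint minus the paper, rounded to one decimal like the table.
deltas = {task: round(ours - paper, 1) for task, (paper, ours) in scores.items()}

for task, delta in deltas.items():
    print(f"{task:6s} {delta:+.1f}")

# The task with the largest shortfall relative to the paper.
weakest = min(deltas, key=deltas.get)
print("largest gap:", weakest)
```

Only RTE comes out positive; CoLA accounts for by far the largest gap.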

Loading

This is a custom PyTorch checkpoint. Load it with the GPT class from model.py, which is included in this model repo and in the GitHub repository.

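import torch
from model import GPT  # model.py from this repo

# weights_only=True restricts unpickling to tensors, primitive types, and plain containers.
ckpt = torch.load("wikitext103-50m_final.pt", map_location="cpu", weights_only=True)
model = GPT(ckpt["config"])  # rebuild the architecture from the stored config
model.load_state_dict(ckpt["model"])  # copy the trained weights into it
model.eval()  # switch to inference mode (e.g. turns off dropout)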

The tokenizer is included as tokenizer.json.
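A minimal sketch of loading the tokenizer, assuming tokenizer.json is in the Hugging Face `tokenizers` library's JSON format (the file name suggests this, but the card does not confirm it). The helper name `load_tokenizer` and the sample sentence are illustrative only.

```python
from pathlib import Path


def load_tokenizer(path="tokenizer.json"):
    """Load the tokenizer, assuming a Hugging Face `tokenizers` JSON file."""
    # Imported lazily so this helper can be defined without the package installed.
    from tokenizers import Tokenizer

    return Tokenizer.from_file(str(path))


# Only runs when the file has been downloaded next to this script.
if Path("tokenizer.json").exists():
    tok = load_tokenizer()
    enc = tok.encode("The quick brown fox")
    print(enc.ids)              # token ids to feed to the model
    print(tok.decode(enc.ids))  # round-trip back to text
```

If the file turns out to be in a different format, the tokenizer code in the GitHub repository is the authoritative reference.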

Notes

This model is much smaller than GPT-1: 41.6M params vs ~117M.
It was trained on WikiText-103, not BooksCorpus.
The CoLA grammar benchmark remains the clear weak spot.
The Hugging Face token used for upload should be rotated after publishing.
