NanoDiffusionGPT Tiny Shakespeare

This is a small character-level diffusion language model trained on Tiny Shakespeare.

It is not a GPT-style next-token model. It uses a bidirectional Transformer and learns to reconstruct randomly masked characters. Generation starts with a prompt followed by [MASK] tokens. Over multiple denoising steps, the model then fills in the masked positions where it is most confident.
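The generation loop described above can be sketched roughly as follows. This is a minimal illustration and not the code in sample.py: `toy_predict` is a hypothetical stand-in for the Transformer that returns random guesses, and the even per-step reveal schedule is an assumption, since the real script may schedule unmasking differently.

```python
import math
import random

MASK = "[MASK]"  # one list element standing in for the mask token


def toy_predict(seq, vocab, rng):
    """Hypothetical stand-in for the bidirectional Transformer.

    Returns one (char, confidence) guess per position. A real model would
    compute a softmax over the vocabulary, conditioned on the whole sequence.
    """
    return [(rng.choice(vocab), rng.random()) for _ in seq]


def denoise(prompt, max_new_tokens, steps, predict, vocab, seed=0):
    """Fill masked positions over several steps, most confident first."""
    rng = random.Random(seed)
    seq = list(prompt) + [MASK] * max_new_tokens
    for step in range(steps):
        masked = [i for i, c in enumerate(seq) if c == MASK]
        if not masked:
            break
        # Assumed schedule: reveal an even share of the remaining masks,
        # so the final step always clears whatever is left.
        k = math.ceil(len(masked) / (steps - step))
        guesses = predict(seq, vocab, rng)
        # Commit only the k masked positions with the highest confidence.
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:k]
        for i in best:
            seq[i] = guesses[i][0]
    return "".join(seq)


print(repr(denoise("To be", 20, 8, toy_predict, "abc ")))
```

The prompt characters are never masked, so they pass through unchanged. Positions left masked after a step are predicted again in the next step, with the newly committed characters as additional context.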

Files

ckpt.pt: PyTorch checkpoint
model.py: NanoDiffusionGPT model definition
sample.py: local sampling script
meta.pkl: character vocabulary metadata
config.json: architecture and training metadata
requirements.txt: minimal Python dependencies

Training

dataset: Tiny Shakespeare, character-level
iterations: 100,000
parameters: 3.16M
block size: 256
layers: 4
heads: 4
embedding size: 256
objective: masked denoising
final checkpoint val loss: 1.9720
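
The masked denoising objective can be illustrated with a small, dependency-free sketch. It is not the training code. The helper names, the fixed `mask_ratio`, and the choice to score only masked positions are assumptions made for illustration, and the uniform "model" is a placeholder for real predictions.

```python
import math
import random


def mask_chars(ids, mask_id, mask_ratio, rng):
    """Replace a random subset of token ids with mask_id.

    Returns the noisy sequence and the sorted list of masked positions.
    """
    n_mask = max(1, round(mask_ratio * len(ids)))
    positions = sorted(rng.sample(range(len(ids)), n_mask))
    noisy = list(ids)
    for p in positions:
        noisy[p] = mask_id
    return noisy, positions


def masked_cross_entropy(probs, targets, positions):
    """Mean negative log-likelihood of the true ids at masked positions only."""
    total = sum(math.log(probs[p][targets[p]]) for p in positions)
    return -total / len(positions)


text = "To be, or not to be"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = [stoi[ch] for ch in text]
mask_id = len(vocab)  # extra id reserved for the mask token (assumption)

rng = random.Random(0)
noisy, positions = mask_chars(ids, mask_id, mask_ratio=0.5, rng=rng)

# Placeholder predictions: a uniform distribution at every position.
uniform = [[1.0 / len(vocab)] * len(vocab) for _ in ids]
loss = masked_cross_entropy(uniform, ids, positions)
print(f"masked {len(positions)} of {len(ids)} characters")
```

With uniform predictions the loss equals the natural log of the vocabulary size, which is a useful sanity check for any implementation of this objective.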

Usage

pip install -r requirements.txt
python sample.py --out_dir=. --device=cpu --dtype=float32 --start="To be" --max_new_tokens=180 --steps=128

On Apple Silicon:

python sample.py --out_dir=. --device=mps --dtype=float32 --compile=False --start="KING:"

Example Output

KING:
What, my lord, my my lord?

DUKE OF AAMERLLE:
What is  my lord, that you you will?

DERBY:
To take your lord, your lord.

The model is intentionally tiny and educational. Expect Shakespeare-like structure, line breaks, speaker names, and invented spellings, not coherent large-model prose.
