NanoDiffusionGPT Tiny Shakespeare

This is a small character-level diffusion language model trained on Tiny Shakespeare.

It is not a GPT-style next-token model. It uses a bidirectional Transformer and learns to reconstruct randomly masked characters. Generation starts with a prompt followed by [MASK] tokens; at each denoising step the model fills in the masked positions where its prediction is most confident, and the rest stay masked for later steps.
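The decoding loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in sample.py: `toy_predict` is a hypothetical stand-in for the Transformer, `MASK` is a placeholder character rather than the real [MASK] token, and the "reveal an equal share of the remaining masks per step" schedule is one plausible choice among several.

```python
import math

MASK = "_"  # hypothetical stand-in for the [MASK] token


def toy_predict(seq, target):
    """Hypothetical stand-in for the model.

    Returns one (char, confidence) pair per position. Confidence is higher
    for positions whose neighbours are already known, which mimics a model
    that is more certain near existing context.
    """
    preds = []
    for i in range(len(seq)):
        known_neighbours = sum(
            1 for j in (i - 1, i + 1) if 0 <= j < len(seq) and seq[j] != MASK
        )
        preds.append((target[i], 0.5 + 0.25 * known_neighbours))
    return preds


def denoise(prompt, num_new, steps, predict):
    """Fill num_new masked positions after the prompt over at most `steps` passes."""
    seq = list(prompt) + [MASK] * num_new
    for step in range(steps):
        masked = [i for i, ch in enumerate(seq) if ch == MASK]
        if not masked:
            break
        # Assumed schedule: reveal an equal share of the remaining masks
        # in each remaining step.
        k = math.ceil(len(masked) / (steps - step))
        preds = predict(seq)
        # Commit only the k most confident masked positions this step.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:k]:
            seq[i] = preds[i][0]
    return "".join(seq)


if __name__ == "__main__":
    target = "To be or not"
    print(denoise("To be", 7, 4, lambda s: toy_predict(s, target)))
```

Because every position is predicted in parallel and only the most confident ones are committed, later steps can condition on characters to both the left and the right, which a left-to-right GPT sampler cannot do.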

Files

  • ckpt.pt: PyTorch checkpoint
  • model.py: NanoDiffusionGPT model definition
  • sample.py: local sampling script
  • meta.pkl: character vocabulary metadata
  • config.json: architecture and training metadata
  • requirements.txt: minimal Python dependencies

Training

  • dataset: Tiny Shakespeare, character-level
  • iterations: 100,000
  • parameters: 3.16M
  • block size: 256
  • layers: 4
  • heads: 4
  • embedding size: 256
  • objective: masked denoising
  • final checkpoint val loss: 1.9720
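A masked denoising objective of the kind listed above can be sketched as follows. This is a minimal plain-Python illustration, not the training code: the mask id `MASK_ID = 65` assumes a 65-character vocabulary with one extra mask symbol, and the mask ratio, the helper names `corrupt` and `masked_cross_entropy`, and the choice to average the loss over masked positions only are all assumptions.

```python
import math
import random

MASK_ID = 65  # assumption: 65 characters (ids 0..64) plus one mask symbol


def corrupt(tokens, mask_ratio, rng):
    """Replace a random subset of token ids with MASK_ID.

    Returns (noisy_tokens, is_masked), where is_masked marks the corrupted
    positions the model is asked to reconstruct.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    noisy = [MASK_ID if i in positions else t for i, t in enumerate(tokens)]
    is_masked = [i in positions for i in range(len(tokens))]
    return noisy, is_masked


def masked_cross_entropy(probs, targets, is_masked):
    """Mean negative log-likelihood (in nats) over masked positions only."""
    losses = [
        -math.log(p[t]) for p, t, m in zip(probs, targets, is_masked) if m
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [1, 2, 3, 4, 5, 6, 7, 8]
    noisy, is_masked = corrupt(tokens, 0.5, rng)
    # A model that guesses uniformly over 66 symbols scores ln(66) per position.
    uniform = [[1 / 66] * 66 for _ in tokens]
    print(noisy, masked_cross_entropy(uniform, tokens, is_masked))
```

Under these assumptions a uniform guess costs ln(66), roughly 4.19 nats per masked character, which gives a rough baseline for reading a loss value, provided the reported loss is measured in nats and averaged the same way.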

Usage

pip install -r requirements.txt
python sample.py --out_dir=. --device=cpu --dtype=float32 --start="To be" --max_new_tokens=180 --steps=128

On Apple Silicon:

python sample.py --out_dir=. --device=mps --dtype=float32 --compile=False --start="KING:"

Example Output

KING:
What, my lord, my my lord?

DUKE OF AAMERLLE:
What is  my lord, that you you will?

DERBY:
To take your lord, your lord.

The model is intentionally tiny and educational. Expect Shakespeare-like structure, line breaks, speaker names, and invented spellings, not coherent large-model prose.
