NanoDiffusionGPT Tiny Shakespeare
This is a small character-level diffusion language model trained on Tiny Shakespeare.
It is not a GPT-style next-token model. It uses a bidirectional Transformer and
learns to reconstruct randomly masked characters. Generation starts with a
prompt plus [MASK] tokens, then fills the most confident masked positions over
multiple denoising steps.
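The generation loop described above can be sketched in plain Python. This is an illustration only, not the code in sample.py: `toy_predict` is a hypothetical stand-in for the bidirectional Transformer, the `_` mask symbol is a placeholder for [MASK], and the even-share unmasking schedule is an assumption about how positions are spread across steps.

```python
import math
import random

MASK = "_"  # hypothetical stand-in for the [MASK] token


def toy_predict(tokens, vocab, rng):
    """Hypothetical stand-in for the model.

    Returns one (character, confidence) guess per position. A real model
    would produce a probability distribution over the vocabulary at each
    position, conditioned on the whole sequence in both directions.
    """
    return [(rng.choice(vocab), rng.random()) for _ in tokens]


def denoise(prompt, num_new, steps, vocab, seed=0):
    rng = random.Random(seed)
    # Start from the prompt followed by num_new masked positions.
    tokens = list(prompt) + [MASK] * num_new
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        guesses = toy_predict(tokens, vocab, rng)
        # Assumed schedule: fill an even share of what is left each step,
        # so the last step fills every remaining mask.
        k = math.ceil(len(masked) / (steps - step))
        # Commit only the k masked positions with the highest confidence.
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = guesses[i][0]
    return "".join(tokens)


print(denoise("To be", num_new=10, steps=4, vocab="abc "))
```

The prompt characters are never overwritten; only masked positions change, and positions the model is less sure about stay masked until a later step, when more context has been filled in.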
Files
- ckpt.pt: PyTorch checkpoint
- model.py: NanoDiffusionGPT model definition
- sample.py: local sampling script
- meta.pkl: character vocabulary metadata
- config.json: architecture and training metadata
- requirements.txt: minimal Python dependencies
Training
- dataset: Tiny Shakespeare, character-level
- iterations: 100,000
- parameters: 3.16M
- block size: 256
- layers: 4
- heads: 4
- embedding size: 256
- objective: masked denoising
- final checkpoint val loss: 1.9720
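As a rough illustration of the masked denoising objective, the sketch below corrupts a sequence by masking random positions and scores predictions only where a mask was placed. The function names, the mask probability, and the choice to average the loss over masked positions are assumptions made for this example; the actual training code may sample mask rates or weight the loss differently.

```python
import math
import random


def mask_sequence(ids, mask_id, mask_prob, rng):
    """Replace each token id with mask_id with probability mask_prob."""
    noisy, is_masked = [], []
    for t in ids:
        m = rng.random() < mask_prob
        noisy.append(mask_id if m else t)
        is_masked.append(m)
    return noisy, is_masked


def masked_cross_entropy(probs, targets, is_masked):
    """Mean negative log-likelihood of the true ids at masked positions.

    probs[i][c] is the predicted probability of id c at position i.
    Unmasked positions are ignored (assumption for this sketch).
    """
    losses = [
        -math.log(probs[i][targets[i]])
        for i, m in enumerate(is_masked)
        if m
    ]
    return sum(losses) / len(losses) if losses else 0.0


rng = random.Random(0)
ids = [0, 1, 2, 3, 0, 1]   # toy sequence over a 4-symbol vocabulary
mask_id = 4                # extra id reserved for the mask token
noisy, is_masked = mask_sequence(ids, mask_id, 0.5, rng)

# A model that guesses uniformly over 4 symbols scores ln(4) per position.
uniform = [[0.25] * 4 for _ in ids]
baseline = masked_cross_entropy(uniform, ids, [True] * len(ids))
```

A uniform guess gives a loss equal to the log of the vocabulary size, which is a useful baseline when reading a validation loss such as the one listed above.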
Usage
pip install -r requirements.txt
python sample.py --out_dir=. --device=cpu --dtype=float32 --start="To be" --max_new_tokens=180 --steps=128
On Apple Silicon:
python sample.py --out_dir=. --device=mps --dtype=float32 --compile=False --start="KING:"
Example Output
KING:
What, my lord, my my lord?
DUKE OF AAMERLLE:
What is my lord, that you you will?
DERBY:
To take your lord, your lord.
The model is intentionally tiny and educational. Expect Shakespeare-like structure, line breaks, speaker names, and invented spellings, not coherent large-model prose.