**nanochat** is a 561M parameter transformer language model trained for conversational AI tasks. This model demonstrates that capable chat models can be trained efficiently on modest hardware budgets (~$100 on 8x H100 GPUs).

Read about the training process at https://samdobson.uk/posts/training-chatgpt-for-cheap/

Chat with the model at https://huggingface.co/spaces/sdobson/nanochat

## Model Description
- **License:** MIT
- **Parameters:** 560,988,160 (~561M)

### Architecture

- **Layers:** 20
- **Bias:** Inherits biases from training data (FineWeb-EDU, SmolTalk, etc.)
- **Language:** English-only

## Inference guide

Simon Willison created a script that runs the model on CPU on macOS:
```
cd /tmp
git clone https://huggingface.co/sdobson/nanochat
uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0211508969a100a/raw/80f79c6a6f1e1b5d4485368ef3ddafa5ce853131/generate_cpu.py \
  --model-dir /tmp/nanochat \
  --prompt "Tell me about dogs."
```
Otherwise you can:

1. Download all files
2. Put `tokenizer.pkl` and `token_bytes.pt` in `~/.cache/nanochat/tokenizer`
3. Put `model_000650.pt` and `meta_000650.json` in `~/.cache/nanochat/chatsft_checkpoints`
4. Clone https://github.com/karpathy/nanochat
5. Run `uv sync` followed by `uv run python -m scripts.chat_web`
## Citation

**Repository:** [github.com/karpathy/nanochat](https://github.com/karpathy/nanochat)