sdobson committed
Commit 185f046 · verified · 1 Parent(s): 6c51a3f

Update README.md

Files changed (1):
  1. README.md (+23 -21)
README.md CHANGED
@@ -44,7 +44,9 @@ model-index:
  **nanochat** is a 561M parameter transformer language model trained for conversational AI tasks. This model demonstrates that capable chat models
  can be trained efficiently on modest hardware budgets (~$100 on 8x H100 GPUs).

- Try it out at https://huggingface.co/spaces/sdobson/nanochat
+ Read about the process at https://samdobson.uk/posts/training-chatgpt-for-cheap/
+
+ Chat with the model at https://huggingface.co/spaces/sdobson/nanochat

  ## Model Description

@@ -55,26 +57,6 @@ Try it out at https://huggingface.co/spaces/sdobson/nanochat
  - **License:** MIT
  - **Parameters:** 560,988,160 (~561M)

- ## Inference guide
-
- Simon Willison created a script to allow this to run on CPU on macOS:
-
- ```
- cd /tmp
- git clone https://huggingface.co/sdobson/nanochat
- uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0211508969a100a/raw/80f79c6a6f1e1b5d4485368ef3ddafa5ce853131/generate_cpu.py \
-   --model-dir /tmp/nanochat \
-   --prompt "Tell me about dogs."
- ```
-
- Otherwise you can:
-
- 1. Download all files
- 2. Put `tokenizer.pkl` and `token_bytes.pt` in `~/.cache/nanochat/tokenizer`
- 3. Put `model_000650.pt` and `meta_000650.json` in `~/.cache/nanochat/chatsft_checkpoints`
- 4. Clone https://github.com/karpathy/nanochat
- 5. Run `uv sync` followed by `uv run python -m scripts.chat_web`
-
  ### Architecture

  - **Layers:** 20
@@ -153,6 +135,26 @@ The model can be fine-tuned for specific conversational tasks or used as a base
  - **Bias:** Inherits biases from training data (FineWeb-EDU, SmolTalk, etc.)
  - **Language:** English-only

+ ## Inference guide
+
+ Simon Willison created a script to allow this to run on CPU on macOS:
+
+ ```
+ cd /tmp
+ git clone https://huggingface.co/sdobson/nanochat
+ uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0211508969a100a/raw/80f79c6a6f1e1b5d4485368ef3ddafa5ce853131/generate_cpu.py \
+   --model-dir /tmp/nanochat \
+   --prompt "Tell me about dogs."
+ ```
+
+ Otherwise you can:
+
+ 1. Download all files
+ 2. Put `tokenizer.pkl` and `token_bytes.pt` in `~/.cache/nanochat/tokenizer`
+ 3. Put `model_000650.pt` and `meta_000650.json` in `~/.cache/nanochat/chatsft_checkpoints`
+ 4. Clone https://github.com/karpathy/nanochat
+ 5. Run `uv sync` followed by `uv run python -m scripts.chat_web`
+
  ## Citation

  **Repository:** [github.com/karpathy/nanochat](https://github.com/karpathy/nanochat)
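
For reference, the numbered manual steps in the relocated "Inference guide" section map to shell commands roughly like the sketch below. This is not from the README: it assumes `git` (with Git LFS) and `uv` are installed, and it only reuses the file names and cache paths named in the steps.

```
# Sketch of the manual setup steps (assumes git + Git LFS and uv are installed).

# 1. Download all model files
git clone https://huggingface.co/sdobson/nanochat /tmp/nanochat

# 2-3. Copy tokenizer and checkpoint files into the cache locations listed in the steps
mkdir -p ~/.cache/nanochat/tokenizer ~/.cache/nanochat/chatsft_checkpoints
cp /tmp/nanochat/tokenizer.pkl /tmp/nanochat/token_bytes.pt ~/.cache/nanochat/tokenizer/
cp /tmp/nanochat/model_000650.pt /tmp/nanochat/meta_000650.json ~/.cache/nanochat/chatsft_checkpoints/

# 4-5. Clone the nanochat repo, sync dependencies, and launch the web chat UI
git clone https://github.com/karpathy/nanochat
cd nanochat
uv sync
uv run python -m scripts.chat_web
```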