**nanochat** is a 561M parameter transformer language model trained for conversational AI tasks. This model demonstrates that capable chat models can be trained efficiently on modest hardware budgets (~$100 on 8x H100 GPUs).

Read about the training process at https://samdobson.uk/posts/training-chatgpt-for-cheap/

Chat with the model at https://huggingface.co/spaces/sdobson/nanochat

## Model Description
- **License:** MIT
- **Parameters:** 560,988,160 (~561M)

### Architecture

- **Layers:** 20
- **Bias:** Inherits biases from training data (FineWeb-EDU, SmolTalk, etc.)
- **Language:** English-only

## Inference guide

Simon Willison created a script that runs the model on CPU on macOS:
```
cd /tmp
git clone https://huggingface.co/sdobson/nanochat
uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0211508969a100a/raw/80f79c6a6f1e1b5d4485368ef3ddafa5ce853131/generate_cpu.py \
  --model-dir /tmp/nanochat \
  --prompt "Tell me about dogs."
```
Otherwise you can:

1. Download all files
2. Put `tokenizer.pkl` and `token_bytes.pt` in `~/.cache/nanochat/tokenizer`
3. Put `model_000650.pt` and `meta_000650.json` in `~/.cache/nanochat/chatsft_checkpoints`
4. Clone https://github.com/karpathy/nanochat
5. Run `uv sync` followed by `uv run python -m scripts.chat_web`
## Citation

**Repository:** [github.com/karpathy/nanochat](https://github.com/karpathy/nanochat)