Update README.md
README.md CHANGED
@@ -19,6 +19,13 @@ This is NanoLM-70M-Instruct-v1. The model currently supports **English only**.
## Model Details

| Nano LMs | Non-emb Params | Arch | Layers | Dim | Heads | Seq Len |
| :------: | :------------: | :--: | :----: | :-: | :---: | :-----: |
| 25M | 15M | MistralForCausalLM | 12 | 312 | 12 | 2K |
| **70M** | **42M** | **LlamaForCausalLM** | **12** | **576** | **9** | **2K** |
| 0.3B | 180M | Qwen2ForCausalLM | 12 | 896 | 14 | 4K |
| 1B | 840M | Qwen2ForCausalLM | 18 | 1536 | 12 | 4K |
The tokenizer and model architecture of NanoLM-70M-Instruct-v1 are the same as those of [SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M), but the number of layers has been reduced from 30 to 12.

In essence, it is a plain LLaMA architecture, specifically LlamaForCausalLM.
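The values in the table above can be checked after loading the checkpoint with the `transformers` library. The snippet below is a minimal sketch: the repo id is a placeholder (the actual Hub path may differ), and the printed values simply echo the 70M row of the table.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual Hub path of NanoLM-70M-Instruct-v1.
model_id = "<namespace>/NanoLM-70M-Instruct-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The loaded class and config should match the 70M row of the table above.
cfg = model.config
print(type(model).__name__)         # LlamaForCausalLM
print(cfg.num_hidden_layers)        # 12
print(cfg.hidden_size)              # 576
print(cfg.num_attention_heads)      # 9
print(cfg.max_position_embeddings)  # 2048 (2K sequence length)
```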