Mxode committed on
Commit
192ae32
·
verified ·
1 Parent(s): 4d6a1a5

Update README.md

Files changed (1): README.md (+7 −0)
@@ -19,6 +19,13 @@ This is NanoLM-70M-Instruct-v1. The model currently supports **English only**.
 
 ## Model Details
 
+| Nano LMs | Non-emb Params | Arch | Layers | Dim | Heads | Seq Len |
+| :------: | :------------: | :--: | :----: | :-: | :---: | :-----: |
+| 25M | 15M | MistralForCausalLM | 12 | 312 | 12 | 2K |
+| **70M** | **42M** | **LlamaForCausalLM** | **12** | **576** | **9** | **2K** |
+| 0.3B | 180M | Qwen2ForCausalLM | 12 | 896 | 14 | 4K |
+| 1B | 840M | Qwen2ForCausalLM | 18 | 1536 | 12 | 4K |
+
 The tokenizer and model architecture of NanoLM-70M-Instruct-v1 are the same as [SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M), but the number of layers has been reduced from 30 to 12.
 
 Essentially, it is a pure LLaMA architecture, specifically LlamaForCausalLM.
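The 42M non-embedding parameter figure in the added table can be sanity-checked from the architecture the README describes. A minimal sketch, assuming the 70M model inherits SmolLM-135M's grouped-query attention (3 KV heads) and MLP intermediate size (1536) along with the hidden size 576 and 9 query heads shown in the table — those two inherited values are assumptions taken from SmolLM-135M's config, not stated in this diff:

```python
# Sketch: estimate non-embedding parameters of a LLaMA-style model.
# KV-head count (3) and MLP intermediate size (1536) are ASSUMED from
# SmolLM-135M's config, since the README says the architecture matches it.

def llama_non_emb_params(hidden, layers, heads, kv_heads, intermediate):
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    # attention projections: q_proj, k_proj, v_proj, o_proj (no biases in LLaMA)
    attn = hidden * hidden + 2 * hidden * kv_dim + hidden * hidden
    # MLP projections: gate_proj, up_proj, down_proj
    mlp = 3 * hidden * intermediate
    # two RMSNorm weights per layer, plus one final RMSNorm outside the loop
    norms = 2 * hidden
    return layers * (attn + mlp + norms) + hidden

total = llama_non_emb_params(hidden=576, layers=12, heads=9,
                             kv_heads=3, intermediate=1536)
print(f"{total / 1e6:.1f}M")  # ~42.5M, consistent with the table's 42M
```

Under these assumptions the count lands at roughly 42.5M, which is consistent with the table's rounded 42M for the 70M model.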