Update README.md
README.md CHANGED
@@ -19,6 +19,13 @@ This is NanoLM-70M-Instruct-v1. The model currently supports **English only**.
## Model Details

| Nano LMs | Non-emb Params | Arch | Layers | Dim | Heads | Seq Len |
| :------: | :------------: | :--: | :----: | :-: | :---: | :-----: |
| 25M | 15M | MistralForCausalLM | 12 | 312 | 12 | 2K |
| **70M** | **42M** | **LlamaForCausalLM** | **12** | **576** | **9** | **2K** |
| 0.3B | 180M | Qwen2ForCausalLM | 12 | 896 | 14 | 4K |
| 1B | 840M | Qwen2ForCausalLM | 18 | 1536 | 12 | 4K |
The tokenizer and model architecture of NanoLM-70M-Instruct-v1 are the same as those of [SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M), but the number of layers has been reduced from 30 to 12.

In essence, it is a plain LLaMA architecture, specifically LlamaForCausalLM.
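The values in the table above can be checked after loading the checkpoint with the `transformers` library. The snippet below is a minimal sketch: the repo id is a placeholder (the actual Hub path may differ), and the printed values simply echo the 70M row of the table.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual Hub path of NanoLM-70M-Instruct-v1.
model_id = "<namespace>/NanoLM-70M-Instruct-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The loaded class and config should match the 70M row of the table above.
cfg = model.config
print(type(model).__name__)         # LlamaForCausalLM
print(cfg.num_hidden_layers)        # 12
print(cfg.hidden_size)              # 576
print(cfg.num_attention_heads)      # 9
print(cfg.max_position_embeddings)  # 2048 (2K sequence length)
```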