Update README.md

README.md CHANGED

```diff
@@ -62,7 +62,7 @@ datasets:
 
 # smol_llama-81M-tied
 
-A small
+A small 81M param (total) decoder model, enabled through tying the input/output embeddings. This is the first version of the model.
 
 - 768 hidden size, 6 layers
 - standard multi-head attention (24 heads), context length 1024
```
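The "tied" in the model name refers to sharing one weight matrix between the input embedding and the output projection, which is where the parameter savings come from. A rough sketch of the savings, assuming a 32k vocabulary (typical of Llama-family tokenizers; the vocab size is not stated in this diff — only the 768 hidden size is):

```python
# Back-of-envelope for tied input/output embeddings.
# hidden_size = 768 comes from the README; vocab_size = 32_000 is an
# assumption (typical Llama-family tokenizer), not stated in the diff.
vocab_size = 32_000
hidden_size = 768

# With tying, a single vocab_size x hidden_size matrix serves as both
# the input embedding and the output (lm_head) projection, so an untied
# variant would carry roughly this many extra parameters:
saved_params = vocab_size * hidden_size
print(f"{saved_params:,} parameters saved by tying")  # 24,576,000
```

Against an 81M total budget, that is a sizeable fraction of the model, which is why tying is a common choice for small decoders.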