Update README.md
README.md CHANGED
@@ -38,6 +38,9 @@ This model is released under the [NVIDIA Open Model License Agreement](https://d
## Model Architecture

> ⚡️ We've released a minimal implementation of Hymba on GitHub to help developers understand and implement its design principles in their own models. Check it out! [barebones-hymba](https://github.com/NVlabs/hymba/tree/main/barebones_hymba).

Hymba-1.5B-Instruct has a model embedding size of 1600, 25 attention heads, and an MLP intermediate dimension of 5504, with 32 layers in total and 16 SSM states; 3 layers use full attention, and the rest use sliding-window attention. Unlike a standard Transformer, each attention layer in Hymba is a hybrid combination of standard attention heads and Mamba heads operating in parallel. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).

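
To make the parallel attention-plus-Mamba idea easier to picture, here is a minimal PyTorch sketch. It is not the released implementation (see the [barebones-hymba](https://github.com/NVlabs/hymba/tree/main/barebones_hymba) repo linked above for that): the names `ParallelHybridBlock`, `SimpleSSM`, and `state_size` are illustrative, the Mamba path is reduced to a toy diagonal state-space recurrence rather than a real selective scan, the two paths are fused with a plain average of their normalized outputs, and GQA, RoPE, and sliding-window masking are omitted for brevity.

```python
# Illustrative sketch only: an attention path and a simplified SSM path process the
# same input in parallel and their outputs are combined, loosely in the spirit of
# Hymba's hybrid heads. All module and parameter names here are assumptions.
import torch
import torch.nn as nn


class SimpleSSM(nn.Module):
    """Toy diagonal state-space recurrence standing in for a Mamba head."""

    def __init__(self, dim: int, state_size: int = 16):
        super().__init__()
        self.in_proj = nn.Linear(dim, state_size)
        self.out_proj = nn.Linear(state_size, dim)
        # Per-state decay in (0, 1), parameterised through a sigmoid.
        self.decay_logit = nn.Parameter(torch.zeros(state_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        u = self.in_proj(x)                               # (batch, seq, state)
        a = torch.sigmoid(self.decay_logit)               # (state,)
        h = torch.zeros(x.size(0), u.size(-1), device=x.device)
        outs = []
        for t in range(x.size(1)):                        # sequential scan over time
            h = a * h + u[:, t]                           # h_t = a * h_{t-1} + B x_t
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))    # (batch, seq, dim)


class ParallelHybridBlock(nn.Module):
    """Attention heads and an SSM head share the input; outputs are normalized and averaged."""

    def __init__(self, dim: int, n_heads: int, state_size: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ssm = SimpleSSM(dim, state_size)
        self.norm_attn = nn.LayerNorm(dim)
        self.norm_ssm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask so the attention path only looks at past tokens.
        seq = x.size(1)
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=x.device), 1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask, need_weights=False)
        ssm_out = self.ssm(x)
        # Normalise each path before fusing, then average.
        return 0.5 * (self.norm_attn(attn_out) + self.norm_ssm(ssm_out))


if __name__ == "__main__":
    block = ParallelHybridBlock(dim=64, n_heads=4)
    tokens = torch.randn(2, 10, 64)                       # (batch, seq, dim)
    print(block(tokens).shape)                            # torch.Size([2, 10, 64])
```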
Features of this architecture: