Update README.md
README.md CHANGED
@@ -38,6 +38,9 @@ This model is released under the [NVIDIA Open Model License Agreement](https://d
## Model Architecture

> ⚡️ We've released a minimal implementation of Hymba on GitHub to help developers understand and implement its design principles in their own models. Check it out! [barebones-hymba](https://github.com/NVlabs/hymba/tree/main/barebones_hymba).

Hymba-1.5B-Instruct has a model embedding size of 1600, 25 attention heads, and an MLP intermediate dimension of 5504, with 32 layers in total and 16 SSM states; 3 layers use full attention, and the rest use sliding-window attention. Unlike a standard Transformer, each attention layer in Hymba is a hybrid combination of standard attention heads and Mamba heads operating in parallel. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).

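
To make the parallel attention-plus-Mamba idea easier to picture, here is a minimal PyTorch sketch. It is not the released implementation (see the [barebones-hymba](https://github.com/NVlabs/hymba/tree/main/barebones_hymba) repo linked above for that): the names `ParallelHybridBlock`, `SimpleSSM`, and `state_size` are illustrative, the Mamba path is reduced to a toy diagonal state-space recurrence rather than a real selective scan, the two paths are fused with a plain average of their normalized outputs, and GQA, RoPE, and sliding-window masking are omitted for brevity.

```python
# Illustrative sketch only: an attention path and a simplified SSM path process the
# same input in parallel and their outputs are combined, loosely in the spirit of
# Hymba's hybrid heads. All module and parameter names here are assumptions.
import torch
import torch.nn as nn


class SimpleSSM(nn.Module):
    """Toy diagonal state-space recurrence standing in for a Mamba head."""

    def __init__(self, dim: int, state_size: int = 16):
        super().__init__()
        self.in_proj = nn.Linear(dim, state_size)
        self.out_proj = nn.Linear(state_size, dim)
        # Per-state decay in (0, 1), parameterised through a sigmoid.
        self.decay_logit = nn.Parameter(torch.zeros(state_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        u = self.in_proj(x)                               # (batch, seq, state)
        a = torch.sigmoid(self.decay_logit)               # (state,)
        h = torch.zeros(x.size(0), u.size(-1), device=x.device)
        outs = []
        for t in range(x.size(1)):                        # sequential scan over time
            h = a * h + u[:, t]                           # h_t = a * h_{t-1} + B x_t
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))    # (batch, seq, dim)


class ParallelHybridBlock(nn.Module):
    """Attention heads and an SSM head share the input; outputs are normalized and averaged."""

    def __init__(self, dim: int, n_heads: int, state_size: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ssm = SimpleSSM(dim, state_size)
        self.norm_attn = nn.LayerNorm(dim)
        self.norm_ssm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask so the attention path only looks at past tokens.
        seq = x.size(1)
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=x.device), 1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask, need_weights=False)
        ssm_out = self.ssm(x)
        # Normalise each path before fusing, then average.
        return 0.5 * (self.norm_attn(attn_out) + self.norm_ssm(ssm_out))


if __name__ == "__main__":
    block = ParallelHybridBlock(dim=64, n_heads=4)
    tokens = torch.randn(2, 10, 64)                       # (batch, seq, dim)
    print(block(tokens).shape)                            # torch.Size([2, 10, 64])
```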
Features of this architecture: