Update README.md
README.md CHANGED
@@ -12,12 +12,15 @@ license: apache-2.0
---

# Zebra-Llama: Towards Extremely Efficient Hybrid Models
-Zebra-Llama is a family of hybrid large language models (LLMs) proposed by AMD that
+Zebra-Llama is a family of hybrid large language models (LLMs) proposed by AMD that composes Multi-head Latent Attention (MLA) and Mamba2 for KV cache compression and computational efficiency.
+This combination achieves Transformer-level accuracy with near-State Space Model (SSM) efficiency. While standard Transformers are limited by the quadratic complexity of self-attention and the large memory footprint of their key-value (KV) cache, Zebra-Llama offers a practical and scalable solution.

-This model, `Zebra-Llama-1B-4MLA-12M2`, is created by efficiently adapting the pre-trained Llama-3.2-1B-Instruct model
-
-The composition follows a three-stage pipeline to effectively transfer knowledge from the pre-trained Transformer.
+This model, `Zebra-Llama-1B-4MLA-12M2`, is created by efficiently adapting the pre-trained Llama-3.2-1B-Instruct model through post-training conducted on AMD Instinct™ MI300X GPUs. This training approach bypasses the need for costly pre-training from scratch.

+<div align="center">
+<img src="scaling_perf_instruct.png" style="object-fit: contain;"/>
+<em><b>Figure 1:</b> Pareto frontier of pre-training tokens vs. average performance for pre-trained and instruction-tuned models.</em>
+</div>

## Key Takeaways
- Announcing Zebra-Llama, a family of highly efficient 1B, 3B, and 8B hybrid models created by post-training adaptation of existing state-of-the-art Transformers.
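
To make the KV-cache claim above concrete, here is a rough back-of-the-envelope estimate in Python. The Llama-3.2-1B figures (16 layers, 8 KV heads, head dimension 64) are the standard published config for that checkpoint; reading the name `Zebra-Llama-1B-4MLA-12M2` as 4 MLA layers plus 12 Mamba2 layers, and using an MLA latent (compressed KV) width of 512, are assumptions made purely for illustration and are not taken from this model card.

```python
# Back-of-the-envelope KV-cache comparison at fp16 (2 bytes per element).
# Assumptions (not from the model card): "4MLA-12M2" = 4 MLA layers + 12 Mamba2
# layers, and an MLA compressed-KV (latent) width of 512. MLA's small decoupled
# positional key and Mamba2's constant-size recurrent state are ignored here,
# since neither grows with sequence length.

BYTES = 2          # fp16
SEQ_LEN = 8192     # example context length
BATCH = 1

# Standard Llama-3.2-1B: every layer caches full K and V.
layers, kv_heads, head_dim = 16, 8, 64
kv_per_token_per_layer = 2 * kv_heads * head_dim           # K + V
llama_cache = layers * kv_per_token_per_layer * SEQ_LEN * BATCH * BYTES

# Hybrid: only the 4 MLA layers keep a growing cache, and each stores one
# compressed latent vector per token instead of full K/V.
mla_layers, latent_dim = 4, 512
zebra_cache = mla_layers * latent_dim * SEQ_LEN * BATCH * BYTES

print(f"Llama-3.2-1B KV cache   : {llama_cache / 2**20:.1f} MiB")
print(f"Zebra-Llama (assumed)   : {zebra_cache / 2**20:.1f} MiB")
print(f"Approximate reduction   : {llama_cache / zebra_cache:.1f}x")
```

Under these assumptions the cache shrinks from roughly 256 MiB to 32 MiB at an 8K context, which is the kind of saving that motivates replacing most attention layers with Mamba2 and compressing the remaining ones with MLA.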
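For completeness, a minimal generation sketch with Hugging Face `transformers` is shown below. The repo id `amd/Zebra-Llama-1B-4MLA-12M2` and the use of `trust_remote_code=True` (in case the hybrid MLA/Mamba2 blocks ship as custom modeling code) are assumptions about how the checkpoint is published; defer to the model card's own usage section for the authoritative snippet.

```python
# Minimal usage sketch under the assumptions stated above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "amd/Zebra-Llama-1B-4MLA-12M2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # hybrid blocks may not be in stock transformers
)

# Instruction-tuned checkpoint, so use the chat template.
messages = [{"role": "user", "content": "Explain KV-cache compression in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```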