qanthony-z
committed on
Update README.md
README.md
CHANGED
@@ -61,7 +61,11 @@ Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely
 | StableLM-Zephyr-3B | 3B | 66.43 | 38.27 |
 
 
-Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer
+Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
+
+<center>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/U7VD9PYLj3XcEjgV08sP5.png" width="700" alt="Zamba performance">
+</center>
 
 Time to First Token (TTFT) | Output Generation
 :-------------------------:|:-------------------------: