Zyphra/Zamba2-2.7B-instruct

qanthony-z committed on Sep 20, 2024

Commit 40c8655 · verified · 1 Parent(s): b74422a

Update README.md

Files changed (1)

README.md +5 -1

@@ -61,7 +61,11 @@ Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely
 | StableLM-Zephyr-3B | 3B | 66.43 | 38.27 |
-Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer based models.
 Time to First Token (TTFT)             |  Output Generation
 :-------------------------:|:-------------------------:

 | StableLM-Zephyr-3B | 3B | 66.43 | 38.27 |
+Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
+<center>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/U7VD9PYLj3XcEjgV08sP5.png" width="700" alt="Zamba performance">
+</center>
 Time to First Token (TTFT)             |  Output Generation
 :-------------------------:|:-------------------------: