mamba2-hybrid model inference speed vs. Transformer + vLLM solutions

by LarryLi - opened Jul 18

How does the real-world token generation speed of the mamba2-hybrid model compare with classical Transformer-based models? Transformer models can use vLLM to speed up inference. Can mamba2-hybrid models be sped up with vLLM as well?
