mamba2-hybrid model inference speed vs Transformer-vLLM solutions

#3
by LarryLi - opened

What is the real token-generation speed of the mamba2-hybrid model compared with classical Transformer-based models? Transformer models can use vLLM to speed up inference. Can mamba2-hybrid models be accelerated with vLLM as well?
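For concreteness, the comparison I have in mind is plain wall-clock throughput (tokens per second) of the decode loop. A minimal, model-agnostic sketch of that measurement is below; `dummy_generate` is a placeholder standing in for a real `model.generate()` call on either a mamba2-hybrid or a Transformer checkpoint.

```python
import time


def tokens_per_second(generate_fn, prompt, max_new_tokens):
    """Time one generation call and return throughput in tokens/sec.

    generate_fn is expected to return the number of tokens it produced.
    """
    start = time.perf_counter()
    n_generated = generate_fn(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return n_generated / elapsed


# Placeholder generator: swap this for a real call to the model under test
# (e.g. a mamba2-hybrid or Transformer model) to get a fair comparison.
def dummy_generate(prompt, max_new_tokens):
    time.sleep(0.01)  # simulate decode latency
    return max_new_tokens


rate = tokens_per_second(dummy_generate, "hello", 64)
```

To make the numbers comparable, both models should be timed with the same prompt length, the same `max_new_tokens`, and a warm-up run first so compilation and cache allocation are excluded.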
