How does real-world token generation speed compare between mamba2-hybrid models and classical Transformer-based models? Transformer models can use vLLM to speed up inference. Can mamba2-hybrid models be sped up with it as well?