How much faster is a 4-bit model?

#1
by saber132 - opened

When served with vLLM, how much faster is a 4-bit quantized model than a 16-bit one?
