4-bit quantisation does not reduce VRAM usage.
#2 opened by fu-man
Hi,
In theory this model should use about 4 GB of VRAM, but it consumes nearly 16 GB when I run it with vLLM. Am I missing something?
Regards,
Fu
The weights consume ~5 GB of VRAM. The remaining usage is memory that vLLM pre-allocates for the KV cache, which it manages internally.
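To make the distinction concrete, here is a minimal sketch of the weight-memory arithmetic. The parameter count below is a hypothetical ~10B figure chosen only to match the ~5 GB weight footprint mentioned above; the thread does not name the exact model size.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone (ignores
    quantisation scales/zero-points and framework overhead)."""
    return n_params * bits_per_weight / 8 / 1e9

# Hypothetical ~10B-parameter model: 4-bit weights fit in roughly 5 GB,
# but observed VRAM is much higher because vLLM also reserves KV-cache space.
print(weight_memory_gb(10e9, 4))   # → 5.0
print(weight_memory_gb(10e9, 16))  # → 20.0
```

If you want vLLM to reserve less GPU memory overall, the fraction it pre-allocates (weights plus KV cache) is controlled by the `gpu_memory_utilization` argument to `LLM(...)`, which defaults to 0.9 of the GPU; lowering it shrinks the KV-cache pool at the cost of fewer concurrent sequences.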
robertgshaw2 changed discussion status to closed