OOM with 24GB VRAM
#1 opened by Klopez
Anyone else experiencing this? I have a 3090 with 24GB VRAM and I tried loading this via vLLM, but got OOM even with the max model length set to 1000. Is it possible to do INT8 rather than FP8?
Try also setting --max-num-seqs=1. Unfortunately, the KV cache required to run this model is very large at the moment due to how vision models are profiled.
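For reference, a minimal Python sketch of the engine settings being discussed (the model ID is a placeholder, and the gpu_memory_utilization value is an extra assumption, not something stated above):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="<model-id>",           # placeholder for the vision model being discussed
    max_model_len=1000,           # cap the context length to shrink the KV cache
    max_num_seqs=1,               # profile and allocate KV cache for a single sequence only
    gpu_memory_utilization=0.90,  # fraction of the 24GB the engine may claim (assumed value)
)

out = llm.generate(["Describe the image."], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```

The same settings can be passed on the command line as --max-model-len and --max-num-seqs when serving.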
Thank you for that. It seems to have helped, but wow, I didn't expect that to happen with such a small model. Could you link me somewhere I can read more on this?
We have an issue tracking this here: https://github.com/vllm-project/vllm/issues/8826, so perhaps you could add your experience there?
mgoin changed discussion status to closed