OOM with 24GB VRAM
#1 opened by Klopez
Anyone else experiencing this? I have a 3090 with 24GB VRAM and I tried loading this via vLLM, but got OOM even with the max model length set to 1000. Is it possible to do INT8 rather than FP8?
Try also setting --max-num-seqs=1. Unfortunately, the KV cache required to run this model is very large at the moment due to how vision models are profiled.
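For reference, a minimal Python sketch of the engine settings being discussed (the model ID is a placeholder, and the gpu_memory_utilization value is an extra assumption, not something stated above):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="<model-id>",           # placeholder for the vision model being discussed
    max_model_len=1000,           # cap the context length to shrink the KV cache
    max_num_seqs=1,               # profile and allocate KV cache for a single sequence only
    gpu_memory_utilization=0.90,  # fraction of the 24GB the engine may claim (assumed value)
)

out = llm.generate(["Describe the image."], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```

The same settings can be passed on the command line as --max-model-len and --max-num-seqs when serving.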
Thank you for that. It seems to have helped, but wow, I didn't expect that to happen with such a small model. Could you link me somewhere I can read more on this?
We have an issue tracking this here: https://github.com/vllm-project/vllm/issues/8826, so perhaps you could add your experience there?
mgoin changed discussion status to closed