WHY: Using vLLM to run inference on openbmb/MiniCPM-V-2_6, after processing one image the GPU is no longer utilized, but the GPU memory is still occupied
#15
by fern4444 - opened
When using vLLM to run inference on openbmb/MiniCPM-V-2_6, after processing one image the GPU is no longer utilized, but the GPU memory is still occupied. Why is that?
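For context, a minimal single-image setup along these lines presumably reproduces the situation. The model name comes from the thread; the image path, prompt template, and sampling parameters below are assumptions, so check the model card for the exact chat format:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# MiniCPM-V-2_6 ships custom modeling code, hence trust_remote_code.
llm = LLM(
    model="openbmb/MiniCPM-V-2_6",
    trust_remote_code=True,
    max_model_len=4096,
)

image = Image.open("example.jpg").convert("RGB")  # hypothetical image path

# MiniCPM-V-2_6 uses a Qwen2-style chat template with an image placeholder;
# this is the commonly used form, but verify against the model card.
prompt = (
    "<|im_start|>user\n(<image>./</image>)\n"
    "Describe this image.<|im_end|>\n<|im_start|>assistant\n"
)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```

After this script finishes generating, GPU utilization drops to zero, yet nvidia-smi still shows the memory in use: that is the behavior being asked about.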
fern4444 changed the discussion title to add the "WHY" prefix.
hai
Is your vLLM process shut down? If the vLLM process is still running, the GPU memory will indeed stay occupied the whole time: vLLM allocates GPU memory in advance when the engine starts and holds it for the lifetime of the process, regardless of whether any request is being served.
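As a sketch of what that means in practice (the parameter value and cleanup steps below are illustrative, not from the thread): the fraction of GPU memory vLLM reserves at startup is controlled by gpu_memory_utilization, and the reservation is held until the engine is destroyed. Depending on the vLLM version, in-process cleanup may not return everything; exiting the process is the reliable way to free the memory.

```python
import gc

import torch
from vllm import LLM

# vLLM reserves a large slab of GPU memory up front (model weights plus the
# paged KV cache) and keeps it even while idle. gpu_memory_utilization sets
# the reserved fraction; 0.9 is the default.
llm = LLM(
    model="openbmb/MiniCPM-V-2_6",
    trust_remote_code=True,
    gpu_memory_utilization=0.9,  # reserved at startup, held until shutdown
)

# ... run inference here ...

# The reservation is only released when the engine is destroyed or the
# process exits. Best-effort in-process cleanup:
del llm
gc.collect()
torch.cuda.empty_cache()  # returns cached CUDA blocks to the driver
```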
linglingdan changed discussion status to closed