VRAM consumption when using GPU (CUDA)

#37
by Sunjay353 - opened

I noticed that VRAM usage increases by roughly the model size when loading the model, which is expected. However, it then increases again by roughly twice the model size during inference, so total VRAM consumption ends up at approximately three times the model size. Furthermore, this additional memory is not released after inference completes; it is only freed when the model is unloaded. Is this normal and expected behavior?
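For anyone trying to pin down where the extra memory goes, here is a minimal sketch (assuming a PyTorch + transformers setup; the model name "gpt2" and the prompt are just placeholders) that separates memory actually held by live tensors from memory PyTorch's CUDA caching allocator merely keeps in reserve. The allocator retains freed activation buffers as "reserved" memory rather than returning them to the driver, which is what tools like nvidia-smi report as used VRAM:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def report(stage: str) -> None:
    # memory_allocated: bytes currently held by live tensors
    # memory_reserved: bytes the caching allocator has claimed from the driver
    alloc = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"{stage}: allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB")

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").cuda()
report("after load")

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=64)
report("after inference")

# Freed inference buffers usually stay cached (reserved) rather than being
# returned to the driver; empty_cache() hands unused blocks back.
torch.cuda.empty_cache()
report("after empty_cache")
```

If "allocated" drops back near the model size after inference while "reserved" stays high, the extra usage is the caching allocator holding onto freed blocks, not a leak; torch.cuda.empty_cache() should then shrink what nvidia-smi shows.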
