VRAM Usage

by Achtung - opened

Hello!

I'm working on an NVIDIA V100S with 32GB of VRAM, but the model takes around 36GB of memory (that is how much it used on a Google Colab A100), which is too much for my card to load (almost as much as bofenghuang's model). Do you have any idea why?

I'm using vLLM with GPTQ support through AutoGPTQ and Accelerate.
