How much GPU memory do you need to serve the model?

#1
by dcastm - opened

Hi,

Thank you for making this available. I'm interested in setting up my own server to serve the model, and was wondering how much memory I'd need.

Based on your experience, what's the least amount of memory required to serve the model?

Thank you!

BERTIN Project org

Hi,

This demo runs the half-precision (float16) version of the model on a 16 GB GPU provided by Hugging Face, and it needs at least that much VRAM. In my experiments with a local GPU, I noticed that usage sometimes spikes well above 16 GB and then settles back to around 13.5 GB of VRAM. To be on the safe side, I'd use a 20 GB GPU if possible, with at least another 20 GB of system RAM available. Regular RAM use is around 3.5 GB (with resident memory around 40 GB for some reason).

The full-precision (float32) model is more demanding, though, as it roughly doubles the requirements.
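As a rough back-of-the-envelope check, weight memory is just parameter count times bytes per parameter (a sketch only; actual usage is higher because of activations, the KV cache, and framework overhead, which matches the spikes above 16 GB mentioned above):

```python
# Rough VRAM estimate for the weights of a ~6B-parameter model alone.
# Real usage is higher: activations, KV cache, CUDA context, fragmentation.
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1024**3

n_params = 6e9  # GPT-J-6B has roughly 6 billion parameters

print(f"float32: {weight_memory_gb(n_params, 4):.1f} GB")  # ~22.4 GB
print(f"float16: {weight_memory_gb(n_params, 2):.1f} GB")  # ~11.2 GB
print(f"int8:    {weight_memory_gb(n_params, 1):.1f} GB")  # ~5.6 GB
```

This is consistent with the numbers above: float16 weights fit in 16 GB with some headroom, float32 roughly doubles that, and int8 brings it into Colab territory.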

We are also working on an int8 version that would make it possible to load the model even on a Colab GPU, but that's ongoing work at the moment.

Please, let us know if you have any problem setting it up.
Cheers.

BERTIN Project org

If you have problems fitting the model in memory, check this out: https://huggingface.co/blog/hf-bitsandbytes-integration#is-it-faster-than-native-models?
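For intuition, here is a minimal sketch of absmax int8 quantization, the basic idea behind loading weights in 8 bits. Note this is a simplification: the bitsandbytes LLM.int8() scheme described in that post uses vector-wise scaling plus mixed-precision handling of outliers, so treat this as conceptual only:

```python
def absmax_quantize(weights):
    """Quantize a list of floats to int8 codes with one absmax scale factor."""
    scale = max(abs(w) for w in weights) / 127.0  # largest magnitude maps to 127
    codes = [round(w / scale) for w in weights]   # each code fits in int8
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]

w = [0.5, -1.0, 0.25, 0.75]
codes, scale = absmax_quantize(w)
w_hat = dequantize(codes, scale)
print(codes)  # [64, -127, 32, 95]
```

Each weight is stored in one byte instead of two (fp16) or four (fp32), at the cost of a small rounding error bounded by half the scale factor.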

BERTIN Project org

The next snippet should work as long as there is enough RAM:

```python
!pip install --quiet -i https://test.pypi.org/simple/ bitsandbytes
!pip install --quiet git+https://github.com/huggingface/transformers.git  # latest transformers
!pip install --quiet accelerate

from transformers import pipeline

name = "bertin-project/bertin-gpt-j-6B"
text = "Hola, mi nombre es"
max_new_tokens = 20

# device_map="auto" lets accelerate place layers across available devices;
# load_in_8bit=True loads the weights quantized via bitsandbytes.
pipe = pipeline(
    model=name,
    model_kwargs={"device_map": "auto", "load_in_8bit": True},
    max_new_tokens=max_new_tokens,
)
pipe(text)
```

BERTIN Project org

Did you measure the memory footprint and the token-generation slowdown against the fp16 version?
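One way to measure the slowdown would be something like the following sketch, which times a generation call and reports tokens per second (the `pipe` and `text` names refer to the snippet above; on a CUDA setup, peak VRAM could additionally be read with `torch.cuda.max_memory_allocated()`):

```python
import time

def tokens_per_second(generate, n_new_tokens: int) -> float:
    """Time a no-argument generate() call and return new tokens per second."""
    start = time.perf_counter()
    generate()
    elapsed = time.perf_counter() - start
    return n_new_tokens / elapsed

# Hypothetical usage, run once per model variant and compare:
#   int8_tps = tokens_per_second(lambda: pipe(text), max_new_tokens)
# Peak VRAM after a run (in GB, CUDA only):
#   torch.cuda.max_memory_allocated() / 1024**3
```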

BERTIN Project org

I haven't yet :(

Sorry for the late reply! Thanks a lot for your help.

versae changed discussion status to closed
