How to load on a single A100 40GB

#18
by mnwato - opened

Hi. Does anyone know how much memory this model needs? Is there a way to load it on a single A100 40GB?

I got it working with bitsandbytes 4-bit quantization. Here is how I did it: https://huggingface.co/tiiuae/falcon-40b/discussions/38#6479de427c18dca75e9a0903

Please use the transformers dev version 4.30-dev and accelerate 0.20-dev, both installed with pip from GitHub.

Then use the bitsandbytes package with bfloat16 compute, load_in_4bit, and quant_type=nf4.

mnwato changed discussion status to closed
