
Running Bloom-7B1 on an 8 GB GPU?

#88
by CedricDB - opened

Is there a way to do it using the system RAM and loading it into the GPU in chunks? I'm using an RTX 2080 Max-Q (8 GB VRAM) and have 64 GB of RAM.


BigScience Workshop org
edited Aug 19, 2022

You might get away with running that in 8-bit precision.
Please refer to the transformers+bitsandbytes integration here https://github.com/huggingface/transformers/pull/17901
Specifically, this notebook: https://colab.research.google.com/drive/1qOjXfQIAULfKvZqwCen8-MoWKGdSatZ4#scrollTo=W8tQtyjp75O
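For reference, the 8-bit path from that PR boils down to something like the sketch below (it assumes a recent transformers with accelerate and bitsandbytes installed and a CUDA GPU; model name and prompt are just placeholders):

```python
# Minimal sketch of loading BLOOM-7B1 in 8-bit via the
# transformers + bitsandbytes integration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place the weights
    load_in_8bit=True,   # quantize linear layers to int8 with bitsandbytes
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```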

If not, the next best option is to run parts of the model (embeddings/logits and maybe a fraction of the model layers) on CPU.
We will hopefully soon release code for running even larger models (e.g. bloom-176B) on low-memory GPUs.
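As a rough illustration of that CPU-offload idea, accelerate's device_map can already split a checkpoint between GPU and CPU (a sketch with hypothetical memory limits, not the upcoming code mentioned above):

```python
# Sketch of partial CPU offload with accelerate's device_map
# (memory limits are illustrative; tune them to your GPU and RAM).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",                        # split layers across GPU and CPU
    max_memory={0: "6GiB", "cpu": "48GiB"},   # leave headroom on the 8 GB card
    torch_dtype="auto",                       # keep the checkpoint's dtype
)

# accelerate's hooks move activations between devices during generation
inputs = tokenizer("BLOOM is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```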

Thanks. I doubt that will work, since 7B1 uses about 30 GB of RAM and the PR says it "could reduce the size of the large models by up to 2", which still wouldn't fit in 8 GB of VRAM, but I'll try it.

Side question: if lower precision is a good way to reduce memory usage without losing too much quality, why wasn't Bloom trained as a 300B-parameter model in 8-bit precision, for example?

To be honest, it is hard to run these big models in that amount of VRAM. I have an RTX 3070 and suffer similarly. :(

You don't get to use all 8 GB either; the display and other system processes also take up part of the memory.

Any updates regarding running bloom-176B on <=24 GB GPUs?

BigScience Workshop org

In that case, you may be interested in PETALS: https://github.com/bigscience-workshop/petals
cc @borzunov
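For context, PETALS runs bloom-176B collaboratively over a public swarm of servers, so only a small slice of the model ever touches your GPU. A sketch along the lines of the PETALS README of that period (the `bigscience/bloom-petals` checkpoint and the `DistributedBloomForCausalLM` class are taken from the project's docs and may have changed since):

```python
# Sketch of generating with bloom-176B over the public PETALS swarm.
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

model_name = "bigscience/bloom-petals"
tokenizer = BloomTokenizerFast.from_pretrained(model_name)
model = DistributedBloomForCausalLM.from_pretrained(model_name)

# Only the embeddings run locally; transformer blocks run on remote servers.
inputs = tokenizer("A cat in French is", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```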

TimeRobber changed discussion status to closed
