
Running Bloom-7B1 on an 8 GB GPU?

#88
by CedricDB - opened

Is there a way to do it using the system RAM and loading it into the GPU in chunks? I'm using an RTX 2080 Max-Q (8 GB VRAM) and have 64 GB of RAM.


BigScience Workshop org
edited Aug 19, 2022

You might get away with running that in 8-bit precision.
Please refer to the transformers+bitsandbytes integration here https://github.com/huggingface/transformers/pull/17901
Specifically, this notebook: https://colab.research.google.com/drive/1qOjXfQIAULfKvZqwCen8-MoWKGdSatZ4#scrollTo=W8tQtyjp75O
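For reference, the 8-bit path from that PR boils down to something like the sketch below (it assumes a recent transformers with accelerate and bitsandbytes installed and a CUDA GPU; model name and prompt are just placeholders):

```python
# Minimal sketch of loading BLOOM-7B1 in 8-bit via the
# transformers + bitsandbytes integration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place the weights
    load_in_8bit=True,   # quantize linear layers to int8 with bitsandbytes
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```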

If not, the next best option is to run parts of the model (embeddings/logits and maybe a fraction of the model layers) on CPU.
We will hopefully soon release code for running even larger models (e.g. bloom-176B) on low-memory GPUs.
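As a rough illustration of that CPU-offload idea, accelerate's device_map can already split a checkpoint between GPU and CPU (a sketch with hypothetical memory limits, not the upcoming code mentioned above):

```python
# Sketch of partial CPU offload with accelerate's device_map
# (memory limits are illustrative; tune them to your GPU and RAM).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",                        # split layers across GPU and CPU
    max_memory={0: "6GiB", "cpu": "48GiB"},   # leave headroom on the 8 GB card
    torch_dtype="auto",                       # keep the checkpoint's dtype
)

# accelerate's hooks move activations between devices during generation
inputs = tokenizer("BLOOM is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```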

Thanks. I doubt that will work, since 7B1 uses about 30 GB of RAM and the PR says it "could reduce the size of the large models by up to 2", which still wouldn't fit in 8 GB of VRAM, but I'll try it.

Side question: if lower precision is a good way to reduce memory usage without losing too much quality, why wasn't Bloom trained as a 300B-parameter model in 8-bit precision, for example?

To be honest, it is hard to run these big models in that amount of VRAM. I have an RTX 3070 and suffer similarly. :(

You don't get to use all 8 GB either; the display and other system processes also take up part of the memory.

Any updates regarding running bloom-176B on <=24 GB GPUs?

BigScience Workshop org

In that case, you may be interested in PETALS: https://github.com/bigscience-workshop/petals
cc @borzunov
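For context, PETALS runs bloom-176B collaboratively over a public swarm of servers, so only a small slice of the model ever touches your GPU. A sketch along the lines of the PETALS README of that period (the `bigscience/bloom-petals` checkpoint and the `DistributedBloomForCausalLM` class are taken from the project's docs and may have changed since):

```python
# Sketch of generating with bloom-176B over the public PETALS swarm.
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

model_name = "bigscience/bloom-petals"
tokenizer = BloomTokenizerFast.from_pretrained(model_name)
model = DistributedBloomForCausalLM.from_pretrained(model_name)

# Only the embeddings run locally; transformer blocks run on remote servers.
inputs = tokenizer("A cat in French is", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```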

TimeRobber changed discussion status to closed
