Running Bloom-7B1 on an 8 GB GPU?
You might get away with running that in 8-bit precision.
Please refer to the transformers+bitsandbytes integration here https://github.com/huggingface/transformers/pull/17901
Specifically, this notebook: https://colab.research.google.com/drive/1qOjXfQIAULfKvZqwCen8-MoWKGdSatZ4#scrollTo=W8tQtyjp75O
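For reference, the integration boils down to a couple of extra arguments to `from_pretrained`. A minimal sketch, assuming `bitsandbytes` and `accelerate` are installed (the prompt is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place the weights across devices
    load_in_8bit=True,   # quantize linear layers to int8 via bitsandbytes
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```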
If not, the next best option is to run parts of the model (embeddings/logits and maybe a fraction of the model layers) on CPU, as sketched below.
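A minimal sketch of that kind of offload, using accelerate's `max_memory` argument; the memory caps below are illustrative guesses for an 8 GB card, not tuned values:

```python
from transformers import AutoModelForCausalLM

# Cap GPU memory use and let accelerate spill the remaining layers to CPU RAM.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",
    device_map="auto",
    max_memory={0: "6GiB", "cpu": "24GiB"},  # assumed limits, adjust for your machine
)
```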
We will hopefully soon release code for running even larger models (e.g. bloom-176B) on low-memory GPUs.
Thanks. I doubt that will work, since 7B1 uses about 30 GB of RAM and the PR says it "could reduce the size of the large models by up to 2", which wouldn't make it fit in 8 GB of VRAM, but I'll try it.
Side question: if lower precision is a good way to reduce memory usage without losing too much accuracy, why wasn't Bloom trained as a 300B-parameter model in 8-bit precision, for example?
To be honest, it is hard to run these big models in that amount of VRAM. I have an RTX 3070 and am suffering similarly. :(
You also don't get to use all 8 GB; the display and other system processes take up part of the memory.
Any updates regarding running bloom-176B on <=24 GB GPUs?
In this case you may be interested in PETALS: https://github.com/bigscience-workshop/petals
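A minimal sketch of what PETALS client code looks like; the class and checkpoint names here follow the repository README and may differ between PETALS versions:

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Checkpoint name is an assumption; check the PETALS README for the exact identifier.
model_name = "bigscience/bloom"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Runs only a small client-side part locally; transformer blocks are
# served by other machines in the public swarm.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```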
cc @borzunov