How much GPU memory needed?

by mazib - opened

Hello HuggingFace community,

I am trying to test the Bloom model on an AWS instance with an NVIDIA A10G GPU, which has 22 GB of memory.
I ran this code:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModel.from_pretrained("bigscience/bloom")

It automatically downloaded the Bloom model (72 files), but after that I get a CUDA out-of-memory error.

Can someone tell me how much GPU memory is needed to run the Bloom model?


The 72 checkpoints are 329 GB in total, so for inference it might take about 350 GB.
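The estimates in this thread follow from a simple back-of-the-envelope calculation: weights-only memory is roughly parameter count times bytes per parameter. A minimal sketch, assuming the ~176B-parameter `bigscience/bloom` checkpoint (the helper name and the 1 GB = 1e9 bytes convention are my own choices; real usage adds activations, KV cache, and framework overhead on top):

```python
def estimate_weights_gb(n_params: float, bytes_per_param: int) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes); excludes activations/overhead."""
    return n_params * bytes_per_param / 1e9

BLOOM_PARAMS = 176e9  # bigscience/bloom has ~176 billion parameters

fp32_gb = estimate_weights_gb(BLOOM_PARAMS, 4)  # 4 bytes/param in float32
fp16_gb = estimate_weights_gb(BLOOM_PARAMS, 2)  # 2 bytes/param in float16
int8_gb = estimate_weights_gb(BLOOM_PARAMS, 1)  # 1 byte/param in int8

print(fp32_gb)  # 704.0
print(fp16_gb)  # 352.0 -- consistent with the ~350 GB figure above
print(int8_gb)  # 176.0 -- why 8-bit loading fits in ~200 GB
```

The fp16 number lines up with the ~350 GB estimate above, and the int8 number explains the ~200 GB figure quoted for 8-bit loading.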

BigScience Workshop org

Works in ~200 GB if you use the load_in_8bit feature.

@mazib you will need at least 8 × A100 80 GB GPUs for inference in fp16.
Or you can use int8 for inference.
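A minimal sketch of what int8 loading looks like, assuming `transformers` with `accelerate` and `bitsandbytes` installed (the exact arguments are the standard `from_pretrained` quantization flags; this still requires roughly 200 GB of total GPU memory across your devices, and the full 329 GB download):

```python
# Sketch: load BLOOM with 8-bit weights sharded across all visible GPUs.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",   # let accelerate place layers across available GPUs
    load_in_8bit=True,   # quantize weights to int8 via bitsandbytes on load
)
```

Note that `AutoModelForCausalLM` (rather than plain `AutoModel`) is what you want for text generation with BLOOM.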

Thanks for this discussion thread.
I have some (hopefully related) observations here:
