Hardware Requirements for CPU / GPU Inference

#58
by jurassicpark - opened

I was looking and couldn't find any recommendations for the required hardware to run this model in inference on the CPU or GPU.

I'm going to test it out but some guidance would be pretty helpful.

Does anyone have this data? In particular, how much RAM is needed for CPU inference, and how much GPU RAM (I've seen some threads saying ~352 GB)? Also, what kind of inference times can be expected with different setups?

Copying some data I found from other threads here:

@IanBeaver

It needed around 400 GB [disk space] just to fit all the weight files. They list the sizes of the weights and checkpoints under the Training section.

@IanBeaver

I have successfully loaded it on a single x2iezn.6xlarge instance in AWS, but using only CPUs the model is very slow. Text generation sampling for several sequences can take several minutes to return, but the full model works, and it is much cheaper for local evaluation than nine GPUs!

x2iezn.6xlarge specs:

  • 768 GB RAM
  • 24 vCPUs
  • $5.004 / hour

@maveriq

As a first-order estimate, 176B parameters in half precision (16 bits = 2 bytes each) need 352 GB of RAM. But since some modules are kept in 32-bit, the real figure is higher, so roughly nine GPUs with 40 GB of RAM each, and that doesn't account for the input.

GPU inference requires more than 352 GB of GPU RAM (176B parameters in half precision). I can run inference on 8 A6000 GPUs, but there isn't much room left for input tokens.
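The estimate above is simple arithmetic and easy to reproduce; a quick sketch (the 40 GB per-GPU figure assumes A100-class cards, as in the comment above, and this counts weights only, not activations or the KV cache):

```python
# Back-of-the-envelope memory estimate for 176B-parameter inference.
params = 176e9
fp16_bytes = 2  # half precision: 16 bits = 2 bytes per parameter

total_gb = params * fp16_bytes / 1e9  # weights alone, no activations/inputs
gpus_needed = total_gb / 40           # assuming 40 GB GPUs (e.g. A100 40GB)

print(total_gb)     # 352.0 GB
print(gpus_needed)  # 8.8 -> round up to nine 40 GB GPUs
```

This is why the thread converges on "about nine 40 GB GPUs" as the floor before leaving any headroom for input tokens.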


Thanks for this, very helpful, was looking for the same information. No wonder I am failing to run the full model on a 64GB VM. ;)

Have you come across any recommendations anywhere to reduce memory usage, say, for specific pipeline tasks?

@bwv988 Your best bet is to try out bitsandbytes. https://github.com/TimDettmers/bitsandbytes
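For the curious, a minimal sketch of what 8-bit loading via bitsandbytes could look like through the `transformers` `load_in_8bit` flag (the model ID `bigscience/bloom` and this invocation are assumptions based on the standard `transformers` API, not something confirmed in this thread; int8 weights roughly halve the fp16 footprint):

```python
# Hedged sketch: load BLOOM with 8-bit weights via bitsandbytes.
# Requires: pip install transformers accelerate bitsandbytes
def load_bloom_int8():
    from transformers import AutoModelForCausalLM
    # device_map="auto" shards layers across available GPUs;
    # load_in_8bit=True quantizes linear-layer weights to int8.
    return AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom",
        device_map="auto",
        load_in_8bit=True,
    )

# Rough memory math: int8 is 1 byte/param instead of 2 for fp16.
fp16_gb = 176e9 * 2 / 1e9  # ~352 GB of weights in half precision
int8_gb = 176e9 * 1 / 1e9  # ~176 GB of weights in int8
```

So even with 8-bit quantization you would still need on the order of 176 GB of GPU RAM for the weights, i.e. several large GPUs rather than one.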

Thanks @snarik , will give this a go!
