How much VRAM does it take to run Falcon 40B

#44
by Toaster496 - opened

How much VRAM/hot-swap RAM does it take to run Falcon 40B? Anyone got ideas?

What cards are you asking about? And is that VRAM or RAM?

Depends on whether you want to do inference in 32, 16, 8, or 4 bit, but at full 32 bit I think it's about 80GB of VRAM.
Correction: 16 bit is around 80GB and 32 bit would be around 160GB, I believe. Thanks Mikael110, I was thinking of 16 bit and not 32 when I wrote this.
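If you want to sanity-check those figures yourself, here's a minimal back-of-envelope sketch (weights only, assuming a round 40B parameter count; the measured numbers in this thread are higher because activations, KV cache, and quantization overhead come on top):

```python
# Weights-only VRAM estimate for Falcon-40B at different precisions.
params = 40e9  # rough parameter count

for precision, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{precision}: ~{gib:.0f} GB of weights")
# fp32 ~149 GB, fp16/bf16 ~75 GB, int8 ~37 GB, int4 ~19 GB
```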

With 8bit loading it consumes ~46GB of VRAM, and with 4bit loading it takes ~24GB of VRAM. Those numbers exclude OS headroom, so don't expect 4bit to fit on actual 24GB cards, and 8bit will be a tight squeeze on 48GB cards; you will probably OOM once the context gets even remotely long. I can't give numbers for 16bit and 32bit since they OOM on the A100 80GB I was testing on. But given that even 16bit is too big for that card, I'm quite confident that 32bit is quite a bit larger than 80GB. Maybe that's the number leoapolonio was referencing? I could definitely see it being that high for full 32bit inference.
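For reference, this is roughly the kind of loading setup those measurements assume; a minimal sketch using transformers + bitsandbytes (requires `pip install transformers accelerate bitsandbytes`; swap `load_in_4bit=True` for `load_in_8bit=True` to get the ~46GB 8bit variant, and actual usage will grow with context length):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit quantized weights (~24GB), computing in bfloat16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",        # let accelerate place layers on the available GPUs
    trust_remote_code=True,   # Falcon shipped with custom modelling code at release
)

inputs = tokenizer("The Falcon models were trained on", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```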

THANK YOU!

Hi, the model is trained in bfloat16, not float32 - you need 40B params × 2 bytes per param ≈ 80GB to run it.
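A minimal sketch of loading it in its native bfloat16, assuming you have ~80GB+ of combined GPU memory for `device_map="auto"` to shard the weights across:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    torch_dtype=torch.bfloat16,  # native dtype: 40e9 params * 2 bytes ≈ 80GB of weights
    device_map="auto",           # shard across GPUs if one card can't hold it
    trust_remote_code=True,
)
```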

Technology Innovation Institute org

We recommend 80-100GB to run inference on Falcon-40B comfortably.

FalconLLM changed discussion status to closed
