VRAM need for inference

#5
by RizSoto - opened

Needs information about the amount of VRAM required for inference.

Hello,

I want to know how much VRAM is needed to run inference with Phitral 4x2_8.

If I understand how it works, the model has 7.81B params but only 4.46B active params (approximately).

So the VRAM needed would be around 2 × 4.46 = 8.92 GB for FP16 inference?
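A rough back-of-the-envelope sketch of that arithmetic (assuming 2 bytes per parameter at FP16, and ignoring activation/KV-cache overhead). Note that for a mixture-of-experts model, all expert weights typically still have to sit in VRAM even though only some are active per token, so the total parameter count may be the more relevant figure:

```python
def fp16_weight_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Weight memory only: parameter count times bytes per parameter."""
    return params_billion * bytes_per_param

# Using only the active params (the assumption in the question):
print(fp16_weight_gb(4.46))  # 8.92 GB

# Using all params (all MoE experts loaded in VRAM):
print(fp16_weight_gb(7.81))  # 15.62 GB
```

Actual usage will be higher than the weight-only number because of activations, the KV cache, and framework overhead.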

I'd be interested in the figures too! You can see the VRAM usage in Colab using the inference notebook provided in the README.
