How do you estimate the number of GPUs required to run this model?

#29 opened by vishjoshi

The organisation I work with has an HPC setup where I can request a number of NVIDIA A100/V100 GPUs to run inference on this model.

How many GPUs should I ask for?

I tried running with 1 × NVIDIA A100 GPU vs 2 × NVIDIA A100 GPUs, but I don't see much improvement in tokens per second or load time with more GPUs.
Both runs were done with CUDA support.

The easiest way to estimate how much VRAM the model requires is to take the model file size and add 1 or 2 GB of overhead. If the file is 20 GB, add 2 GB and you get roughly 22 GB of VRAM.
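If you want to script that rule of thumb, here is a minimal sketch (the file path and the 2 GB overhead figure are just placeholder assumptions, not values from this thread):

```python
import os

def estimate_vram_gb(model_path: str, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a GGUF file: file size plus a small overhead
    for the KV cache and CUDA buffers."""
    file_size_gb = os.path.getsize(model_path) / 1024**3
    return file_size_gb + overhead_gb

# Example with a placeholder path:
print(f"~{estimate_vram_gb('model-q6_k.gguf'):.1f} GB VRAM needed")
```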

I would recommend using the Q6 quant, since it has almost no quality loss while still running at a decent speed. It takes roughly 40 GB of VRAM, which fits comfortably on an 80 GB A100.
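On the multi-GPU question: if you are running the GGUF through llama-cpp-python, a second GPU mainly gives you extra VRAM headroom rather than faster single-request decoding, and the layers have to be explicitly split for the second card to be used at all. A hedged sketch (the model path, split ratio, and context size below are assumptions for illustration):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model-q6_k.gguf",  # placeholder path to the Q6 quant
    n_gpu_layers=-1,               # offload all layers to GPU
    tensor_split=[0.5, 0.5],       # put half the layers on each A100
    n_ctx=4096,                    # context window; adjust to your use case
)

out = llm("Explain what this model does in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```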
