VRAM requirements

#43

by otacilio-psf - opened Sep 19, 2024

Sep 19, 2024

Hi, it's not clear to me how much VRAM I need to run this model. Since it has 6.6B active parameters, it should fit in 24 GB of VRAM, or am I wrong?

I have tried using vLLM.

One last question: is it possible to change the number of experts?

Microsoft org Sep 20, 2024

Thanks for your interest!

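An MoE model still needs to load all of its parameters, because any expert can be selected for any token. So you need enough memory to hold all 42B parameters, not just the 6.6B active ones.

The savings are in computation: only 6.6B parameters are active per token at inference.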
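As a rough back-of-the-envelope sketch of what that means for VRAM, the snippet below estimates the memory needed for the weights alone. The 42B and 6.6B figures come from this thread; the bytes-per-parameter values for each precision are assumptions for illustration, the function name is hypothetical, and the estimate ignores the KV cache, activations, and framework overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights only, in GiB (no KV cache or activations)."""
    return num_params * bytes_per_param / 1024**3


TOTAL_PARAMS = 42e9    # all experts must be resident in memory (from the reply above)
ACTIVE_PARAMS = 6.6e9  # used per token; affects compute, not weight memory

# Assumed storage sizes per parameter for common precisions.
PRECISIONS = {"fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

for name, nbytes in PRECISIONS.items():
    total = weight_memory_gib(TOTAL_PARAMS, nbytes)
    active = weight_memory_gib(ACTIVE_PARAMS, nbytes)
    print(f"{name}: all weights ~{total:.1f} GiB (active subset alone ~{active:.1f} GiB)")
```

Under these assumptions, the 16-bit weights are several times larger than 24 GB, so a single 24 GB card would need aggressive quantization or multiple GPUs, and even then there must be room left over for the KV cache.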

nguyenbh changed discussion status to closed Oct 10, 2024
