Gemma Inference Scaling

#42 opened by metalfusion10

Can someone give me a rough estimate of whether I can run inference on a Gemma model, via a locally developed chat application using the Transformers library, that scales across 50 people with two NVIDIA RTX 5000 Ada 36 GB VRAM GPUs?

I'm trying to figure out how fast each user would get their response back after inference. Please let me know.
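For reference, a minimal sketch of the kind of setup I mean, assuming `google/gemma-2b-it` in bfloat16 (the model name and generation settings here are just placeholders):

```python
# Minimal Transformers inference sketch; model and settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place the model across the available GPUs
)

inputs = tokenizer("Explain KV caching briefly.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```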

Google org
edited May 10

Hi @metalfusion10 ,

I may not be able to give you an exact answer, but take a look at the model memory requirements and some insights into how they are calculated. They can help you with your own calculations.

https://huggingface.co/google/gemma-2b-it/discussions/38
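Roughly, weight memory is parameter count times bytes per parameter. Here is a quick sketch of that calculation (the parameter counts below are approximate, and the KV cache plus activations add overhead on top of the weights):

```python
# Approximate weight memory for Gemma checkpoints at common precisions.
# Parameter counts are rough figures; KV cache and activations are extra.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gib(num_params: float) -> dict[str, float]:
    """Approximate weight memory in GiB for each precision."""
    return {prec: num_params * b / 2**30 for prec, b in BYTES_PER_PARAM.items()}

for name, params in [("gemma-2b", 2.5e9), ("gemma-7b", 8.5e9)]:
    sizes = weight_memory_gib(params)
    print(name, {prec: f"{gib:.1f} GiB" for prec, gib in sizes.items()})
```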

There is also a calculator for estimating memory usage at different precisions.
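For the response-time side of your question, here is a back-of-envelope sketch; every number in it (tokens per second, batch size, response length) is a placeholder assumption to replace with your own benchmarks:

```python
# Back-of-envelope per-user latency, assuming one model replica per GPU and
# simple request batching. All numbers below are assumptions, not measurements.
users = 50
replicas = 2                 # one model copy per GPU
batch_size = 8               # concurrent requests served per replica
tokens_per_request = 256     # assumed average response length
tokens_per_second = 40       # assumed decode speed per sequence at this batch size

batch_time = tokens_per_request / tokens_per_second   # seconds to finish one batch
waves = -(-users // (replicas * batch_size))          # ceil: batches to cover all users
worst_case_wait = waves * batch_time

print(f"~{batch_time:.0f}s per response, worst-case wait ~{worst_case_wait:.0f}s "
      f"if all {users} users hit the app at once")
```

With these made-up numbers, each response takes about 6 seconds to generate and a user at the back of the queue waits roughly 26 seconds; real figures depend entirely on the model size, precision, and serving stack you choose.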

Thanks
