Gemma Inference Scale

#42 opened by metalfusion10

Can someone give me a rough estimate of whether I can perform inference on a Gemma model via a locally developed chat application using the transformers library, scaled across 50 people with two NVIDIA RTX 5000 Ada 36 GB VRAM GPUs?

I'm trying to figure out how quickly each user would get their response back after inference. Please let me know.

Hi @metalfusion10 ,

I may not be able to answer that directly, but take a look at the model memory requirements and some insight into how they are calculated; it can help you with your own estimates.

https://huggingface.co/google/gemma-2b-it/discussions/38

There is also a calculator for estimating memory usage at different precisions.
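For a rough sense of the numbers, here is a minimal back-of-envelope sketch. The config values are assumptions on my part (roughly Gemma-2B-class: ~2.5B parameters, 18 layers, multi-query attention with one KV head of head dim 256); check the model's config.json for the real figures before relying on them:

```python
# Back-of-envelope memory estimate for serving a Gemma-2B-class model.
# All config values below are illustrative assumptions, not measurements.

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the model weights."""
    return num_params * bytes_per_param / 1e9

def kv_cache_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                context_len: int, batch_size: int,
                bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size: 2 tensors (K and V) per layer per token."""
    return (2 * num_layers * num_kv_heads * head_dim
            * context_len * batch_size * bytes_per_value) / 1e9

params = 2.5e9  # assumed parameter count

print(f"fp32 weights: {weight_memory_gb(params, 4):.1f} GB")  # ~10 GB
print(f"fp16 weights: {weight_memory_gb(params, 2):.1f} GB")  # ~5 GB
print(f"int8 weights: {weight_memory_gb(params, 1):.1f} GB")  # ~2.5 GB

# KV cache for 50 concurrent users at a 2048-token context,
# assuming 18 layers, 1 KV head, head dim 256, fp16 cache:
print(f"KV cache (50 users): {kv_cache_gb(18, 1, 256, 2048, 50):.1f} GB")
```

Under these assumptions, fp16 weights (~5 GB) plus a 2k-token KV cache for 50 concurrent users (~2 GB) fit comfortably on one of your cards; per-user latency then depends mostly on how requests are batched and scheduled, which this memory calculation does not cover.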

Thanks
