Gemma Inference Scale
#42
by metalfusion10 - opened
Can someone give me a brief estimate of whether I can perform inference on a Gemma model via a locally developed chat application using the transformers library, scaled across 50 people, with 2 NVIDIA RTX 5000 Ada 36GB VRAM GPUs?
I'm trying to figure out how fast each user would get their response back after inference. Please let me know.
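For context, here is the kind of quick single-request benchmark I had in mind for measuring per-request speed; the model ID, dtype, and prompt are just placeholder assumptions:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # placeholder; swap in the Gemma variant you plan to serve

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision roughly halves VRAM vs. fp32
    device_map="auto",           # needs accelerate; spreads layers across both GPUs
)

prompt = "Explain what a transformer model is in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Warm-up run so one-time CUDA initialization doesn't skew the timing
model.generate(**inputs, max_new_tokens=16)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec for a single request")
```

With 50 users sharing the hardware, per-user latency would scale with however many requests are in flight at once, so this single-request number would only be an upper bound.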
Hi @metalfusion10,
I may not be able to give you an exact number, but take a look at the model memory requirements and some insights into how they are calculated. That should help with your own calculations.
https://huggingface.co/google/gemma-2b-it/discussions/38
There is also a calculator for estimating memory usage at different precisions.
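As a very rough back-of-the-envelope sketch, the weights alone take approximately parameter count times bytes per parameter; the ~2B parameter count below is an assumption for gemma-2b, so adjust it for the variant you actually serve:

```python
# Rough VRAM estimate: parameter count x bytes per parameter.
params = 2e9  # assumed ~2B parameters for gemma-2b; adjust for your model

bytes_per_param = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}

for precision, nbytes in bytes_per_param.items():
    weights_gb = params * nbytes / 1024**3
    print(f"{precision}: ~{weights_gb:.1f} GB for weights alone")

# On top of the weights, the KV cache and activations consume additional
# VRAM, and they grow with batch size (concurrent users) and sequence length.
```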
Thanks