Gemma Inference Scale
#42
by metalfusion10 - opened
Can someone give me a brief estimate of whether I can perform inference on a Gemma model via a locally developed chat application using the transformers library, scaled across 50 people, with 2 NVIDIA RTX 5000 Ada 36GB VRAM GPUs?
I'm trying to figure out how fast each user would get their response back after inference. Please let me know.
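For context, here is the kind of quick single-request benchmark I had in mind for measuring per-request speed; the model ID, dtype, and prompt are just placeholder assumptions:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # placeholder; swap in the Gemma variant you plan to serve

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision roughly halves VRAM vs. fp32
    device_map="auto",           # needs accelerate; spreads layers across both GPUs
)

prompt = "Explain what a transformer model is in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Warm-up run so one-time CUDA initialization doesn't skew the timing
model.generate(**inputs, max_new_tokens=16)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec for a single request")
```

With 50 users sharing the hardware, per-user latency would scale with however many requests are in flight at once, so this single-request number would only be an upper bound.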
Hi @metalfusion10,
I may not be able to give you an exact number, but take a look at the model memory requirements and some insights into how they are calculated. That should help with your own calculations.
https://huggingface.co/google/gemma-2b-it/discussions/38
There is also a calculator for estimating memory usage at different precisions.
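As a very rough back-of-the-envelope sketch, the weights alone take approximately parameter count times bytes per parameter; the ~2B parameter count below is an assumption for gemma-2b, so adjust it for the variant you actually serve:

```python
# Rough VRAM estimate: parameter count x bytes per parameter.
params = 2e9  # assumed ~2B parameters for gemma-2b; adjust for your model

bytes_per_param = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}

for precision, nbytes in bytes_per_param.items():
    weights_gb = params * nbytes / 1024**3
    print(f"{precision}: ~{weights_gb:.1f} GB for weights alone")

# On top of the weights, the KV cache and activations consume additional
# VRAM, and they grow with batch size (concurrent users) and sequence length.
```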
Thanks