Llama 3 inference scaling

#91
by metalfusion10

Can someone give me a rough estimate of whether I can run inference on a Llama model from a locally developed chat application, using the transformers library, scaled across 50 people with two NVIDIA RTX 5000 Ada 36 GB VRAM GPUs?
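For context, this is roughly the setup I had in mind (a minimal sketch; the model ID, dtype, and prompt are just assumptions on my part):

```python
# Minimal sketch: load a Llama model sharded across both GPUs with transformers.
# Assumes the accelerate package is installed; the model ID is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # hypothetical choice

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision so the weights fit in VRAM
    device_map="auto",           # let accelerate split layers across both GPUs
)

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```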

I'm trying to figure out how fast each user would get their response back after inference. Please let me know.
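From what I understand, a plain `generate()` call handles one request at a time, so serving 50 users would mean batching their prompts together. Continuing the sketch above, something like this (the batch size and prompts are placeholders):

```python
# Minimal sketch: batch several users' prompts into one generate() call.
# Left padding is the usual setup for batched decoder-only generation.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

prompts = [f"User {i}: summarize my notes" for i in range(8)]  # hypothetical batch
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

out = model.generate(**batch, max_new_tokens=128)
replies = tokenizer.batch_decode(out, skip_special_tokens=True)
```

I realize per-user latency would then depend on model size, batch size, and output length, and that dedicated serving stacks like vLLM or text-generation-inference do continuous batching for exactly this many-concurrent-users case.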

Can it run on two RTX 4090s, pooling the memory like 24 + 24 = 48 GB?
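As far as I can tell, `device_map="auto"` does treat the two cards as one pool for the weights (so 24 + 24 behaves like 48 GB for fitting the model, though it doesn't double the compute speed). A sketch with explicit per-GPU caps, reusing the imports from the first snippet; the limits below are assumptions:

```python
# Minimal sketch: cap how much of each 24 GB RTX 4090 the weights may use,
# leaving headroom for the KV cache and activations (limits are assumptions).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "20GiB", 1: "20GiB"},
)
```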
