Is there a best way to infer this model from multiple small memory GPUs? #39
by hongdouzi - opened
I have four 3090s with 96 GB of total VRAM. Which framework should I use to run inference on this model most efficiently?
vLLM or Aphrodite Engine: load a 4-bit quant with tensor parallelism across the four cards and a 64k context. A rough sketch is below.
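With vLLM, that setup could look roughly like this. The model ID is a placeholder, and the `quantization="awq"` flag assumes you are loading an AWQ 4-bit quant; a GPTQ quant would use `quantization="gptq"` instead. Treat this as a minimal sketch, not a tuned config:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="someorg/some-model-AWQ",  # placeholder: substitute the actual 4-bit quant repo
    quantization="awq",              # assumes an AWQ 4-bit quant; adjust to match yours
    tensor_parallel_size=4,          # shard the weights across the four 3090s
    max_model_len=65536,             # the 64k context suggested above
    gpu_memory_utilization=0.90,     # leave a little headroom per card for the KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

If the KV cache at 64k doesn't fit alongside the weights, lowering `max_model_len` or `gpu_memory_utilization` is the usual first lever.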