Is there a best way to run inference on this model across multiple small-memory GPUs?

#39 · opened by hongdouzi

I have four RTX 3090s with 96 GB of VRAM in total. Which framework should I use to run inference on this model most efficiently?


vLLM or Aphrodite Engine: load the model in 4-bit with a 64k context window.
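
For what it's worth, a minimal vLLM sketch of that setup might look like the following. `MODEL_ID` is a placeholder rather than the actual repo from this thread, and `quantization="awq"` assumes a 4-bit AWQ checkpoint exists for the model (GPTQ works the same way); Aphrodite Engine exposes nearly identical options.

```python
# Minimal sketch, assuming a 4-bit AWQ checkpoint; MODEL_ID is hypothetical.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MODEL_ID",            # hypothetical: substitute the 4-bit (AWQ/GPTQ) repo name
    quantization="awq",          # assumption: "load in 4-bit" via an AWQ checkpoint
    tensor_parallel_size=4,      # shard the weights across the four 3090s
    max_model_len=65536,         # the suggested 64k context window
    gpu_memory_utilization=0.90, # leave a little headroom on each GPU
)

prompts = ["Explain tensor parallelism in one paragraph."]
params = SamplingParams(temperature=0.7, max_tokens=256)

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

The same configuration works when serving over HTTP: pass `--tensor-parallel-size 4`, `--quantization awq`, and `--max-model-len 65536` to vLLM's OpenAI-compatible API server.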
