What would be the minimal SageMaker instance to deploy this model?

by CarlosAndrea - opened Dec 18, 2023

Discussion

CarlosAndrea

Dec 18, 2023

As stated in the title, what would be the minimal SageMaker instance to deploy this model?

seabasshn

Dec 18, 2023

I'm trying it with ml.g5.24xlarge, but so far I haven't been able to deploy it. I keep running into this error:

"Error: ShardCannotStart"
"TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'"
"TypeError: unsupported operand type(s) for /: 'NoneType' and 'int' rank=5"

nielsr

Jan 31

Hi,

The model is about 24 GB, but once you account for the additional data being sent through it, you would probably need at least 30 GB of GPU memory to be safe (that's a rough guess).

People have been deploying it successfully on SageMaker with 2 A10 GPUs: https://github.com/vllm-project/vllm/issues/2395. Each A10 GPU has 24 GB of memory, so you'll have 48 GB in total, which is enough. Alternatively, 2 L4 GPUs, which also have 24 GB of memory each, should work as well.
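
To make the arithmetic explicit, here is a rough sketch in Python. The helper name `min_gpus` is made up for illustration, and the numbers are only the estimates quoted above (24 GB model, roughly 30 GB needed, 24 GB per GPU). It ignores per-GPU overhead and how the weights are actually split across GPUs, so treat the result as a lower bound rather than a guarantee.

```python
import math


def min_gpus(required_gb: float, per_gpu_gb: float) -> int:
    """Smallest number of identical GPUs whose combined memory covers required_gb.

    Hypothetical helper for illustration only; it does not model per-GPU
    overhead or how a serving framework shards the weights.
    """
    return math.ceil(required_gb / per_gpu_gb)


REQUIRED_GB = 30  # rough guess from the reply above (24 GB model plus headroom)
PER_GPU_GB = 24   # memory per A10 or L4 GPU, as stated above

gpus = min_gpus(REQUIRED_GB, PER_GPU_GB)
total_gb = gpus * PER_GPU_GB
print(f"{gpus} GPUs, {total_gb} GB in total")
```

With these estimates a single 24 GB GPU falls short, which is why the suggestion is two of them.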
