What would be the minimal SageMaker instance to deploy this model?

#7
by CarlosAndrea - opened

As stated in the title: what is the smallest SageMaker instance that can host this model?

I'm trying ml.g5.24xlarge, but so far I haven't been able to deploy the model. I keep running into these errors:

  • "Error: ShardCannotStart"
  • "TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'"
  • "TypeError: unsupported operand type(s) for /: 'NoneType' and 'int' rank=5"

Hi,

The model weights are about 24 GB, but inference also needs memory for the data passing through the model, so you would probably need at least 30 GB of GPU memory to be safe (it's a rough guess).

People have deployed it successfully on SageMaker with 2 A10 GPUs: https://github.com/vllm-project/vllm/issues/2395. Each A10 GPU has 24 GB of memory, so you'll have 48 GB in total, which is enough. Alternatively, 2 L4 GPUs, which also have 24 GB each, should work as well.
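
To make the arithmetic explicit, here is a rough sketch in Python. The 30 GB requirement and the 24 GB per GPU come from the numbers above. The instance table is an assumption written from memory (GPU counts for a few ml.g5 sizes), and the helper names are made up for illustration, so please check the table against the AWS documentation before relying on it.

```python
import math

def min_gpus(required_gb, per_gpu_gb):
    """Smallest GPU count whose combined memory covers required_gb."""
    return math.ceil(required_gb / per_gpu_gb)

REQUIRED_GB = 30  # rough guess from above: ~24 GB of weights plus headroom
PER_GPU_GB = 24   # memory per A10 (or L4) GPU, as stated above

# ASSUMPTION: GPU counts per instance type, listed from smallest to largest.
# Written from memory; verify against the AWS instance type documentation.
G5_INSTANCES = [
    ("ml.g5.xlarge", 1),
    ("ml.g5.12xlarge", 4),
    ("ml.g5.24xlarge", 4),
    ("ml.g5.48xlarge", 8),
]

def smallest_instance(instances, gpus_needed):
    """First (smallest) instance in the ordered list with enough GPUs, else None."""
    return next((name for name, gpus in instances if gpus >= gpus_needed), None)

gpus_needed = min_gpus(REQUIRED_GB, PER_GPU_GB)
total_gb = gpus_needed * PER_GPU_GB
print(f"GPUs needed: {gpus_needed} ({total_gb} GB total)")
print(f"Smallest listed instance: {smallest_instance(G5_INSTANCES, gpus_needed)}")
```

Under these assumptions a single 24 GB GPU is not enough, and two GPUs give the 48 GB mentioned above. This only checks memory capacity: the serving stack still has to be configured to shard the model across the GPUs, and the sketch says nothing about the cause of the TypeError in the original post.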
