SageMaker generation speed, timeouts

#33
by elanmarkowitz - opened

I deployed an endpoint on SageMaker using a g5.48xlarge instance.

However, it seems much slower than other models and frequently times out.

Has anyone else seen this issue or know any ways to increase generation speed?

Deployed using this image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04-v1.0

@elanmarkowitz What does your config look like?
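For context, here is a minimal sketch of the settings that usually matter with this TGI image. It assumes the container is configured through the `HF_MODEL_ID`, `SM_NUM_GPUS`, `MAX_INPUT_LENGTH` and `MAX_TOTAL_TOKENS` environment variables used by the Hugging Face LLM container. It also assumes SageMaker's 60-second limit on real-time `InvokeEndpoint` calls. The helper names `tgi_env` and `fits_in_timeout`, the model id, and the throughput figures are hypothetical placeholders, not the poster's actual config or measured numbers.

```python
import json


def tgi_env(model_id: str, num_gpus: int, max_input: int, max_total: int) -> dict:
    """Hypothetical helper: build the env dict for the TGI container.

    SageMaker expects all env values as strings, hence json.dumps on ints.
    """
    if max_input >= max_total:
        # MAX_TOTAL_TOKENS covers prompt + generated tokens, so it must be larger.
        raise ValueError("max_input must be smaller than max_total")
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),  # shard the model across this many GPUs
        "MAX_INPUT_LENGTH": json.dumps(max_input),
        "MAX_TOTAL_TOKENS": json.dumps(max_total),
    }


def fits_in_timeout(max_new_tokens: int, tokens_per_second: float,
                    limit_s: float = 60.0) -> bool:
    """Rough check: decode time only; ignores prefill, queueing and network."""
    return max_new_tokens / tokens_per_second < limit_s


# Example with placeholder values (a g5.48xlarge has 8 GPUs).
env = tgi_env("your-org/your-model", num_gpus=8, max_input=2048, max_total=4096)
print(env["SM_NUM_GPUS"], env["MAX_TOTAL_TOKENS"])
# At an assumed 20 tokens/s, 512 new tokens take ~26 s but 2048 take ~102 s.
print(fits_in_timeout(512, 20.0), fits_in_timeout(2048, 20.0))
```

The resulting dict would be passed as `env=` to `sagemaker.huggingface.HuggingFaceModel` along with the `image_uri` above. If `SM_NUM_GPUS` is left below the instance's GPU count, or `max_new_tokens` is large relative to throughput, slow generation and timeouts are the expected symptoms.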
