Not able to deploy with TGI on SageMaker

by hassanraha - opened Mar 24

Mar 24

Hi team,

I am trying to deploy the alfred-40b-1023 model to a SageMaker endpoint with the given deploy code, with the instance type changed to ml.g5.48xlarge. The deployment still fails.

I am getting the error: "ntk_yarn rope is not implemented" in the container logs.

cthiriet

LightOn AI org Mar 26

Hi @hassanraha ,

Unfortunately, TGI doesn't support our custom ntk_yarn context scaling method.

If you want to deploy this model in SageMaker and benefit from this context scaling method, you can use our custom vLLM fork.

You need to build the image and push it to ECR following the README instructions, then deploy the model following the deployment instructions.
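For readers who want a rough picture of the last step, here is a minimal sketch of creating a SageMaker endpoint from a custom image already pushed to ECR, using the standard boto3 `create_model`, `create_endpoint_config`, and `create_endpoint` calls. It is not the fork's own deployment script: the image URI, role ARN, and endpoint name are hypothetical placeholders, and any environment variables the fork's container expects are not shown because they are defined by its README.

```python
def build_endpoint_requests(image_uri, role_arn, name, instance_type):
    """Build the three boto3 request payloads. Pure function, no AWS calls."""
    config_name = f"{name}-config"
    model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,          # hypothetical IAM role
        "PrimaryContainer": {"Image": image_uri},  # hypothetical ECR image URI
    }
    config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
                # Large models load slowly; 3600 s is the maximum allowed.
                "ContainerStartupHealthCheckTimeoutInSeconds": 3600,
            }
        ],
    }
    endpoint = {"EndpointName": name, "EndpointConfigName": config_name}
    return model, config, endpoint


def deploy(image_uri, role_arn, name, instance_type):
    """Send the requests to SageMaker (needs AWS credentials and boto3)."""
    import boto3  # imported here so the builder above works without boto3

    sm = boto3.client("sagemaker")
    model, config, endpoint = build_endpoint_requests(
        image_uri, role_arn, name, instance_type
    )
    sm.create_model(**model)
    sm.create_endpoint_config(**config)
    sm.create_endpoint(**endpoint)
```

Keeping the payload construction separate from the AWS calls makes it easy to inspect or log the requests before anything is created.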

Thanks!

cthiriet changed discussion status to closed Mar 26

hassanraha

Mar 26

Thank you so much @cthiriet for the prompt response.

I deployed alfred-40b-1023 on a g5.48xlarge SageMaker instance, but I am getting an OOM error even for a single large request. Is there any way to tune it to run better on a smaller GPU machine?

cthiriet

LightOn AI org Mar 28

We recommend using a p4d.24xlarge instance to run it efficiently on SageMaker.
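A back-of-the-envelope estimate helps show why the larger instance leaves more room. The sketch below assumes roughly 40 billion parameters stored in 16-bit precision and sharded evenly across all 8 GPUs; both are assumptions, not figures stated in this thread. A g5.48xlarge has 8 A10G GPUs with 24 GB each, and a p4d.24xlarge (ml.p4d.24xlarge on SageMaker) has 8 A100 GPUs with 40 GB each. The estimate ignores activations, the KV cache, and framework overhead, which all come out of the remaining headroom.

```python
def per_gpu_headroom_gb(n_params, n_gpus, gpu_mem_gb, bytes_per_param=2):
    """Approximate GB left on each GPU after sharding the weights evenly.

    bytes_per_param=2 assumes fp16/bf16 weights (an assumption).
    """
    weights_gb = n_params * bytes_per_param / 1e9
    return gpu_mem_gb - weights_gb / n_gpus


N_PARAMS = 40e9  # assumed parameter count for a 40B model

g5 = per_gpu_headroom_gb(N_PARAMS, n_gpus=8, gpu_mem_gb=24)   # A10G
p4d = per_gpu_headroom_gb(N_PARAMS, n_gpus=8, gpu_mem_gb=40)  # A100 40 GB

print(f"g5.48xlarge:  ~{g5:.0f} GB free per GPU")
print(f"p4d.24xlarge: ~{p4d:.0f} GB free per GPU")
```

Under these assumptions the weights take about 10 GB per GPU on either instance, so the p4d.24xlarge has roughly twice the per-GPU headroom for long-context requests.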
