Not able to deploy with TGI on SageMaker

#4
by hassanraha - opened

Hi team,

I am trying to deploy the alfred-40b-1023 model to a SageMaker endpoint using the provided deploy code, with the instance type changed to ml.g5.48xlarge. The deployment still fails.

I am getting the error: "ntk_yarn rope is not implemented" in the container logs.

LightOn AI org

Hi @hassanraha ,

Unfortunately, TGI doesn't support our custom ntk_yarn context scaling method.

If you want to deploy this model in SageMaker and benefit from this context scaling method, you can use our custom vLLM fork.

You need to build and deploy the image to ECR following the README instructions, then deploy the model following the deployment instructions.
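For readers who want a rough picture of the second step, here is a minimal sketch of pointing a SageMaker endpoint at a custom image that has already been pushed to ECR. It only uses the standard boto3 SageMaker calls (`create_model`, `create_endpoint_config`, `create_endpoint`). The repository name, role ARN, endpoint name, and environment variables are placeholders and assumptions, not values from the fork's README, so follow the README where it differs.

```python
def ecr_image_uri(account_id, region, repo, tag="latest"):
    """Build the standard ECR image URI for a pushed image."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def build_requests(name, image_uri, role_arn, instance_type,
                   env=None, startup_timeout_s=3600):
    """Return the kwargs for the three boto3 SageMaker calls.

    `env` is a placeholder: the variables the container actually expects
    are defined by the image, not by this sketch.
    """
    model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image_uri, "Environment": env or {}},
    }
    config = {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            # Large models take a while to load; allow a long health-check window.
            "ContainerStartupHealthCheckTimeoutInSeconds": startup_timeout_s,
        }],
    }
    endpoint = {"EndpointName": name, "EndpointConfigName": name}
    return model, config, endpoint


def deploy(sm_client, name, image_uri, role_arn, instance_type, env=None):
    """Create the model, endpoint config, and endpoint in order."""
    model, config, endpoint = build_requests(
        name, image_uri, role_arn, instance_type, env)
    sm_client.create_model(**model)
    sm_client.create_endpoint_config(**config)
    sm_client.create_endpoint(**endpoint)
```

A call would look something like `deploy(boto3.client("sagemaker"), "my-endpoint", ecr_image_uri("<account-id>", "<region>", "<repo>"), "<role-arn>", "ml.p4d.24xlarge")`, with every bracketed value replaced by your own.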

Thanks!

cthiriet changed discussion status to closed

Thank you so much @cthiriet for the prompt response.

I deployed alfred-40b-1023 on a g5.48xlarge SageMaker instance, but I am getting out-of-memory (OOM) errors even for a single large request. Is there any way to tune it to run better on the smaller GPU machine?

LightOn AI org

We recommend using an ml.p4d.24xlarge instance to run it efficiently on SageMaker.
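A back-of-the-envelope estimate helps show why the larger instance matters. The sketch below assumes 40 billion parameters stored in 16-bit precision and sharded evenly across all 8 GPUs, and uses the GPU sizes AWS publishes for these instances (8 x 24 GB A10G on g5.48xlarge, 8 x 40 GB A100 on p4d.24xlarge). The function names are illustrative, and the numbers ignore runtime overhead, so treat them as rough upper bounds on free memory.

```python
GB = 1e9  # decimal gigabytes, to match the advertised GPU sizes


def weights_gb(n_params, bytes_per_param=2):
    """Memory needed for the weights alone (2 bytes per param for fp16/bf16)."""
    return n_params * bytes_per_param / GB


def headroom_per_gpu_gb(n_params, n_gpus, gpu_mem_gb, bytes_per_param=2):
    """Memory left on each GPU after an even split of the weights.

    Whatever remains must hold the KV cache, activations, and runtime
    overhead, so long requests can exhaust it quickly.
    """
    return gpu_mem_gb - weights_gb(n_params, bytes_per_param) / n_gpus


if __name__ == "__main__":
    n_params = 40e9  # assumption: roughly 40B parameters
    for name, mem in [("g5.48xlarge", 24), ("p4d.24xlarge", 40)]:
        free = headroom_per_gpu_gb(n_params, n_gpus=8, gpu_mem_gb=mem)
        print(f"{name}: about {free:.0f} GB free per GPU after weights")
```

Under these assumptions the weights take about 80 GB in total, or 10 GB per GPU, leaving roughly 14 GB per GPU on g5.48xlarge versus 30 GB on p4d.24xlarge for the KV cache and everything else. On the smaller instance, the usual levers are a lower maximum sequence length or fewer concurrent requests.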
