Unable to deploy with TGI on SageMaker
Hi team,
I am trying to deploy the alfred-40b-1023 model to a SageMaker endpoint using the provided deployment code. The only change I made was setting the instance type to ml.g5.48xlarge, but the model still fails to deploy.
The container logs show the error: "ntk_yarn rope is not implemented".
Hi @hassanraha,
Unfortunately, TGI doesn't support our custom ntk_yarn context scaling method.
If you want to deploy this model in SageMaker and benefit from this context scaling method, you can use our custom vLLM fork.
You need to build the image and push it to ECR following the README instructions, then deploy the model following the deployment instructions.
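As a rough sketch of the final deployment step, assuming the image has already been built and pushed to ECR as the fork's README describes, the SageMaker Python SDK's `Model` class can point an endpoint at a custom image URI. The repository name, the startup timeout value, and the contents of `env` are hypothetical placeholders; the fork's own deployment instructions take precedence over anything here.

```python
"""Sketch: deploy a custom image (e.g. a vLLM fork) from ECR to a SageMaker endpoint.

Assumptions: the image is already in ECR; repo names, the timeout value and the
environment variables passed via `env` are hypothetical placeholders.
"""
from typing import Dict, Optional


def ecr_image_uri(account_id: str, region: str, repo: str, tag: str = "latest") -> str:
    # Standard ECR image URI layout: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def build_deploy_kwargs(instance_type: str = "ml.p4d.24xlarge",
                        startup_timeout_s: int = 1800) -> Dict[str, object]:
    # Keyword arguments for sagemaker.model.Model.deploy(). The generous startup
    # health-check timeout (an assumed value) gives a 40B model time to load weights.
    return {
        "initial_instance_count": 1,
        "instance_type": instance_type,
        "container_startup_health_check_timeout": startup_timeout_s,
    }


def deploy(image_uri: str, role_arn: str, env: Optional[Dict[str, str]] = None):
    # Imported lazily so the helpers above work without the SageMaker SDK installed.
    from sagemaker.model import Model

    # `env` should hold whatever variables the fork's README requires (hypothetical here).
    model = Model(image_uri=image_uri, role=role_arn, env=env or {})
    # Creates the endpoint and, by default, waits until it is in service.
    return model.deploy(**build_deploy_kwargs())
```

Keeping the URI and keyword construction in plain functions makes it easy to check the configuration before calling `deploy`, which is the only part that talks to AWS.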
Thanks!
Thank you so much @cthiriet for the prompt response.
I deployed alfred-40b-1023 on an ml.g5.48xlarge SageMaker instance, but I get an out-of-memory (OOM) error even for a single large request. Is there any way to tune it so it runs better on this smaller GPU machine?
We recommend using an ml.p4d.24xlarge instance (8 x A100 40 GB, versus 8 x A10G 24 GB on the ml.g5.48xlarge) to run it efficiently on SageMaker.
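A back-of-envelope estimate shows why the larger per-GPU memory matters. The sketch below assumes a Falcon-40B-like layout (60 layers, 8 KV heads, head dimension 64), bf16 weights (2 bytes per parameter), and tensor parallelism across all 8 GPUs; these model numbers are assumptions, not confirmed alfred-40b-1023 values. It ignores activations, CUDA graphs, and framework overhead, so it does not by itself explain the OOM; it only compares how much memory each GPU has left after loading its weight shard.

```python
"""Back-of-envelope per-GPU memory estimate for a 40B model under tensor parallelism.

All model numbers are assumptions (Falcon-40B-like: 60 layers, 8 KV heads,
head_dim 64, bf16). Activations, CUDA graphs and framework overhead are ignored.
"""

GB = 1e9  # decimal gigabytes, matching nominal GPU memory sizes


def weights_gb_per_gpu(n_params: float, bytes_per_param: int, tp: int) -> float:
    # Weights are assumed to be sharded evenly across `tp` GPUs.
    return n_params * bytes_per_param / tp / GB


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int, bytes_per_el: int) -> int:
    # One K and one V vector per KV head per layer; this is the total across all
    # GPUs, which tensor parallelism splits across the shards.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_el


def headroom_gb(gpu_gb: float, n_params: float = 40e9, bytes_per_param: int = 2, tp: int = 8) -> float:
    # Memory left on each GPU after its weight shard is loaded.
    return gpu_gb - weights_gb_per_gpu(n_params, bytes_per_param, tp)


if __name__ == "__main__":
    print(f"KV cache per token (all GPUs): {kv_bytes_per_token(60, 8, 64, 2)} bytes")
    for name, gpu_gb in [("ml.g5.48xlarge (A10G 24 GB)", 24), ("ml.p4d.24xlarge (A100 40 GB)", 40)]:
        print(f"{name}: ~{headroom_gb(gpu_gb):.1f} GB per GPU left after weights")
```

Under these assumptions both instances hold the same weight shard per GPU, but the p4d leaves roughly twice as much per-GPU headroom for the KV cache, activations, and runtime buffers that a long request needs.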