UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-07-12-08-18-29-406

#61
by pranavnerurkar - opened

I followed https://www.philschmid.de/sagemaker-falcon-llm

It crashed at the "Deploy model to an endpoint" step:
https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model.deploy

llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
  volume_size=400, # If using an instance with local SSD storage, volume_size must be None, e.g. p4 but not p3
  container_startup_health_check_timeout=health_check_timeout, # 10 minutes to be able to load the model
)

I used instance: ml.g4dn.xlarge
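For context, the variables that the deploy call relies on are defined earlier in the notebook. A minimal sketch of that setup, assuming the names from the blog post (the model id and the exact timeout value are assumptions taken from the snippet's comments and the guide, not my verbatim code):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # IAM role used by the endpoint
instance_type = "ml.g4dn.xlarge"       # the instance I deployed to
health_check_timeout = 600             # "10 minutes to be able to load the model"

# retrieve the TGI container image and build the model object
llm_image = get_huggingface_llm_image_uri("huggingface")
llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  env={"HF_MODEL_ID": "tiiuae/falcon-40b-instruct"},  # model id is an assumption
)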


UnexpectedStatusException Traceback (most recent call last)
in
5 instance_type=instance_type,
6 # volume_size=400, # If using an instance with local SSD storage, volume_size must be None, e.g. p4 but not p3
----> 7 container_startup_health_check_timeout=health_check_timeout, # 10 minutes to be able to load the model
8 )

/opt/conda/lib/python3.7/site-packages/sagemaker/huggingface/model.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, volume_size, model_data_download_timeout, container_startup_health_check_timeout, inference_recommendation_id, explainer_config, **kwargs)
326 container_startup_health_check_timeout=container_startup_health_check_timeout,
327 inference_recommendation_id=inference_recommendation_id,
--> 328 explainer_config=explainer_config,
329 )
330

/opt/conda/lib/python3.7/site-packages/sagemaker/model.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, volume_size, model_data_download_timeout, container_startup_health_check_timeout, inference_recommendation_id, explainer_config, **kwargs)
1334 data_capture_config_dict=data_capture_config_dict,
1335 explainer_config_dict=explainer_config_dict,
-> 1336 async_inference_config_dict=async_inference_config_dict,
1337 )
1338

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in endpoint_from_production_variants(self, name, production_variants, tags, kms_key, wait, data_capture_config_dict, async_inference_config_dict, explainer_config_dict)
4575 self.sagemaker_client.create_endpoint_config(**config_options)
4576
-> 4577 return self.create_endpoint(endpoint_name=name, config_name=name, tags=tags, wait=wait)
4578
4579 def expand_role(self, role):

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in create_endpoint(self, endpoint_name, config_name, tags, wait)
3968 )
3969 if wait:
-> 3970 self.wait_for_endpoint(endpoint_name)
3971 return endpoint_name
3972

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in wait_for_endpoint(self, endpoint, poll)
4323 message=message,
4324 allowed_statuses=["InService"],
-> 4325 actual_status=status,
4326 )
4327 return desc

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-07-12-08-18-29-406: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..

The endpoint container is not healthy and is restarting. Check the endpoint CloudWatch logs for details.
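If it helps, here is a minimal sketch for pulling those logs with boto3, assuming the default log group naming for SageMaker endpoints (/aws/sagemaker/Endpoints/<endpoint-name>):

import boto3

endpoint_name = "huggingface-pytorch-tgi-inference-2023-07-12-08-18-29-406"
log_group = f"/aws/sagemaker/Endpoints/{endpoint_name}"
logs = boto3.client("logs")

# list the container log streams for this endpoint, newest first
streams = logs.describe_log_streams(
  logGroupName=log_group,
  orderBy="LastEventTime",
  descending=True,
)["logStreams"]

# print the most recent events from the newest stream
events = logs.get_log_events(
  logGroupName=log_group,
  logStreamName=streams[0]["logStreamName"],
  limit=50,
)["events"]
for event in events:
  print(event["message"])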

I am trying to deploy falcon-40b and have been experiencing the same error since I moved to

# install supported sagemaker SDK
!pip install "sagemaker==2.175.0" --upgrade --quiet

and

from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
  "huggingface",
  version="0.9.3"
)
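A quick sanity check to confirm which container image gets resolved:

print(f"llm image uri: {llm_image}")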

Everything works fine with llm_image version "0.8.2".

Both tests were done with:

import json
from sagemaker.huggingface import HuggingFaceModel

# sagemaker config
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 300

# TGI config
config = {
  'HF_MODEL_ID': "tiiuae/falcon-40b-instruct", # model_id from hf.co/models
  'SM_NUM_GPUS': json.dumps(number_of_gpu), # Number of GPU used per replica
  'MAX_INPUT_LENGTH': json.dumps(1024),  # Max length of input text
  'MAX_TOTAL_TOKENS': json.dumps(2048),  # Max length of the generation (including input text)
  # 'HF_MODEL_QUANTIZE': "bitsandbytes", # comment in to quantize
}

# create HuggingFaceModel (role is the SageMaker execution role, e.g. sagemaker.get_execution_role())
llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  env=config
)
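followed by the same deploy call as in the first post. A minimal sketch of the deploy plus a test request once the endpoint is InService (the prompt and generation parameters are just illustrative assumptions):

# deploy the model to a real-time endpoint
llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
  container_startup_health_check_timeout=health_check_timeout,
)

# quick smoke test against the running endpoint
response = llm.predict({
  "inputs": "What is Amazon SageMaker?",
  "parameters": {"max_new_tokens": 128, "temperature": 0.7},
})
print(response)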

With falcon-7b I am able to successfully deploy using version "0.9.3".

Details of my error:

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-08-08-09-21-35-398: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..

What is the error you see in CloudWatch?

I also have the full CSV, but here is a screenshot of what looks like the relevant part:
[Screenshot 2023-08-08 at 18.35.34.png]
