Deploy with SageMaker

#15 by Larissa-Stallion

When following the instructions under Deploy --> Amazon SageMaker --> SageMaker SDK --> deploy.py:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'Snowflake/snowflake-arctic-embed-m-long'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface-tei", version="1.2.3"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# send request
predictor.predict({
    "inputs": "My name is Clara and I am",
})
I receive the error:

UnexpectedStatusException: Error hosting endpoint tei-2024-07-10-22-05-53-662: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.. Try changing the instance type or reference the troubleshooting page https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-troubleshooting.html
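Those endpoint logs can also be pulled programmatically (a minimal boto3 sketch; the log group name follows the standard /aws/sagemaker/Endpoints/<endpoint-name> convention, with the endpoint name taken from the error above):

import boto3

# Fetch recent log events for the failing endpoint. SageMaker writes
# endpoint logs to /aws/sagemaker/Endpoints/<endpoint-name>.
logs = boto3.client("logs")
response = logs.filter_log_events(
    logGroupName="/aws/sagemaker/Endpoints/tei-2024-07-10-22-05-53-662",
    limit=50,
)
for event in response["events"]:
    print(event["message"])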

Within the CloudWatch logs I found the error:
Error: Model backend is not healthy
Caused by:
unexpected rank, expected: 2, got: 1 ([768])

I was able to successfully create a SageMaker Endpoint for Snowflake/snowflake-arctic-embed-l, but require this long-context variant. Please let me know how to overcome this error.
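As a sanity check that the weights themselves are fine, the model can be loaded locally the way the model card shows (a sketch assuming sentence-transformers is installed; if this works, the failure is specific to the TEI container, which appears to expect a 2-D [batch, hidden] output rather than the 1-D [768] tensor the log mentions):

from sentence_transformers import SentenceTransformer

# Mirror the model card usage; the repo's custom model code must be trusted.
model = SentenceTransformer(
    "Snowflake/snowflake-arctic-embed-m-long", trust_remote_code=True
)
embeddings = model.encode(["My name is Clara and I am"])
print(embeddings.shape)  # one 768-dimensional vector -> (1, 768)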


I installed the model locally and modified the config.json by adding "trust_remote_code": true:

    "torch_dtype": "float32",
    "transformers_version": "4.36.1",
    "trust_remote_code": true,
    "type_vocab_size": 2,
    "use_cache": true,
    "use_flash_attn": true,
    "use_rms_norm": false,
    "use_xentropy": true,
    "vocab_size": 30528
}
I then compressed it into a tar.gz following the instructions here: https://huggingface.co/docs/sagemaker/inference#create-a-model-artifact-for-deployment
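That preparation step can be scripted (a sketch assuming huggingface_hub is installed; the local directory name is arbitrary, and the archive layout with all files at the top level follows the linked instructions):

import json
import tarfile
from pathlib import Path
from huggingface_hub import snapshot_download

# Download the model, flip trust_remote_code in config.json, and pack
# everything into a flat model.tar.gz for SageMaker.
local_dir = Path(snapshot_download(
    "Snowflake/snowflake-arctic-embed-m-long", local_dir="model"
))

config_path = local_dir / "config.json"
config = json.loads(config_path.read_text())
config["trust_remote_code"] = True
config_path.write_text(json.dumps(config, indent=2))

with tarfile.open("snowflake-arctic-embed-m-long-config-mod.tar.gz", "w:gz") as tar:
    for file in local_dir.iterdir():
        if not file.name.startswith("."):  # skip hub cache metadata
            tar.add(file, arcname=file.name)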

I was able to create the SageMaker endpoint:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

trust_remote_code = True

hub = {
    'HF_MODEL_ID': 'Snowflake/snowflake-arctic-embed-m-long',
    'HF_TASK': 'feature-extraction',
    'HF_MODEL_TRUST_REMOTE_CODE': json.dumps(trust_remote_code)
}

huggingface_model = HuggingFaceModel(
    model_data="s3://sagemaker-us-gov-west-1-077510649301/huggingface-models/snowflake-arctic-embed-m-long-config-mod.tar.gz",  # path to your trained SageMaker model
    role=role,                    # IAM role with permissions to create an endpoint
    transformers_version="4.26",  # Transformers version used
    pytorch_version="1.13",       # PyTorch version used
    py_version='py39',            # Python version used
    env=hub,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="snowflake-arctic-embed-m-long",
)

However, I get a trust_remote_code error when invoking the endpoint:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "Loading /.sagemaker/mms/models/Snowflake__snowflake-arctic-embed-m-long requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error."
}
". See https://us-gov-west-1.console.aws.amazon.com/cloudwatch/home?region=us-gov-west-1#logEventViewer:group=/aws/sagemaker/Endpoints/iproposal-sandbox-embedding-snowflake-arctic-embed-m-long in account 077510649301 for more information.
