Issue with deploying on SageMaker

#9
by scribematic - opened

Hi, I am trying to deploy this model on SageMaker and am running into issues that I don't get with other models:

from sagemaker.huggingface import HuggingFaceModel
import boto3
from sagemaker import Session

# Replace with your access key and secret key (avoid hardcoding credentials outside of testing)
access_key = "key"
secret_key = "key"

# Create a boto3 session with the specified access key and secret key
boto3_session = boto3.Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    region_name="us-east-1"
)

# Use the boto3 session to create the IAM client
iam_client = boto3_session.client('iam')

# Create a SageMaker session with the custom boto3 session
sagemaker_session = Session(boto_session=boto3_session)

role = iam_client.get_role(RoleName='ROLE')['Role']['Arn']
# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID':'TheBloke/WizardLM-7B-uncensored-GPTQ',
    'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session  # Pass the custom SageMaker session
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1, # number of instances
    instance_type='ml.g4dn.2xlarge' # ec2 instance type
)

I am getting the following error when trying to query the endpoint after deployment:

{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}
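The `"\u0027llama\u0027"` in the message decodes to `'llama'`, which looks like a Python KeyError leaking through the serving stack. A minimal sketch of the likely mechanism (the mapping below is hypothetical and much smaller than the real one in transformers): a transformers version that predates LLaMA support has no entry for the `llama` model type, so the config lookup raises `KeyError('llama')`, which the container serializes into the 400 response above.

```python
# Hypothetical stand-in for transformers' model-type-to-config registry;
# versions before 4.28 have no "llama" entry.
CONFIG_MAPPING = {"bert": "BertConfig", "gpt2": "GPT2Config"}

def resolve_config(model_type):
    """Look up the config class name for a model type, as AutoConfig would."""
    return CONFIG_MAPPING[model_type]  # raises KeyError for unknown types

try:
    resolve_config("llama")
except KeyError as err:
    print(str(err))  # -> 'llama' (with quotes), matching the error message above
```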

Is this a library that the container doesn't import? Do I need a custom setup instead of just deploying to SageMaker? The hosted Hugging Face inference widget fails for the same reason, which is probably worth bringing to your attention.

Thank you

I am getting the same error, any suggestions? I'm simply using the SageMaker deployment code listed above.

Cognitive Computations org
edited May 11, 2023

I'm afraid I don't know anything about SageMaker. But I'm happy to take pull requests if anyone figures out what's wrong.

I figured out the error; unfortunately I don't see an immediate solution for deploying this as a SageMaker endpoint. The SageMaker environment only supports Hugging Face Transformers versions up to about 4.17, and this model is a fine-tuned LLaMA model, which requires 4.28: https://huggingface.co/decapoda-research/llama-7b-hf/discussions/39
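If and when AWS publishes a Hugging Face Deep Learning Container with Transformers >= 4.28, the fix should just be bumping the container versions in the model definition. A hedged sketch (the exact version strings below are assumptions; only combinations actually listed in the SageMaker SDK's supported DLC versions will work):

huggingface_model = HuggingFaceModel(
    transformers_version='4.28.1',  # assumed available; LLaMA support landed in 4.28
    pytorch_version='2.0.0',        # assumed matching PyTorch version for that DLC
    py_version='py310',             # assumed matching Python version for that DLC
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session
)

Passing an unsupported combination raises a ValueError from the SageMaker SDK before any deployment happens, so it is cheap to probe which versions your SDK release knows about.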

not sure when support will be available
