Runtime issue when deploying on a SageMaker endpoint

#7 by krokoko

Hi, I'm trying to deploy the model to a SageMaker endpoint using the SDK. I extended the latest available Hugging Face DLC to install the correct version of the transformers library (4.28.0).
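
The DLC extension amounts to a single pip pin on top of the base image. A minimal sketch, assuming the us-east-1 PyTorch 1.13.1 inference DLC as the base (the exact URI/tag depends on your region and the DLC release you start from):

# Dockerfile: extend the Hugging Face PyTorch inference DLC
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04

# pin the transformers release that includes LlamaForCausalLM
RUN pip install --no-cache-dir transformers==4.28.0

I'm deploying the model with: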

from sagemaker.huggingface import HuggingFaceModel

# environment passed to the inference container
hub = {
    'HF_MODEL_ID': 'nomic-ai/gpt4all-13b-snoozy',
    'HF_TASK': 'text-generation'
}

# create Hugging Face Model Class
# ecr_image: URI of the extended DLC pushed to ECR; role: SageMaker execution role
huggingface_model_snoozy = HuggingFaceModel(
    image_uri=ecr_image,
    transformers_version='4.28.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role,
)
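
For completeness, the endpoint is then created and invoked roughly like this (a sketch; the instance type is a placeholder for something with enough GPU memory for a 13B model):

predictor = huggingface_model_snoozy.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.12xlarge',  # placeholder instance type
)

# send a test prompt to the endpoint
result = predictor.predict({'inputs': 'Hello, how are you?'})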

When running a prediction, I get:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Could not load model /.sagemaker/mms/models/nomic-ai__gpt4all-13b-snoozy with any of the following classes: (\u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForCausalLM\u0027\u003e, \u003cclass \u0027transformers.models.llama.modeling_llama.LlamaForCausalLM\u0027\u003e)."
}
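
As a sanity check, the same load the toolkit attempts can be reproduced locally with transformers 4.28.0 (a sketch; materializing the 13B checkpoint needs tens of GB of memory):

# try the same load the endpoint container performs
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('nomic-ai/gpt4all-13b-snoozy')
model = AutoModelForCausalLM.from_pretrained('nomic-ai/gpt4all-13b-snoozy')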

Any idea what could be happening? Thanks!
