Sagemaker based deployment will fail with the following error:

Will error out with:

---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
<ipython-input-4-e72b8b1a6621> in <module>
     25 
     26 predictor.predict({
---> 27         'inputs': "Can you please let us know more details about your "
     28 })

/opt/conda/lib/python3.7/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant, inference_id)
    159             data, initial_args, target_model, target_variant, inference_id
    160         )
--> 161         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    162         return self._handle_response(response)
    163 

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    528                 )
    529             # The "self" in this scope is referring to the BaseClient.
--> 530             return self._make_api_call(operation_name, kwargs)
    531 
    532         _api_call.__name__ = str(py_operation_name)

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    958             error_code = parsed_response.get("Error", {}).get("Code")
    959             error_class = self.exceptions.from_code(error_code)
--> 960             raise error_class(parsed_response, operation_name)
    961         else:
    962             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Could not load model /.sagemaker/mms/models/cerebras__Cerebras-GPT-13B with any of the following classes: (\u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForCausalLM\u0027\u003e, \u003cclass \u0027transformers.models.gpt2.modeling_gpt2.GPT2Model\u0027\u003e)."
}
". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/huggingface-pytorch-inference-2023-04-16-04-42-22-119 in account XXX for more information.



Input:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

models

hub = {
'HF_MODEL_ID':'cerebras/Cerebras-GPT-13B',
'HF_TASK':'text-generation'
}

create Hugging Face Model Class

huggingface_model = HuggingFaceModel(
transformers_version='4.17.0',
pytorch_version='1.10.2',
py_version='py38',
env=hub,
role=role,
)

deploy model to SageMaker Inference

predictor = huggingface_model.deploy(
initial_instance_count=1, # number of instances
instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
'inputs': "Can you please let us know more details about your "
})
```

cerebras
/

Cerebras-GPT-13B

Deployment instructions for sagemaker do not work

Hub Model configuration. https://huggingface.co/models

create Hugging Face Model Class

deploy model to SageMaker Inference