Deploying the model to Amazon SageMaker for inference
Hi Devs, I'm trying to deploy MiniCPM-Llama3-V-2_5 to Amazon SageMaker to create an inference endpoint. I've packaged the model together with an inference.py that defines a model_fn function to load the model, but I'm running into errors.
Here's an excerpt from my SageMaker notebook, based on the notebooks Hugging Face provides for loading models from S3:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    model_data="s3://###/minicpm_mod.tar.gz",  # path to the packaged model in S3
    transformers_version='4.37.0',
    pytorch_version='2.1.0',
    py_version='py310',
    role=role,
)

# deploy the model to a SageMaker inference endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,        # number of instances
    instance_type='ml.g5.12xlarge'   # EC2 instance type
)
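For completeness, this is roughly how I'm invoking the endpoint once it's up (the payload here is just a placeholder text prompt I've been testing with, not necessarily the shape the handler expects):

# sample invocation that triggers the error below; payload is a placeholder
response = predictor.predict({
    "inputs": "Describe this image.",
})
print(response)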
I've defined model_fn in inference.py as
def model_fn(model_dir):
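For reference, here is a minimal sketch of what my inference.py currently does inside model_fn (loading follows the model card's trust_remote_code pattern; the dtype and device placement are just what I've been trying):

# inference.py -- minimal sketch of my current handler
import torch
from transformers import AutoModel, AutoTokenizer

def model_fn(model_dir):
    # load MiniCPM-Llama3-V-2_5 from the unpacked model.tar.gz directory
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_dir,
        trust_remote_code=True,
        torch_dtype=torch.float16,
    ).to("cuda").eval()
    return model, tokenizer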
The most recent error is this:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "model_fn() takes 1 positional argument but 2 were given"
}
If I modify inference.py so that model_fn takes more inputs (either by adding a context argument, or an *args/**kwargs combination alongside model_dir) and provide 'HF_TASK' as an environment variable, the error changes to "model_fn() takes 1 or 2 positional arguments but 3 were given".
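These are the signature variants I've tried so far (the bodies are unchanged from the sketch above; the HF_TASK value is my guess at the right task name):

# variants of the model_fn signature I've tried
def model_fn(model_dir, context=None):
    ...

def model_fn(model_dir, *args, **kwargs):
    ...

# with HF_TASK passed via the model's env, e.g.:
# huggingface_model = HuggingFaceModel(..., env={"HF_TASK": "image-to-text"})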
Could you please provide some direction on how I can deploy this model on SageMaker for inference?