Deployment to AWS SageMaker Real-time inference

#20
by LukeHero - opened

Has anyone done it successfully via real-time inference?

I have deployed the model with the repo packaged as a model.tar.gz file in S3, but every test inference I send as JSON from SageMaker Studio fails.
I am seeing the following errors:

With PyTorch 1.12 the following:
"metadata.json file not found in artifacts" as the repo does not have this file.
"AttributeError: type object 'Callable' has no attribute '_abc_registry'"

With PyTorch 1.8 there are no errors, but nothing happens.

There is no option for PyTorch 1.9, which is the version used in the repo.

I also tried deploying via the HF Hub, but sentence-similarity is not supported as an HF_TASK.

If anyone has got sentence similarity working via real-time inference on SageMaker I would greatly appreciate some info on how you did it!

Have you made any progress on this?

I was able to host the model on SageMaker by using the Hugging Face Inference Toolkit and by overriding the default methods of the HuggingFaceHandlerService.

Mind sharing some detail on how you got it working?

2023-11-30T07:09:08,403 [INFO ] W-hkunlp__instructor-xl-1-stdout

PredictionException: "Unknown task sentence-similarity, available tasks are ['audio-classification', 'automatic-speech-recognition', 'conversational', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-segmentation', 'image-to-text', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']" : 400
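For anyone hitting the same "Unknown task" error: it comes from the toolkit trying to build a default transformers pipeline from HF_TASK. One way around it, and probably what "overriding the default methods" refers to above (the exact code was not shared, so this is only a guess), is to ship a code/inference.py inside model.tar.gz that defines model_fn and predict_fn. A minimal sketch follows. The payload shape ("inputs" as a list of [instruction, text] pairs), the file layout, and loading INSTRUCTOR from the local model directory are my assumptions, and it assumes InstructorEmbedding is listed in code/requirements.txt.

```python
# code/inference.py  (assumed location inside model.tar.gz)
from typing import Any, Dict, List


def model_fn(model_dir: str) -> Any:
    """Load the model from the directory where model.tar.gz was unpacked."""
    # Imported lazily so this module can be imported without the package present.
    # Assumption: InstructorEmbedding is installed via code/requirements.txt.
    from InstructorEmbedding import INSTRUCTOR

    return INSTRUCTOR(model_dir)


def predict_fn(data: Dict[str, Any], model: Any) -> Dict[str, List[List[float]]]:
    """Embed [instruction, text] pairs. The payload shape is an assumption."""
    pairs = data["inputs"]
    if not isinstance(pairs, list) or not all(
        isinstance(p, (list, tuple)) and len(p) == 2 for p in pairs
    ):
        raise ValueError('"inputs" must be a list of [instruction, text] pairs')

    embeddings = model.encode([list(p) for p in pairs])
    # encode() usually returns a NumPy array; convert it so it can be sent as JSON.
    if hasattr(embeddings, "tolist"):
        embeddings = embeddings.tolist()
    return {"embeddings": embeddings}
```

With something like this in place the endpoint returns raw embeddings, so the similarity itself (e.g. cosine similarity between two returned vectors) would be computed on the client side.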
