Can't deploy to SageMaker

#15
by philgrey - opened

I've followed the deployment guide for this model, but I got the following error:

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-11-22-21-44-54-401: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..

This is a GGUF model, and GGUF is not supported on SageMaker.

Please see the GPTQ or AWQ model instead; both are supported by Text Generation Inference, which should work on SageMaker.
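For reference, a minimal sketch of pointing the same TGI config at the GPTQ variant instead. The repo name and the `HF_MODEL_QUANTIZE` value are assumptions based on the Hugging Face LLM container's documented environment variables, not something confirmed in this thread:

```python
import json

number_of_gpu = 1  # adjust to the instance type you deploy on

# Assumed config for the GPTQ variant; TGI needs HF_MODEL_QUANTIZE
# set to "gptq" so it loads the quantized weights.
config = {
    'HF_MODEL_ID': "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ",
    'HF_MODEL_QUANTIZE': "gptq",
    'SM_NUM_GPUS': json.dumps(number_of_gpu),
    'MAX_INPUT_LENGTH': json.dumps(2048),
    'MAX_TOTAL_TOKENS': json.dumps(4096),
}
```

The rest of the deployment (creating the `HuggingFaceModel` and calling `.deploy()`) stays the same as in the guide.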

Thanks for your quick response.
I want to ask one more question: in the config below, what is the meaning of HUGGING_FACE_HUB_TOKEN?
If it's just the token for my own account, I don't see why it's needed.
Could you explain?
Thanks very much.

```python
config = {
    'HF_MODEL_ID': "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",  # model_id from hf.co/models
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(2048),      # max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(4096),      # max length of the generation (including input text)
    'MAX_BATCH_TOTAL_TOKENS': json.dumps(4096),  # limits the number of tokens processed in parallel during generation
    'HUGGING_FACE_HUB_TOKEN': ""
}
```

Create the HuggingFaceModel with the image URI:

```python
llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    env=config
)
```

While I'm able to deploy mistralai/Mistral-7B-Instruct-v0.1 on SageMaker, this one fails with:

RuntimeError: weight model.layers.0.self_attn.q_proj.weight does not exist

I believe the root cause is explained here: https://github.com/huggingface/text-generation-inference/issues/500
