failed to start shard 0
Hi,
Deploying via SageMaker fails with: Shard 0 failed to start: Error: ShardCannotStart
Thanks, the team is looking into this and will post an update soon.
(@pranavthombare @daniel-ibanez-merlyn)
Hi @wzbbso,
Could you share how you are deploying this to SageMaker? Are you using HuggingFaceModel?
Thanks
Hi,
I'm using the code provided under Deploy → SageMaker.
Hey @wzbbso,
Glad to see your interest in our model. If you could share which instance type you're trying to launch it on, that would help us look into it.
Thanks.
Hi,
I used ml.g4dn.2xlarge to spin up the endpoint.
Thanks,
Mario
Hi Mario,
We are looking into this and will reply soon.
Thanks,
Daniel
Hi Mario,
I tried using a bigger instance (ml.g5.12xlarge) with 4 GPUs and it worked for me.
This is the exact code I used:
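```python
import json

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# IAM role SageMaker assumes for the endpoint (resolves when run inside SageMaker notebooks/Studio)
role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'MerlynMind/merlyn-education-teacher-assistant',
    # shard the model across all 4 GPUs of ml.g5.12xlarge
    'SM_NUM_GPUS': json.dumps(4),
}

# create Hugging Face Model Class using the LLM (TGI) container, version 0.8.2
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.8.2"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference; allow up to 300 s for the model to load
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=300,
)
```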