failed to start shard 0

#1
by wzbbso - opened

Hi,

Deploying via SageMaker fails with: Shard 0 failed to start: Error: ShardCannotStart

Thanks, the team is looking into this and will post an update soon.
( @pranavthombare @daniel-ibanez-merlyn )

Hi @wzbbso ,

Could you share how you are deploying this to SageMaker? Are you using HuggingFaceModel?

Thanks

Hi,

I'm using the code provided by the Deploy --> SageMaker option.

Merlyn Mind org

Hey @wzbbso ,

Glad to see your interest in our model. If you could share what instance you're trying to launch it on, that would help us look into it.

Thanks.

Hi,

I used an ml.g4dn.2xlarge instance.

Thanks,
Mario

Hi Mario,

We are looking into this and will reply soon.

Thanks,
Daniel

Hi Mario,

I tried using a bigger instance (ml.g5.12xlarge) with 4 GPUs and it worked for me.

This is the exact code I used:

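import json

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Assumed setup: an IAM role with SageMaker permissions.
# get_execution_role() works when run inside a SageMaker notebook.
role = sagemaker.get_execution_role()

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'MerlynMind/merlyn-education-teacher-assistant',
    'SM_NUM_GPUS': json.dumps(4),  # shard the model across the instance's 4 GPUs
}

# Create the Hugging Face model class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.8.2"),
    env=hub,
    role=role,
)

# Deploy the model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=300,
)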
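
For anyone wondering why the smaller instance may have failed, here is a rough back-of-the-envelope check. It is only a sketch under assumptions that are not confirmed in this thread: that the model has roughly 12B parameters, that the weights are loaded in fp16 (2 bytes per parameter), and that the GPU counts and memory sizes in `INSTANCE_GPUS` match the instance types mentioned above. The helper names (`weights_gib`, `fits`) are made up for illustration. It counts weights only and ignores the KV cache, activations, and framework overhead, so real requirements are higher.

```python
# Rough check: do a model's fp16 weights fit in an instance's total GPU memory?
# Assumptions (not confirmed in this thread): ~12B parameters, fp16 weights,
# and the GPU specs listed below.

BYTES_PER_PARAM_FP16 = 2
NUM_PARAMS = 12e9  # assumed parameter count

# instance type -> (number of GPUs, memory per GPU in GiB)
INSTANCE_GPUS = {
    "ml.g4dn.2xlarge": (1, 16),  # 1x NVIDIA T4
    "ml.g5.12xlarge": (4, 24),   # 4x NVIDIA A10G
}


def weights_gib(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


def fits(instance_type: str, num_params: float) -> bool:
    """True if the weights alone fit in the instance's combined GPU memory."""
    num_gpus, gib_per_gpu = INSTANCE_GPUS[instance_type]
    return weights_gib(num_params) < num_gpus * gib_per_gpu


if __name__ == "__main__":
    print(f"weights: {weights_gib(NUM_PARAMS):.1f} GiB")
    for instance_type in INSTANCE_GPUS:
        print(instance_type, "fits:", fits(instance_type, NUM_PARAMS))
```

Under these assumptions the weights alone need more memory than the single GPU on ml.g4dn.2xlarge provides, which would be consistent with the shard failing to start there, while ml.g5.12xlarge has enough room once the model is sharded across its 4 GPUs with `SM_NUM_GPUS` set to 4.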
