failed to start shard 0

by wzbbso - opened Jun 26, 2023

Discussion

wzbbso

Jun 26, 2023

Hi,

Deploying via SageMaker fails with: Shard 0 failed to start: Error: ShardCannotStart

xioaditya

Merlyn Mind org Jun 27, 2023 • edited Jun 27, 2023

Thanks, the team is looking into this and will post an update soon.
( @pranavthombare @daniel-ibanez-merlyn )

daniel-ibanez-merlyn

Merlyn Mind org Jun 27, 2023

Hi @wzbbso ,

Could you share how you are deploying this to SageMaker? Are you using HuggingFaceModel?

Thanks

wzbbso

Jun 27, 2023

Hi,

I'm using the code provided under Deploy --> SageMaker on the model page.

pranavthombare

Jun 27, 2023

Hey @wzbbso ,

Glad to see your interest in our model. If you could share what instance you're trying to launch it on, that would help us look into it.

Thanks.

wzbbso

Jun 27, 2023

Hi,

I used an ml.g4dn.2xlarge instance.

Thanks,
Mario

daniel-ibanez-merlyn

Merlyn Mind org Jun 27, 2023

Hi Mario,

We are looking into this and will reply soon.

Thanks,
Daniel

daniel-ibanez-merlyn

Merlyn Mind org Jun 27, 2023 • edited Jun 27, 2023

Hi Mario,

I tried using a bigger instance (ml.g5.12xlarge) with 4 GPUs and it worked for me.

This is the exact code I used:

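# Imports and role lookup were not shown in the post; they are assumed here
# from the standard SageMaker deploy snippet.
import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# IAM role with SageMaker permissions (assumes this runs inside SageMaker)
role = sagemaker.get_execution_role()

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'MerlynMind/merlyn-education-teacher-assistant',
    'SM_NUM_GPUS': json.dumps(4),  # shard across the 4 GPUs of ml.g5.12xlarge
}

# Create the Hugging Face model class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.8.2"),
    env=hub,
    role=role,
)

# Deploy the model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=300,
)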
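
The thread does not state why the smaller instance failed. One plausible explanation is GPU memory, and the sketch below is a rough check of that idea. It is not a confirmed diagnosis. It assumes the model has about 12 billion parameters loaded in 16-bit precision (an assumption, please check the model card), and it counts weights only, ignoring the KV cache and activations. The helper names are hypothetical. The GPU figures are the published specs: one 16 GB T4 on ml.g4dn.2xlarge, four 24 GB A10G on ml.g5.12xlarge.

```python
# (GiB per GPU, number of GPUs) for the two instance types in this thread.
INSTANCE_GPUS = {
    "ml.g4dn.2xlarge": (16, 1),  # 1x NVIDIA T4
    "ml.g5.12xlarge": (24, 4),   # 4x NVIDIA A10G
}

# Assumption: ~12B parameters; not stated anywhere in this thread.
ASSUMED_PARAMS = 12e9


def weights_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GiB (2 bytes = fp16/bf16)."""
    return num_params * bytes_per_param / 2**30


def weights_fit(instance_type: str, num_params: float, bytes_per_param: int = 2) -> bool:
    """True if the weights alone fit in the instance's total GPU memory."""
    gib_per_gpu, gpu_count = INSTANCE_GPUS[instance_type]
    return weights_gib(num_params, bytes_per_param) < gib_per_gpu * gpu_count


if __name__ == "__main__":
    for name in INSTANCE_GPUS:
        print(name, "weights fit:", weights_fit(name, ASSUMED_PARAMS))
```

Under these assumptions the weights alone exceed the single T4's memory but fit comfortably across the four A10Gs, which would be consistent with the behaviour reported above. When sharding this way, SM_NUM_GPUS should match the GPU count of the chosen instance.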
