Deployment failing on SageMaker

#15
by vibranium - opened

I don't know what's wrong here, but the deployment is failing on SageMaker:

import json

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri


def deploy_mistral():
    # Environment config passed to the TGI container
    hub = {
        "HF_MODEL_ID": "mistralai/Mixtral-8x7B-v0.1",
        "SM_NUM_GPUS": json.dumps(8),
        "DTYPE": "bfloat16",
    }

    hf_model = HuggingFaceModel(
        image_uri=get_huggingface_llm_image_uri("huggingface"),
        transformers_version="4.36.0",
        env=hub,
        name="mistral-model",
        role=get_iam_role(),  # local helper returning the SageMaker execution role ARN
    )
    predictor = hf_model.deploy(
        container_startup_health_check_timeout=300,
        initial_instance_count=1,
        instance_type="ml.p4d.24xlarge",
        endpoint_name="mistral",
    )

deploy_mistral()
2023-12-12T17:48:43.576706Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 83, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 207, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 159, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 336, in get_model
    raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type mixtral
rank=0
2023-12-12T17:48:43.674324Z ERROR text_generation_launcher: Shard 0 failed to start
2023-12-12T17:48:43.674344Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

@Nondzu or anyone from the Mistral AI team, can you help?

@vibranium Upgrade your transformers:

pip install --upgrade git+https://github.com/huggingface/transformers.git --no-cache

@Nondzu This is a SageMaker deployment:

from sagemaker.huggingface import HuggingFaceModel
and I already have this defined on the model:

 transformers_version='4.36.0',

Did I miss something?

I'm facing the same issue, and I also installed the newest transformers from source (commit f4db565b695582891e43a5e042e5d318e28f20b8).
Could you help?

Hey, can you please take a look at https://www.philschmid.de/sagemaker-deploy-mixtral? You need container version 1.3.1, which is not yet available in SageMaker by default.
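
A minimal sketch of pinning the container explicitly instead of relying on get_huggingface_llm_image_uri (the ECR URI below is for us-east-1; swap in your region's registry, and reuse the hub/role from the original snippet):

image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04-v1.0"

hf_model = HuggingFaceModel(
    image_uri=image_uri,  # explicit TGI 1.3.1 container, bypassing the default resolution
    env=hub,
    name="mistral-model",
    role=get_iam_role(),
)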

@philschmid Thanks, it worked! Much appreciated.

@philschmid I tried following your tutorial, but I keep getting the same issue as @vibranium . Any ideas as to what the issue might be?

@seabasshn What instance size are you using? In my case it works on ml.g5.48xlarge. Also, make sure you are using the image below:

763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04-v1.0

You also need sagemaker==2.199.0.
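
For example:

pip install --upgrade "sagemaker==2.199.0"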

@vibranium Yes, I am using ml.g5.48xlarge and the image: 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04-v1.0

@philschmid - I have the same problem trying to deploy, following the instructions in your blog post exactly (same image, same instance, etc.), from SageMaker Studio (exactly as @seabasshn seems to experience too). The CloudWatch logs show a problem starting the shard.
Any idea what the problem might be?
Thanks!

@philschmid , @seabasshn : problem solved: I needed TGI v1.3.3,
i.e. huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04-v1.0,
NOT v1.3.1 as described in the blog post.
Then it worked.
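
For anyone following along, that amounts to swapping the image tag in the sketch above (same us-east-1 registry assumption; adjust for your region):

image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04-v1.0"  # TGI 1.3.3, not 1.3.1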

@philschmid , are there quantized versions of the Mixtral-8x7B-v0.1 model available yet via the Hugging Face LLM DLC?
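
Aside: the TGI-based LLM DLC generally supports load-time quantization via an HF_MODEL_QUANTIZE entry in the hub env, though whether this image version handles Mixtral that way is an assumption, e.g.:

hub = {
    "HF_MODEL_ID": "mistralai/Mixtral-8x7B-v0.1",
    "SM_NUM_GPUS": json.dumps(8),
    "HF_MODEL_QUANTIZE": "bitsandbytes",  # assumption: accepted by this TGI image version
}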

It also worked for me with TGI v1.3.3.
Nice blog post @philschmid , very neat!
