Deploy in SageMaker

#42
by XuanNg - opened

I got the following error when deploying the model in SageMaker using HuggingFaceModel() with the image 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.4.0-gpu-py310-cu121-ubuntu20.04:

ValueError: Unsupported model type gemma

I also saw the message below and would like to know the plan or ETA:
"google/gemma-7b is not yet available for Amazon SageMaker deployments.
We are working on adding support."

Yes, we are working on releasing the new TGI version, 1.4.2, which will enable support.


@philschmid please keep us posted if you make any headway here!

@suryabhupa you should be able to deploy Gemma now. Check the latest code snippet.

@philschmid could you please share your deploy configs?
I used the configs below:
TGI version: 1.4.2
INSTANCE_TYPE: ml.g5.2xlarge
"HF_MODEL_ID": "google/gemma-7b-it", # model_id from hf.co/models
"SM_NUM_GPUS": json.dumps(1), # Number of GPU used per replica
"HUGGING_FACE_HUB_TOKEN": os.getenv("HUGGING_FACE_HUB_TOKEN", ""), # Hugging Face token to access private models

Error:
text_generation_client: router/client/src/lib.rs:33: Server error: Not enough memory to handle 4096 prefill tokens. You need to decrease --max-batch-prefill-tokens
Error: Warmup(Generation("Not enough memory to handle 4096 prefill tokens. You need to decrease --max-batch-prefill-tokens"))
2024-03-04T11:39:32.407327Z ERROR text_generation_launcher: Webserver Crashed

We updated the script to use a bigger instance. Alternatively, you can keep the smaller instance and lower the TGI settings, starting with --max-batch-prefill-tokens, which is the one named in the error.
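For anyone who wants to try lowering the TGI settings rather than changing the instance, here is a minimal sketch of what the environment dict could look like. The helper name build_tgi_env and the token limits (1024/2048/2048) are my own illustrative assumptions, not values from the official snippet, and I have not verified that they fit on ml.g5.2xlarge. MAX_INPUT_LENGTH, MAX_TOTAL_TOKENS, and MAX_BATCH_PREFILL_TOKENS are the environment-variable forms of the corresponding TGI launcher flags.

```python
import json
import os


def build_tgi_env(model_id, num_gpus=1, max_input_length=1024,
                  max_total_tokens=2048, max_batch_prefill_tokens=2048):
    """Build the container environment for a TGI deployment.

    Hypothetical helper: the default limits are assumptions chosen to be
    smaller than the 4096 prefill tokens that failed during warmup.
    """
    # The input length must leave room for at least one generated token.
    if max_input_length >= max_total_tokens:
        raise ValueError("max_input_length must be below max_total_tokens")
    # A single request's prompt has to fit in one prefill batch.
    if max_batch_prefill_tokens < max_input_length:
        raise ValueError("max_batch_prefill_tokens must be >= max_input_length")

    # Environment variables must all be strings.
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "HUGGING_FACE_HUB_TOKEN": os.getenv("HUGGING_FACE_HUB_TOKEN", ""),
        "MAX_INPUT_LENGTH": str(max_input_length),
        "MAX_TOTAL_TOKENS": str(max_total_tokens),
        "MAX_BATCH_PREFILL_TOKENS": str(max_batch_prefill_tokens),
    }


config = build_tgi_env("google/gemma-7b-it")
```

The resulting dict would be passed as env=config to HuggingFaceModel() alongside your role and image URI, then deployed with the same instance type as before. If warmup still fails, try smaller limits or the bigger instance.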
