SageMaker Endpoint error during inference

#16
by Shridharalve - opened

Tried model inference using an ml.g5.24xlarge SageMaker endpoint. Getting the error below:

```
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "[Errno 28] No space left on device"
}"
```
Technology Innovation Institute org

Looks to be an issue with too little disk space on the instance. What size EBS volume did you attach?

Per the AWS docs, the instance's storage volume is 3800 GB:

| Type           | CPU | Memory | GPUs | GPU Memory | Storage   |
|----------------|-----|--------|------|------------|-----------|
| ml.g5.24xlarge | 96  | 384 GB | 4    | 96 GB      | 1x3800 GB |

I also tried a larger instance type, ml.g5.48xlarge, which has double these specs.

Adding more details about the error on the SageMaker endpoint:

```
Caused by: java.io.IOException: No space left on device

2023-06-01 10:33:43,006 pool-2-thread-6 ERROR An exception occurred processing Appender access_log org.apache.logging.log4j.core.appender.AppenderLoggingException: Error writing to stream logs/access_log.log
```
Technology Innovation Institute org
edited Jun 1, 2023

The volume you mentioned is typically mounted at /tmp/, while a separate volume, whose size you specify when launching the instance, is mounted at /opt/ml/checkpoints. I believe what is happening is that the model is downloaded under /opt/ml/checkpoints, which then gets exhausted. Assuming you use the HF estimator, could you try specifying `volume_size=200`?
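
For reference, a minimal sketch of what that could look like with the SageMaker Python SDK (the role ARN and container versions below are placeholders, and note that `volume_size` is only honored on instance types that don't come with local NVMe storage):

```python
# Sketch: attach a larger EBS volume when deploying via the HF model class
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role="arn:aws:iam::111122223333:role/MySageMakerRole",  # placeholder role
    transformers_version="4.26",  # placeholder versions
    pytorch_version="1.13",
    py_version="py39",
    env={"HF_MODEL_ID": "tiiuae/falcon-40b-instruct"},
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.24xlarge",
    volume_size=200,  # GB of EBS storage for the endpoint instance
)
```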

These instance types, ml.g5.24xlarge and ml.g5.48xlarge, do not support the volume_size parameter because they already come with a 3800 GB storage volume for the inference endpoint. If the instance type is the issue, can you suggest an appropriate one that can run the model without problems? I was able to run falcon-7b-instruct without any issues...

Technology Innovation Institute org

At this point I unfortunately do not understand SageMaker endpoints with Hugging Face models well enough to assist you, but the issue is definitely related to disk space, as the error indicates: "[Errno 28] No space left on device". The 7B model might work because it fits in the standard 30 GB EBS volume. Over the coming weeks we hope to provide easier ways to deploy the models.

The transformers library downloads the model to the default cache location, ~/.cache/huggingface/hub.
However, the EBS volume is mounted at /home/ec2-user/SageMaker.
You can check by running df in a terminal.
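
If you prefer checking from Python, here's a quick equivalent of `df` for a single path (the path below is the usual SageMaker notebook volume mount):

```python
import shutil

# Report space on the volume where the transformers cache should live
total, used, free = shutil.disk_usage("/home/ec2-user/SageMaker")
print(f"total: {total / 1e9:.1f} GB, used: {used / 1e9:.1f} GB, free: {free / 1e9:.1f} GB")
```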

You can change the transformers cache location to a directory by setting this environment variable before importing the transformers library:

```python
import os

# Must be set before `import transformers`; otherwise the default cache is used
os.environ['TRANSFORMERS_CACHE'] = '/home/ec2-user/SageMaker/transformers-cache/'
```

Here's the relevant documentation: https://huggingface.co/docs/transformers/v4.29.1/en/installation#cache-setup

@FalconLLM

I am no longer facing the "storage space" issue. It seems to have been resolved by using the snippet below in inference.py:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b-instruct",
    trust_remote_code=True,
    load_in_8bit=False,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    cache_dir="/tmp/model_cache/",  # point the download cache at a volume with free space
)
```

The model got deployed on the SageMaker endpoint. However, when I look at the instance metrics, GPU memory usage balloons to 788% and all 8 GPUs are utilized on ml.g5.48xlarge (8 NVIDIA A10G GPUs, 192 GiB total GPU memory). We don't get an inference output, as it often times out when there's no response within a minute. Is this machine enough to host the model for inference, or should we wait longer for a response?
Is there any other ML instance type I can try?

An ml.g5.12xlarge instance was enough for me to deploy Falcon-40B in the HF TGI DLC.

Technology Innovation Institute org

For running on SageMaker, we would recommend having a look at this blog post: https://www.philschmid.de/sagemaker-falcon-llm
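
For anyone who finds this later, here is a condensed sketch of that blog post's approach (the TGI image version, token limits, and timeout below are assumptions to adapt, and the role lookup assumes you run inside SageMaker):

```python
# Sketch: deploy Falcon-40B-Instruct with the Hugging Face TGI DLC on SageMaker
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Retrieve the TGI container image (version is an assumption; check for the latest)
llm_image = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    env={
        "HF_MODEL_ID": "tiiuae/falcon-40b-instruct",
        "SM_NUM_GPUS": "4",             # shard across the 4 A10Gs of a g5.12xlarge
        "MAX_INPUT_LENGTH": "1024",
        "MAX_TOTAL_TOKENS": "2048",
    },
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=600,  # allow time to load the 40B weights
)

print(llm.predict({"inputs": "What is the capital of France?"}))
```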

@FalconLLM Yes, I looked at that and we have already deployed your model. Thanks for the help. I have posted the link in this community for others as well.

Shridharalve changed discussion status to closed

@austinmw Could you provide a shell script for the deployment in the HF TGI DLC?
