Deploying the model to Amazon SageMaker and running inference is not working...

#32
by ellaellaellaella - opened

I deployed the model from a SageMaker Jupyter notebook, and when I invoke the endpoint I get an error.
The error log from CloudWatch is as follows.

[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error	
[INFO ] W-9000-upstage__SOLAR-10.7B-Inst com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):	
[INFO ] W-9000-upstage__SOLAR-10.7B-Inst ACCESS_LOG - /169.254.178.2:48252 "POST /invocations HTTP/1.1" 400 2	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 219, in handle	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self.initialize(context)	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 77, in initialize	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self.model = self.load(self.model_dir)	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 104, in load	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - hf_pipeline = get_pipeline(task=os.environ["HF_TASK"], model_dir=model_dir, device=self.device)	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/transformers_utils.py", line 272, in get_pipeline	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - hf_pipeline = pipeline(task=task, model=model_dir, device=device, **kwargs)	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 675, in pipeline	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - config = AutoConfig.from_pretrained(model, _from_pipeline=task, **hub_kwargs, **model_kwargs)	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 873, in from_pretrained	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - config_class = CONFIG_MAPPING[config_dict["model_type"]]	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 579, in __getitem__	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise KeyError(key)	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - KeyError: 'llama'	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - During handling of the above exception, another exception occurred:	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/mms/service.py", line 108, in predict	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ret = self._entry_point(input_batch, self.context)	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 243, in handle	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise PredictionException(str(e), 400)	
[INFO ] W-upstage__SOLAR-10.7B-Inst-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: 'llama' : 400

The notebook code is as follows.

!pip install "sagemaker>=2.48.0" --upgrade

import sagemaker
import boto3

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

print(f"sagemaker role arn: {role}")

import json
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID':'upstage/SOLAR-10.7B-Instruct-v1.0',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26", # transformers version used
    pytorch_version="1.13", # pytorch version used
    py_version="py39", # python version of the DLC
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
data = {
    "inputs": "Hey my name is Julien! How are you?",
}

predictor.predict(data)

I referred to this GitHub notebook: https://github.com/huggingface/notebooks/blob/main/sagemaker/11_deploy_model_from_hf_hub/deploy_transformer_model_from_hf_hub.ipynb
How can I solve this problem?
Please help me.

upstage org

Hello,

This question can be better answered by the AWS team; I'll forward it to them.
Meanwhile, you can check out Solar Mini Chat, available on AWS Marketplace and SageMaker JumpStart: https://aws.amazon.com/marketplace/seller-profile?id=seller-tq4lkemg5w3jw

Thanks,
Sean

Hello!

I am Jay, an AWS partner manager, and I am following up on this technical question with our tech team. We will give you feedback as soon as possible.

Thanks,

Jay

You've used transformers version 4.26, which doesn't support the Llama architecture.
(Note that SOLAR is based on LlamaForCausalLM, so it only works with a transformers version that includes the Llama backend, i.e. 4.28 or later.)

Try a newer version of Hugging Face transformers.
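
For example, here is a minimal sketch of the same deployment with a newer Hugging Face inference DLC. The transformers/pytorch/py_version triple below is an assumption; pick a combination that is actually published for the SageMaker Hugging Face inference DLC in your region. Also note that the plain inference toolkit loads the model through a standard pipeline at full precision, so a 10.7B model may still not fit on a single-GPU instance, which is why the TGI/LMI route below is preferred.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Same model, but a DLC whose transformers version includes the Llama architecture.
huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "upstage/SOLAR-10.7B-Instruct-v1.0",
        "HF_TASK": "text-generation",
    },
    role=role,
    transformers_version="4.37",  # assumed available; must be >= 4.28 for Llama support
    pytorch_version="2.1",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)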

I recommend trying TGI (Text Generation Inference) or the LMI DLC (Large Model Inference Deep Learning Container), which are the recommended ways to deploy LLMs on SageMaker.
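
Here is a minimal sketch of the TGI route, using the Hugging Face LLM container. The TGI image version and the token limits below are assumptions, not tested values; choose a TGI version that supports Llama-architecture models.

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Retrieve the Hugging Face LLM (TGI) container image URI.
llm_image = get_huggingface_llm_image_uri("huggingface", version="1.3.3")  # version is an assumption

config = {
    "HF_MODEL_ID": "upstage/SOLAR-10.7B-Instruct-v1.0",
    "SM_NUM_GPUS": json.dumps(1),           # GPUs per replica
    "MAX_INPUT_LENGTH": json.dumps(2048),   # illustrative limits
    "MAX_TOTAL_TOKENS": json.dumps(4096),
}

llm_model = HuggingFaceModel(role=role, image_uri=llm_image, env=config)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=600,
)

llm.predict({
    "inputs": "Hey my name is Julien! How are you?",
    "parameters": {"max_new_tokens": 128},
})

TGI loads the weights in half precision, but a 10.7B model is still close to the 24 GB memory of a single A10G on ml.g5.2xlarge, so a larger instance (e.g. ml.g5.12xlarge with SM_NUM_GPUS=4) may be needed depending on sequence length.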
