Deploying a fine-tuned model with custom inference code

#53
by maz-qualtrics - opened

I tried to deploy a fine-tuned mistral-7B-instruct-v0.3 to SageMaker with my custom inference code. The deployment goes through without any error or warning, but it completely ignores my inference code! I don't see anything in the CloudWatch logs about installing packages from requirements.txt or loading the model via model_fn.
I can send a regular prompt to the endpoint and get a response, but nothing comes from my inference code. I checked the structure of my model.tar.gz: the code folder is there and contains the entry point script and requirements.txt.
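
Roughly, the archive is laid out like this (following the documented code/ convention):

model.tar.gz
|- config.json, tokenizer files, model weights
|- code/
   |- finetuned_model_entrypoint.py   # defines model_fn etc.
   |- requirements.txt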

What could be the issue here? Am I missing something?

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# SageMaker execution role used by the endpoint
role = sagemaker.get_execution_role()

# s3 path where the model will be uploaded
# if you deploy the model at a different time, put its S3 path here
model_s3_path = "s3://my-bucket/mistral-fine-tuned-custom-2024-06-28/model.tar.gz"
 
image_uri = "huggingface-pytorch-tgi-inference:2.3.0-tgi2.0.3-gpu-py310-cu121-ubuntu22.04-v2.0"

# sagemaker config
instance_type = "ml.g5.24xlarge"
number_of_gpu = 4
health_check_timeout = 900
 
# Define Model and Endpoint configuration parameter
config = {
  'HF_MODEL_ID': "/opt/ml/model", # path to where sagemaker stores the model
  'SM_NUM_GPUS': json.dumps(number_of_gpu), # number of GPUs used per replica
  'MAX_INPUT_LENGTH': json.dumps(24000), # Max length of input text
  'MAX_TOTAL_TOKENS': json.dumps(30000), # Max length of the generation (including input text)
  'MAX_BATCH_TOTAL_TOKENS': json.dumps(30001),
  'MAX_BATCH_PREFILL_TOKENS': json.dumps(30000)
}
 
# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
  role=role,
  image_uri=image_uri,
  model_data=model_s3_path, 
  entry_point="finetuned_model_entrypoint.py",
  env=config
)

llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
  container_startup_health_check_timeout=health_check_timeout, # 15 minutes to allow the model to load
  endpoint_name="mistral-fine-tuned-stage"
)
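
For reference, I call the endpoint roughly like this (the prompt is just an example) and do get a normal TGI completion back:

llm.predict({
    "inputs": "What is Amazon SageMaker?",
    "parameters": {"max_new_tokens": 256}
})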

@philschmid Appreciate your help here.

If you want to use a requirements.txt or a custom inference.py, you need to use the regular Hugging Face inference container, not the TGI container.
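
A minimal sketch of that path, assuming the fine-tuned weights plus a code/ directory (inference.py and requirements.txt) are already packaged inside model.tar.gz; the framework versions below are illustrative and need to match a supported Hugging Face inference DLC:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Regular (non-TGI) Hugging Face inference container: the image is derived from
# the framework versions, and the inference toolkit installs code/requirements.txt
# and runs code/inference.py from inside model.tar.gz.
huggingface_model = HuggingFaceModel(
    role=role,
    model_data="s3://my-bucket/mistral-fine-tuned-custom-2024-06-28/model.tar.gz",
    transformers_version="4.37.0",  # illustrative; pick a supported DLC combination
    pytorch_version="2.1.0",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.24xlarge",
    container_startup_health_check_timeout=900,
    endpoint_name="mistral-fine-tuned-stage",
)

As far as I know, the toolkit looks for code/inference.py inside the archive by default, so the entry point script either needs that name or has to be passed via entry_point/source_dir.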

Oh, I see. I suspected this wasn't supported, but since I saw the entry_point parameter in HuggingFaceModel (https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/huggingface/model.py#L119), I thought it should be doable. Thanks a lot for your prompt reply.

I actually tried to deploy this as a PyTorchModel with a regular container, but there I got a model load failure without more context in the logs. I changed the versions of torch and transformers, but that didn't work, so I thought I could deploy with TGI instead. I'd appreciate it if you could point me to an example for this case.
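
For anyone landing here, a rough sketch of what such a custom script for the regular container can look like (model_fn/predict_fn are the hooks the Hugging Face inference toolkit calls; the generation parameters are only examples):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def model_fn(model_dir):
    # model_dir is where SageMaker extracts model.tar.gz inside the container
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires accelerate in requirements.txt
    )
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    params = data.get("parameters", {})
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=params.get("max_new_tokens", 256),
    )
    return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}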
