# Deploy LLaVA on Amazon SageMaker

Amazon SageMaker is a popular platform for running AI models. This notebook deploys LLaVA as a [Hugging Face Transformers](https://github.com/huggingface/transformers) model using [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) and the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/).

![llava](https://i.imgur.com/YNVG140.png)

Install the SageMaker Python SDK and the requirements of the inference code:

In [None]:
!pip install sagemaker --upgrade
!pip install -r code/requirements.txt

Bundle the LLaVA model weights and code into a `model.tar.gz` archive:

In [1]:
# Create SageMaker model.tar.gz artifact (pigz must be installed; it compresses in parallel)
!tar -cf model.tar.gz --use-compress-program=pigz *
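
The same archive can be built from Python when `pigz` is not available. The sketch below is an alternative, not the method this notebook uses: `build_model_archive` is a hypothetical helper name, and it assumes the source directory holds the model weights plus a `code/` folder (as the `code/requirements.txt` path above suggests). It skips the output file itself, so re-running it does not nest an old archive inside the new one.

```python
import os
import tarfile


def build_model_archive(src_dir=".", out_name="model.tar.gz"):
    """Pack everything in src_dir into a gzip-compressed tar archive.

    Hypothetical helper: mirrors `tar -cf model.tar.gz *` but uses the
    standard library instead of pigz and skips the archive file itself.
    Returns the sorted list of top-level entries that were added.
    """
    out_path = os.path.join(src_dir, out_name)
    added = []
    with tarfile.open(out_path, "w:gz") as tar:
        for entry in sorted(os.listdir(src_dir)):
            # Like the shell glob `*`, ignore hidden files; also skip the output file.
            if entry.startswith(".") or entry == out_name:
                continue
            # arcname stores paths relative to the archive root.
            tar.add(os.path.join(src_dir, entry), arcname=entry)
            added.append(entry)
    return added
```

Calling `build_model_archive()` in the directory that contains the weights and `code/` writes `model.tar.gz` next to them and returns the names that were packed, which is a quick way to check that nothing is missing before uploading.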

After creating the `model.tar.gz` archive, we can upload it to Amazon S3. We will use the `sagemaker` SDK to upload the model to our SageMaker session bucket.

First, initialize the SageMaker session:

In [2]:
import sagemaker
import boto3

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    # outside SageMaker (e.g. on a local machine), look the role up by name
    iam = boto3.client('iam')
    # replace with the name of your own SageMaker execution role
    role = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole-20231008T201275')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /Users/tom/Library/Application Support/sagemaker/config.yaml


Couldn't call 'get_role' to get Role ARN from role name arn:aws:iam::297308036828:root to get Role path.


sagemaker role arn: arn:aws:iam::297308036828:role/service-role/AmazonSageMaker-ExecutionRole-20231008T201275
sagemaker bucket: sagemaker-us-west-2-297308036828
sagemaker session region: us-west-2


Upload `model.tar.gz` to our SageMaker session bucket:

In [3]:
from sagemaker.s3 import S3Uploader

# upload model.tar.gz to s3
s3_model_uri = S3Uploader.upload(local_path="./model.tar.gz", desired_s3_uri=f"s3://{sess.default_bucket()}/llava-v1.5-7b")

print(f"model uploaded to: {s3_model_uri}")

model uploaded to: s3://sagemaker-us-west-2-297308036828/llava-v1.5-7b/model.tar.gz
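
`S3Uploader.upload` returns the full S3 URI of the uploaded object. When the bucket and key are needed separately, for example to look the object up with another tool, the URI can be split with the standard library. `split_s3_uri` below is a hypothetical helper, not part of the `sagemaker` SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc is the bucket; the path carries the key with a leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri(
    "s3://sagemaker-us-west-2-297308036828/llava-v1.5-7b/model.tar.gz"
)
print(bucket)  # sagemaker-us-west-2-297308036828
print(key)     # llava-v1.5-7b/model.tar.gz
```

The pair can then be passed to, for instance, the `boto3` S3 client's `head_object(Bucket=bucket, Key=key)` to confirm that the object exists.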


We will use `HuggingFaceModel` to create our real-time inference endpoint:

In [4]:

from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=s3_model_uri,        # path to your model and script
    role=role,                      # iam role with permissions to create an Endpoint
    transformers_version="4.28.1",  # transformers version used
    pytorch_version="2.0.0",        # pytorch version used
    py_version='py310',             # python version used
    model_server_workers=1
)

# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    # container_startup_health_check_timeout=600, # increase timeout for large models
    # model_data_download_timeout=600, # increase timeout for large models
)

-----------------!

`.deploy()` returns a `HuggingFacePredictor` object, which can be used to request inference through its `.predict()` method. Our endpoint expects a JSON payload with at least an `image` and a `question` key.

In [5]:
data = {
    "image": 'https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png',
    "question": "Describe the image and color details.",
    # "max_new_tokens": 1024,
    # "temperature": 0.2,
    # "conv_mode": "llava_v1"
}

# request
output = predictor.predict(data)
print(output)

The image features a red and black toy horse with a pair of glasses on its face. The horse is wearing a pair of red glasses, which adds a unique and quirky touch to the toy. The horse's legs are also painted in red and black colors, further enhancing its appearance. The toy horse is standing on a grey surface, which serves as a backdrop for the scene.
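
The request format can be wrapped in a small helper that checks the required keys before anything is sent. This is only a sketch: `build_payload` is a hypothetical name, and the optional keys are the ones shown commented out in the request cell. Whether the endpoint accepts other keys depends on the inference script in `code/`, which is not shown here.

```python
OPTIONAL_KEYS = ("max_new_tokens", "temperature", "conv_mode")


def build_payload(image, question, **options):
    """Build the JSON-serializable request dict for the LLaVA endpoint.

    Hypothetical helper: requires non-empty `image` and `question`, and only
    passes through the optional keys listed in OPTIONAL_KEYS.
    """
    if not image or not question:
        raise ValueError("both 'image' and 'question' are required")
    unknown = set(options) - set(OPTIONAL_KEYS)
    if unknown:
        raise ValueError(f"unsupported option(s): {sorted(unknown)}")
    payload = {"image": image, "question": question}
    payload.update(options)
    return payload


data = build_payload(
    "https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png",
    "Describe the image and color details.",
    temperature=0.2,
)
# data can now be passed to predictor.predict(data)
```

Rejecting unknown options early is a design choice of this sketch; it turns a misspelled key into a local error instead of a silently ignored setting.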


The inference `predictor` can also be initialized from the `endpoint_name` of an endpoint that is already deployed:

In [14]:
import sagemaker
import boto3

sess = sagemaker.Session()
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    # replace with the name of your own SageMaker execution role
    role = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole-20231008T201275')['Role']['Arn']

from sagemaker.huggingface.model import HuggingFacePredictor

# initialize the endpoint predictor
predictor2 = HuggingFacePredictor(
    endpoint_name="huggingface-pytorch-inference-2023-10-19-05-57-37-847",
    sagemaker_session=sess
)



Couldn't call 'get_role' to get Role ARN from role name arn:aws:iam::297308036828:root to get Role path.
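
An endpoint can also be called without the `sagemaker` SDK, through the `sagemaker-runtime` client from `boto3`. The sketch below assumes the endpoint accepts and returns JSON, which is what `HuggingFacePredictor` uses by default. `invoke_llava` is a hypothetical helper, and the client is passed in as an argument so the function can be exercised without AWS credentials.

```python
import json


def invoke_llava(runtime_client, endpoint_name, payload):
    """Send a JSON payload to a SageMaker endpoint and decode the JSON reply.

    runtime_client is expected to be boto3.client("sagemaker-runtime");
    invoke_llava itself is a hypothetical helper.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # response["Body"] is a streaming object; read() returns the raw bytes.
    return json.loads(response["Body"].read())
```

Typical usage would be `invoke_llava(boto3.client("sagemaker-runtime"), "<your endpoint name>", data)`, with `data` holding the `image` and `question` keys.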


To clean up, delete the model and the endpoint with `delete_model()` and `delete_endpoint()`, or use the SageMaker console:

In [16]:
# delete sagemaker endpoint
predictor.delete_model()
predictor.delete_endpoint()
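
Because a real-time endpoint is billed for as long as it exists, it can be worth confirming that nothing was left behind. The sketch below lists endpoints whose name contains a given fragment. `remaining_endpoints` is a hypothetical helper, the client is expected to be `boto3.client("sagemaker")`, and only the first page of results is read, which is an assumption that suits accounts with few endpoints.

```python
def remaining_endpoints(sm_client, name_contains):
    """Return names of endpoints whose name contains name_contains.

    Hypothetical helper; only the first page of list_endpoints is read.
    """
    response = sm_client.list_endpoints(NameContains=name_contains)
    return [ep["EndpointName"] for ep in response["Endpoints"]]
```

For the endpoints created in this notebook, `remaining_endpoints(boto3.client("sagemaker"), "huggingface-pytorch-inference")` should return an empty list once the deletion has completed.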