{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Deploy LLaVA on Amazon SageMaker\n", "\n", "Amazon SageMaker is a popular platform for running AI models, and models on huggingface deploy [Hugging Face Transformers](https://github.com/huggingface/transformers) using [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) and the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/).\n", "\n", "![llava](https://i.imgur.com/YNVG140.png)\n", "\n", "Install sagemaker sdk:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install sagemaker --upgrade\n", "!pip install -r code/requirements.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bundle llava model weights and code into a `model.tar.gz`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Create SageMaker model.tar.gz artifact\n", "!tar -cf model.tar.gz --use-compress-program=pigz *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After we created the `model.tar.gz` archive we can upload it to Amazon S3. 
We will use the `sagemaker` SDK to upload the model to our sagemaker session bucket.\n", "\n", "Initialize the SageMaker session first:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Users/tom/Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Users/tom/Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Users/tom/Library/Application Support/sagemaker/config.yaml\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Couldn't call 'get_role' to get Role ARN from role name arn:aws:iam::297308036828:root to get Role path.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Users/tom/Library/Application Support/sagemaker/config.yaml\n", "sagemaker role arn: arn:aws:iam::297308036828:role/service-role/AmazonSageMaker-ExecutionRole-20231008T201275\n", "sagemaker bucket: sagemaker-us-west-2-297308036828\n", "sagemaker session region: us-west-2\n" ] } ], "source": [ "import sagemaker\n", "import boto3\n", "sess = sagemaker.Session()\n", "# sagemaker session bucket -> used for uploading data, models and logs\n", "# sagemaker will automatically create this bucket if it does not exist\n", "sagemaker_session_bucket=None\n", "if 
sagemaker_session_bucket is None and sess is not None:\n", " # set to default bucket if a bucket name is not given\n", " sagemaker_session_bucket = sess.default_bucket()\n", "\n", "try:\n", " role = sagemaker.get_execution_role()\n", "except ValueError:\n", " iam = boto3.client('iam')\n", " # set your own SageMaker execution role name here\n", " role = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole-20231008T201275')['Role']['Arn']\n", "\n", "sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)\n", "\n", "print(f\"sagemaker role arn: {role}\")\n", "print(f\"sagemaker bucket: {sess.default_bucket()}\")\n", "print(f\"sagemaker session region: {sess.boto_region_name}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Upload the `model.tar.gz` to our sagemaker session bucket:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Users/tom/Library/Application Support/sagemaker/config.yaml\n", "model uploaded to: s3://sagemaker-us-west-2-297308036828/llava-v1.5-7b/model.tar.gz\n" ] } ], "source": [ "from sagemaker.s3 import S3Uploader\n", "\n", "# upload model.tar.gz to s3\n", "s3_model_uri = S3Uploader.upload(local_path=\"./model.tar.gz\", desired_s3_uri=f\"s3://{sess.default_bucket()}/llava-v1.5-7b\")\n", "\n", "print(f\"model uploaded to: {s3_model_uri}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the `HuggingFaceModel` class to create our real-time inference endpoint:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK 
defaults from location: /Users/tom/Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Users/tom/Library/Application Support/sagemaker/config.yaml\n", "-----------------!" ] } ], "source": [ "\n", "from sagemaker.huggingface.model import HuggingFaceModel\n", "\n", "# create the Hugging Face model\n", "huggingface_model = HuggingFaceModel(\n", " model_data=s3_model_uri, # path to your model and script\n", " role=role, # iam role with permissions to create an Endpoint\n", " transformers_version=\"4.28.1\", # transformers version used\n", " pytorch_version=\"2.0.0\", # pytorch version used\n", " py_version='py310', # python version used\n", " model_server_workers=1\n", ")\n", "\n", "# deploy the model to a real-time endpoint\n", "predictor = huggingface_model.deploy(\n", " initial_instance_count=1,\n", " instance_type=\"ml.g5.xlarge\",\n", " # container_startup_health_check_timeout=600, # increase timeout for large models\n", " # model_data_download_timeout=600, # increase timeout for large models\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`.deploy()` returns a `HuggingFacePredictor` object which can be used to request inference via the `.predict()` method. The endpoint expects a JSON payload with at least the `image` and `question` keys." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The image features a red and black toy horse with a pair of glasses on its face. The horse is wearing a pair of red glasses, which adds a unique and quirky touch to the toy. The horse's legs are also painted in red and black colors, further enhancing its appearance. 
The toy horse is standing on a grey surface, which serves as a backdrop for the scene.\n" ] } ], "source": [ "data = {\n", " \"image\" : 'https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png', \n", " \"question\" : \"Describe the image and color details.\",\n", " # \"max_new_tokens\" : 1024,\n", " # \"temperature\" : 0.2,\n", " # \"conv_mode\" : \"llava_v1\"\n", "}\n", "\n", "# send the request\n", "output = predictor.predict(data)\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The inference predictor can also be initialized later from an already-deployed endpoint via its `endpoint_name`:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Users/tom/Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml\n", "sagemaker.config INFO - Not applying SDK defaults from location: /Users/tom/Library/Application Support/sagemaker/config.yaml\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Couldn't call 'get_role' to get Role ARN from role name arn:aws:iam::297308036828:root to get Role path.\n" ] } ], "source": [ "import sagemaker\n", "import boto3\n", "sess = sagemaker.Session()\n", "try:\n", " role = sagemaker.get_execution_role()\n", "except ValueError:\n", " iam = boto3.client('iam')\n", " # set your own SageMaker execution role name here\n", " role = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole-20231008T201275')['Role']['Arn']\n", "\n", "from sagemaker.huggingface.model import HuggingFacePredictor\n", "# initialize a predictor for the existing endpoint\n", "predictor2 = HuggingFacePredictor(\n", " 
endpoint_name=\"huggingface-pytorch-inference-2023-10-19-05-57-37-847\",\n", " sagemaker_session=sess\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To clean up, we can delete the model and endpoint with `delete_model()` and `delete_endpoint()`, or via the SageMaker console:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# delete the sagemaker model and endpoint\n", "predictor.delete_model()\n", "predictor.delete_endpoint()" ] } ], "metadata": { "kernelspec": { "display_name": "llava", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }