{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Deploy Llama-VARCO-8B-Instruct Model from AWS Marketplace \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Llama-VARCO-8B-Instruct is a generative model built with Llama, specifically designed to excel in Korean through additional training. The model uses continual pre-training with both Korean and English datasets to enhance its understanding and generation capabilites in Korean, while also maintaining its proficiency in English. It performs supervised fine-tuning (SFT) and direct preference optimization (DPO) in Korean to align with human preferences.\n", "\n", "This sample notebook shows you how to deploy [Llama-VARCO-8B-Instruct](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e) using Amazon SageMaker.\n", "\n", "> **Note**: This is a reference notebook and it cannot run unless you make changes suggested in the notebook.\n", "\n", "## Pre-requisites:\n", "1. **Note**: This notebook contains elements which render correctly in Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.\n", "1. Ensure that IAM role used has **AmazonSageMakerFullAccess**\n", "1. To deploy this ML model successfully, ensure that:\n", " 1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: \n", " 1. **aws-marketplace:ViewSubscriptions**\n", " 1. **aws-marketplace:Unsubscribe**\n", " 1. **aws-marketplace:Subscribe** \n", "\n", "## Contents:\n", "1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)\n", "2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)\n", "3. [Clean-up](#3.-Clean-up)\n", "\n", " \n", "\n", "## Usage instructions\n", "You can run this notebook one cell at a time (By using Shift+Enter for running a cell)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Subscribe to the model package" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "To subscribe to the model package:\n", "1. Open the model package [listing page](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e)\n", "1. On the AWS Marketplace listing, click on the **Continue to subscribe** button.\n", "1. On the **Subscribe to this software** page, review and click on **\"Accept Offer\"** if you and your organization agrees with EULA, pricing, and support terms. \n", "1. Once you click on **Continue to configuration button** and then choose a **region**, you will see a **Product Arn** displayed. This is the model package ARN that you need to specify while creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify the same in the following cell." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "model_package_arn = \"arn:aws:sagemaker:us-west-2:594846645681:model-package/llama-varco-8b-ist-bedrock-37339dbb44f23f488e24f8671eaa0494\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import base64\n", "import json\n", "import uuid\n", "from sagemaker import ModelPackage\n", "import sagemaker as sage\n", "from sagemaker import get_execution_role\n", "from sagemaker import ModelPackage\n", "import boto3\n", "from IPython.display import Image\n", "from PIL import Image as ImageEdit\n", "import numpy as np\n", "import io" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "role = get_execution_role()\n", "\n", "sagemaker_session = sage.Session()\n", "\n", "bucket = sagemaker_session.default_bucket()\n", "runtime = boto3.client(\"runtime.sagemaker\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Create an endpoint and perform real-time inference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "model_name = \"Llama-VARCO-8B-Instruct\"\n", "\n", "content_type = \"application/json\"\n", "\n", "real_time_inference_instance_type = (\n", " \"ml.g5.12xlarge\"\n", ")\n", "batch_transform_inference_instance_type = (\n", " \"ml.g4dn.12xlarge\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A.Create an endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# create a deployable model from the model package.\n", "model = ModelPackage(\n", " role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session\n", ")\n", "\n", "# Deploy the model\n", "predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once endpoint has been created, you would be able to perform real-time inference." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### B.Create input payload" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "input = {\n", " \"messages\": [\n", " {\n", " \"role\":\"user\",\n", " \"content\":\"안녕 넌 누구야?\"\n", " }\n", " ]\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### C. Perform real-time inference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### C-1. 
Stream Inference Example" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "class VarcoInferenceStream():\n", " def __init__(self, sagemaker_runtime, endpoint_name):\n", " self.sagemaker_runtime = sagemaker_runtime\n", " self.endpoint_name = endpoint_name\n", "\n", " def stream_inference(self, request_body):\n", " # Gets a streaming inference response\n", " # from the specified model endpoint:\n", " response = self.sagemaker_runtime\\\n", " .invoke_endpoint_with_response_stream(\n", " EndpointName=self.endpoint_name,\n", " Body=json.dumps(request_body),\n", " ContentType=\"application/json\"\n", " )\n", " # Gets the EventStream object returned by the SDK:\n", " for body in response[\"Body\"]:\n", " raw = body['PayloadPart']['Bytes']\n", " yield raw.decode()\n", "\n", "\n", "sm_runtime = boto3.client(\"sagemaker-runtime\")\n", "varco_inference_stream = VarcoInferenceStream(sm_runtime, model_name)\n", "stream = varco_inference_stream.stream_inference(input)\n", "for part in stream:\n", " print(part, end='')" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## 3. Clean-up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A. Delete the endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.sagemaker_session.delete_endpoint(model_name)\n", "model.sagemaker_session.delete_endpoint_config(model_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### B. Delete the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.delete_model()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### C. Unsubscribe to the listing (optional)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you would like to unsubscribe to the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note - You can find this information by looking at the container name associated with the model. \n", "\n", "**Steps to unsubscribe to product from AWS Marketplace**:\n", "1. Navigate to __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)\n", "2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__ to cancel the subscription.\n", "\n" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "conda_pytorch_p310", "language": "python", "name": "conda_pytorch_p310" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 4 }