# Deploy Llama-VARCO-8B-Instruct Model from AWS Marketplace 




Llama-VARCO-8B-Instruct is a generative model built on Llama and further trained to excel in Korean. It was continually pre-trained on both Korean and English datasets to strengthen its Korean understanding and generation capabilities while maintaining its proficiency in English. It was then aligned with human preferences through supervised fine-tuning (SFT) and direct preference optimization (DPO) in Korean.

This sample notebook shows you how to deploy [Llama-VARCO-8B-Instruct](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e) using Amazon SageMaker.

> **Note**: This is a reference notebook. It will not run until you make the changes suggested in the notebook.

## Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open it from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. To deploy this ML model successfully, ensure that your IAM role has these three permissions and that you have the authority to make AWS Marketplace subscriptions in the AWS account used:
    1. **aws-marketplace:ViewSubscriptions**
    1. **aws-marketplace:Unsubscribe**
    1. **aws-marketplace:Subscribe**
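
As an illustration only, the three AWS Marketplace permissions listed above could be expressed as an IAM policy document like the one below. The action names come from the list; the `"Resource": "*"` scope and the variable name `marketplace_policy` are assumptions made for this sketch, so adjust them to your organization's rules before attaching anything to a role.

```python
import json

# The three actions are the ones named in the prerequisites.
# "Resource": "*" is an assumption made for this sketch.
marketplace_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "aws-marketplace:ViewSubscriptions",
                "aws-marketplace:Unsubscribe",
                "aws-marketplace:Subscribe",
            ],
            "Resource": "*",
        }
    ],
}

# Print the policy as JSON so it can be pasted into the IAM console.
print(json.dumps(marketplace_policy, indent=2))
```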

## Contents:
1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
3. [Clean-up](#3.-Clean-up)

 

## Usage instructions
You can run this notebook one cell at a time by pressing Shift+Enter.

## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package [listing page](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e).
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
1. Once you click the **Continue to configuration** button and choose a **region**, a **Product Arn** is displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell.

In [None]:
model_package_arn = "arn:aws:sagemaker:us-west-2:594846645681:model-package/llama-varco-8b-ist-bedrock-37339dbb44f23f488e24f8671eaa0494"
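
Because the ARN is region-specific, a quick sanity check can catch a copy-paste mistake before deployment. The following is a minimal sketch; the helper name `parse_model_package_arn` is hypothetical and not part of any AWS SDK. It splits the ARN into its fields so the region can be compared with the region of your session.

```python
def parse_model_package_arn(arn):
    """Split a SageMaker model package ARN into its parts, or raise ValueError."""
    # Expected layout: arn:partition:sagemaker:region:account:model-package/name
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, resource_name = parts[5].partition("/")
    if resource_type != "model-package" or not resource_name:
        raise ValueError(f"Not a model package ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account": parts[4],
        "name": resource_name,
    }


# The ARN used in this notebook, repeated here so the sketch is self-contained.
example_arn = (
    "arn:aws:sagemaker:us-west-2:594846645681:model-package/"
    "llama-varco-8b-ist-bedrock-37339dbb44f23f488e24f8671eaa0494"
)
arn_info = parse_model_package_arn(example_arn)
print(arn_info["region"])
```

Once a SageMaker session exists, `arn_info["region"]` can be compared with the session's `boto_region_name` attribute to confirm that the ARN matches the region you are deploying in.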

In [None]:
import json

import boto3
import sagemaker as sage
from sagemaker import ModelPackage, get_execution_role

In [None]:
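# IAM role that SageMaker assumes when hosting the model.
role = get_execution_role()

# Session used by the SageMaker Python SDK calls below.
sagemaker_session = sage.Session()

# Default S3 bucket of the session and a runtime client for invoking endpoints.
bucket = sagemaker_session.default_bucket()
runtime = boto3.client("runtime.sagemaker")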

## 2. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

In [None]:
model_name = "Llama-VARCO-8B-Instruct"

content_type = "application/json"

real_time_inference_instance_type = "ml.g5.12xlarge"
batch_transform_inference_instance_type = "ml.g4dn.12xlarge"

### A. Create an endpoint

In [None]:
# Create a deployable model from the model package.
model = ModelPackage(
    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

# Deploy the model to a single instance; the endpoint is named after model_name.
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)

Once the endpoint has been created, you can perform real-time inference.
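
If you return to the endpoint from a different session, or deployed it without waiting, you may want to confirm that it is ready before sending requests. The sketch below polls the SageMaker `describe_endpoint` API until the status is `InService`. The function name `wait_until_in_service`, the polling interval, and the timeout are assumptions for illustration; `sm_client` is expected to be a client created with `boto3.client("sagemaker")`.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_endpoint until the endpoint is InService, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            # FailureReason is only present when the endpoint failed.
            reason = response.get("FailureReason", "unknown")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} is still {status}")
        time.sleep(poll_seconds)
```

In this notebook it could be called as `wait_until_in_service(boto3.client("sagemaker"), model_name)`. boto3 also provides a built-in `endpoint_in_service` waiter that serves the same purpose.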

### B. Create input payload

In [None]:
input = {
    "messages": [
        {
            "role": "user",
            "content": "안녕 넌 누구야?"
        }
    ]
}
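
For repeated requests, the payload can be assembled with a small helper. This is a sketch under assumptions: the helper name `build_payload` is hypothetical, and passing earlier turns through a `history` list (for example with an `assistant` role) is an assumption about the endpoint, since this notebook only demonstrates a single `user` message.

```python
import json


def build_payload(user_text, history=None):
    """Return a request body with the new user message appended to prior turns."""
    # Copy the history so the caller's list is not modified.
    messages = list(history or [])
    messages.append({"role": "user", "content": user_text})
    return {"messages": messages}


payload = build_payload("안녕 넌 누구야?")

# ensure_ascii=False keeps the Korean text readable in the serialized body;
# the default (escaped) form is equally valid JSON.
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
print(body.decode("utf-8"))
```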

### C. Perform real-time inference

#### C-1. Stream Inference Example

In [None]:
class VarcoInferenceStream:
    def __init__(self, sagemaker_runtime, endpoint_name):
        self.sagemaker_runtime = sagemaker_runtime
        self.endpoint_name = endpoint_name

    def stream_inference(self, request_body):
        # Request a streaming inference response
        # from the specified model endpoint.
        response = self.sagemaker_runtime.invoke_endpoint_with_response_stream(
            EndpointName=self.endpoint_name,
            Body=json.dumps(request_body),
            ContentType="application/json",
        )
        # Iterate over the EventStream object returned by the SDK.
        for body in response["Body"]:
            raw = body["PayloadPart"]["Bytes"]
            yield raw.decode()


sm_runtime = boto3.client("sagemaker-runtime")
varco_inference_stream = VarcoInferenceStream(sm_runtime, model_name)
stream = varco_inference_stream.stream_inference(input)
for part in stream:
    print(part, end="")
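
Korean characters occupy several bytes in UTF-8, so decoding each payload part on its own can fail if a part boundary happens to fall inside a character. Whether this endpoint ever splits characters that way is not stated here; the sketch below simply shows a defensive alternative using the standard library's incremental decoder. The function name `decode_chunks` is hypothetical.

```python
import codecs


def decode_chunks(chunks):
    """Decode an iterable of UTF-8 byte chunks, tolerating split characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        # Incomplete trailing bytes are buffered until the next chunk arrives.
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; this raises if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Simulate a stream that splits the second character across two chunks.
data = "안녕하세요".encode("utf-8")
chunks = [data[:4], data[4:]]
print("".join(decode_chunks(chunks)))
```

To apply this to the streaming class, the raw `Bytes` values could be yielded without decoding and passed through `decode_chunks` before printing.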

## 3. Clean-up

Now that you have successfully performed a real-time inference, you no longer need the endpoint. Delete it to avoid being charged.

### A. Delete the endpoint

In [None]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)

### B. Delete the model

In [None]:
model.delete_model()

### C. Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.
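
The check described above can also be scripted. The following sketch lists the models in the account and inspects the `ModelPackageName` field of each model's container definitions, which is one way to spot models created from a given model package. The function name `models_from_package` and the substring matching are assumptions for illustration; `sm_client` is expected to be a client created with `boto3.client("sagemaker")`.

```python
def models_from_package(sm_client, package_name_fragment):
    """Return names of models whose containers reference the given model package."""
    found = []
    kwargs = {}
    while True:
        page = sm_client.list_models(**kwargs)
        for summary in page.get("Models", []):
            detail = sm_client.describe_model(ModelName=summary["ModelName"])
            # A model has either a PrimaryContainer or a list of Containers.
            containers = list(detail.get("Containers", []))
            if "PrimaryContainer" in detail:
                containers.append(detail["PrimaryContainer"])
            if any(
                package_name_fragment in container.get("ModelPackageName", "")
                for container in containers
            ):
                found.append(summary["ModelName"])
        # Follow pagination until no further token is returned.
        token = page.get("NextToken")
        if not token:
            return found
        kwargs = {"NextToken": token}
```

For this notebook, `models_from_package(boto3.client("sagemaker"), "llama-varco-8b-ist")` should return an empty list once the model has been deleted.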

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on the [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.

