parkjunsoo91 JoohoSong committed on
Commit fe2d935 · verified · 1 Parent(s): e56ddce

Upload Llama-3-1-Varco-8B.ipynb (#5)

- Upload Llama-3-1-Varco-8B.ipynb (2d5e31ec7bac5dc1b3fc25f77a4cc7acd427f373)


Co-authored-by: Jooho Song <JoohoSong@users.noreply.huggingface.co>

Files changed (1)
  1. Llama-3-1-Varco-8B.ipynb +343 -0
Llama-3-1-Varco-8B.ipynb ADDED
@@ -0,0 +1,343 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Deploy Llama-VARCO-8B-Instruct Model from AWS Marketplace \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "Llama-VARCO-8B-Instruct is a generative model built with Llama, specifically designed to excel in Korean through additional training. The model uses continual pre-training on both Korean and English datasets to enhance its understanding and generation capabilities in Korean while maintaining its proficiency in English. It is then aligned with human preferences through supervised fine-tuning (SFT) and direct preference optimization (DPO) in Korean.\n",
+ "\n",
+ "This sample notebook shows you how to deploy [Llama-VARCO-8B-Instruct](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e) using Amazon SageMaker.\n",
+ "\n",
+ "> **Note**: This is a reference notebook and it cannot run unless you make the changes suggested in the notebook.\n",
+ "\n",
+ "## Pre-requisites:\n",
+ "1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.\n",
+ "1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.\n",
+ "1. To deploy this ML model successfully, ensure that:\n",
+ "    1. Your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used:\n",
+ "        1. **aws-marketplace:ViewSubscriptions**\n",
+ "        1. **aws-marketplace:Unsubscribe**\n",
+ "        1. **aws-marketplace:Subscribe**\n",
+ "\n",
+ "## Contents:\n",
+ "1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)\n",
+ "2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)\n",
+ "3. [Clean-up](#3.-Clean-up)\n",
+ "\n",
+ " \n",
+ "\n",
+ "## Usage instructions\n",
+ "You can run this notebook one cell at a time (press Shift+Enter to run a cell)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Subscribe to the model package"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "To subscribe to the model package:\n",
+ "1. Open the model package [listing page](https://aws.amazon.com/marketplace/pp/prodview-pynin2e23lb3e).\n",
+ "1. On the AWS Marketplace listing, click the **Continue to subscribe** button.\n",
+ "1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and click **Accept Offer** if you and your organization agree with them.\n",
+ "1. Once you click the **Continue to configuration** button and choose a **region**, you will see a **Product Arn** displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "model_package_arn = \"arn:aws:sagemaker:us-west-2:594846645681:model-package/llama-varco-8b-ist-bedrock-37339dbb44f23f488e24f8671eaa0494\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Only the modules used in this notebook are imported.\n",
+ "import json\n",
+ "\n",
+ "import boto3\n",
+ "import sagemaker as sage\n",
+ "from sagemaker import ModelPackage, get_execution_role"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# IAM role that SageMaker assumes to create the model and the endpoint\n",
+ "role = get_execution_role()\n",
+ "\n",
+ "sagemaker_session = sage.Session()\n",
+ "\n",
+ "bucket = sagemaker_session.default_bucket()\n",
+ "runtime = boto3.client(\"sagemaker-runtime\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Create an endpoint and perform real-time inference"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you want to understand how real-time inference with Amazon SageMaker works, see the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# model_name is also used as the endpoint name in the cells below\n",
+ "model_name = \"Llama-VARCO-8B-Instruct\"\n",
+ "\n",
+ "content_type = \"application/json\"\n",
+ "\n",
+ "real_time_inference_instance_type = \"ml.g5.12xlarge\"\n",
+ "batch_transform_inference_instance_type = \"ml.g4dn.12xlarge\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### A. Create an endpoint"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Create a deployable model from the model package.\n",
+ "model = ModelPackage(\n",
+ "    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session\n",
+ ")\n",
+ "\n",
+ "# Deploy the model to one instance; the endpoint is named after model_name.\n",
+ "predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Once the endpoint has been created, you can perform real-time inference."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### B. Create input payload"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Chat-style request body; the Korean prompt means \"Hi, who are you?\"\n",
+ "input = {\n",
+ "    \"messages\": [\n",
+ "        {\n",
+ "            \"role\": \"user\",\n",
+ "            \"content\": \"안녕 넌 누구야?\"\n",
+ "        }\n",
+ "    ]\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### C. Perform real-time inference"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "##### C-1. Stream Inference Example"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "class VarcoInferenceStream:\n",
+ "    def __init__(self, sagemaker_runtime, endpoint_name):\n",
+ "        self.sagemaker_runtime = sagemaker_runtime\n",
+ "        self.endpoint_name = endpoint_name\n",
+ "\n",
+ "    def stream_inference(self, request_body):\n",
+ "        # Request a streaming inference response\n",
+ "        # from the specified model endpoint:\n",
+ "        response = self.sagemaker_runtime.invoke_endpoint_with_response_stream(\n",
+ "            EndpointName=self.endpoint_name,\n",
+ "            Body=json.dumps(request_body),\n",
+ "            ContentType=\"application/json\"\n",
+ "        )\n",
+ "        # Iterate over the EventStream object returned by the SDK\n",
+ "        # and yield each payload chunk as text:\n",
+ "        for body in response[\"Body\"]:\n",
+ "            raw = body['PayloadPart']['Bytes']\n",
+ "            yield raw.decode()\n",
+ "\n",
+ "\n",
+ "sm_runtime = boto3.client(\"sagemaker-runtime\")\n",
+ "varco_inference_stream = VarcoInferenceStream(sm_runtime, model_name)\n",
+ "stream = varco_inference_stream.stream_inference(input)\n",
+ "for part in stream:\n",
+ "    print(part, end='')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "## 3. Clean-up"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now that you have successfully performed real-time inference, you no longer need the endpoint. You can delete the endpoint to avoid being charged."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### A. Delete the endpoint"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model.sagemaker_session.delete_endpoint(model_name)\n",
+ "model.sagemaker_session.delete_endpoint_config(model_name)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### B. Delete the model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model.delete_model()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### C. Unsubscribe from the listing (optional)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: You can find this information by looking at the container name associated with the model.\n",
+ "\n",
+ "**Steps to unsubscribe from the product on AWS Marketplace**:\n",
+ "1. Navigate to the __Machine Learning__ tab on the [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).\n",
+ "2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.\n",
+ "\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "instance_type": "ml.t3.medium",
+ "kernelspec": {
+ "display_name": "conda_pytorch_p310",
+ "language": "python",
+ "name": "conda_pytorch_p310"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.14"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+ }
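
A possible refinement of the streaming cell in section C-1, offered as a sketch and not as part of the uploaded notebook: that cell calls `raw.decode()` on each `PayloadPart` independently, so a multi-byte UTF-8 character (every Hangul syllable takes three bytes) that happened to straddle two chunks would raise `UnicodeDecodeError`. Whether the endpoint ever splits characters across chunks is an assumption here; the notebook does not say. The helper name `decode_payload_parts` and the simulated `fake_events` list are hypothetical and make no AWS call; only the `PayloadPart` / `Bytes` event shape is taken from the notebook's code.

```python
import codecs
from typing import Iterable, Iterator


def decode_payload_parts(events: Iterable[dict]) -> Iterator[str]:
    """Yield text from streamed events, tolerating characters split across chunks."""
    # The incremental decoder buffers an incomplete trailing byte sequence
    # until the bytes that complete it arrive in a later chunk.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for event in events:
        part = event.get("PayloadPart")
        if part is None:
            continue  # skip events that carry no payload bytes
        text = decoder.decode(part["Bytes"])
        if text:
            yield text
    # Flush the decoder; this raises UnicodeDecodeError if the stream
    # ended in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Simulated event stream (hypothetical data): the first character of the
# prompt is deliberately split across two chunks.
data = "안녕 넌 누구야?".encode("utf-8")
fake_events = [
    {"PayloadPart": {"Bytes": data[:2]}},
    {"PayloadPart": {"Bytes": data[2:]}},
]
print("".join(decode_payload_parts(fake_events)))
```

In the notebook, the same helper could wrap `response["Body"]` inside `stream_inference` in place of the per-chunk `raw.decode()` call, assuming the events keep the shape shown above.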