{ "cells": [ { "cell_type": "markdown", "id": "db41d8ba-71c0-4951-9a88-e1ae01a282ec", "metadata": {}, "source": [ "# Introduction\n", "Please check out my [blog post](https://datavistics.github.io/posts/jais-inference-endpoints/) for more details!" ] }, { "cell_type": "markdown", "id": "d2534669-003d-490c-9d7a-32607fa5f404", "metadata": {}, "source": [ "# Setup" ] }, { "cell_type": "markdown", "id": "3c830114-dd88-45a9-81b9-78b0e3da7384", "metadata": {}, "source": [ "## Requirements" ] }, { "cell_type": "code", "execution_count": 1, "id": "35386f72-32cb-49fa-a108-3aa504e20429", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip is available: \u001B[0m\u001B[31;49m23.2.1\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.3.2\u001B[0m\n", "\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install -q \"huggingface-hub>=0.20\" ipywidgets" ] }, { "cell_type": "markdown", "id": "b6f72042-173d-4a72-ade1-9304b43b528d", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "id": "99f60998-0490-46c6-a8e6-04845ddda7be", "metadata": { "tags": [] }, "outputs": [], "source": [ "from huggingface_hub import login, whoami, create_inference_endpoint\n", "from getpass import getpass" ] }, { "cell_type": "markdown", "id": "5eece903-64ce-435d-a2fd-096c0ff650bf", "metadata": {}, "source": [ "## Config\n", "Choose your `ENDPOINT_NAME` if you like." ] }, { "cell_type": "code", "execution_count": 3, "id": "dcd7daed-6aca-4fe7-85ce-534bdcd8bc87", "metadata": { "tags": [] }, "outputs": [], "source": [ "ENDPOINT_NAME = \"jais13b-demo\"" ] }, { "cell_type": "code", "execution_count": null, "id": "0ca1140c-3fcc-4b99-9210-6da1505a27b7", "metadata": { "tags": [] }, "outputs": [], "source": [ "login()" ] }, { "cell_type": "markdown", "id": "5f4ba0a8-0a6c-4705-a73b-7be09b889610", "metadata": {}, "source": [ "Some users might have payment registered in an organization. This allows you to connect to an organization (that you are a member of) with a payment method.\n", "\n", "Leave it blank if you want to use your username." ] }, { "cell_type": "code", "execution_count": 5, "id": "88cdbd73-5923-4ae9-9940-b6be935f70fa", "metadata": { "tags": [] }, "outputs": [ { "name": "stdin", "output_type": "stream", "text": [ "What is your Hugging Face 馃 username or organization? (with an added payment method) 路路路路路路路路\n" ] } ], "source": [ "who = whoami()\n", "organization = getpass(prompt=\"What is your Hugging Face 馃 username or organization? (with an added payment method)\")\n", "\n", "namespace = organization or who['name']" ] }, { "cell_type": "markdown", "id": "93096cbc-81c6-4137-a283-6afb0f48fbb9", "metadata": {}, "source": [ "# Inference Endpoints\n", "## Create Inference Endpoint\n", "We are going to use the [API](https://huggingface.co/docs/inference-endpoints/api_reference) to create an [Inference Endpoint](https://huggingface.co/inference-endpoints). 
{ "cell_type": "markdown", "id": "93096cbc-81c6-4137-a283-6afb0f48fbb9", "metadata": {}, "source": [ "# Inference Endpoints\n", "## Create Inference Endpoint\n", "We are going to use the [API](https://huggingface.co/docs/inference-endpoints/api_reference) to create an [Inference Endpoint](https://huggingface.co/inference-endpoints). This has a few main benefits over the web UI:\n", "- It's convenient (no clicking through the console)\n", "- It's repeatable (we have the code and can re-run it easily)\n", "- It's cheaper (we don't pay for an idle endpoint while we configure things, and we can shut it down automatically)" ] },
{ "cell_type": "markdown", "id": "1cf8334d-6500-412e-9d6d-58990c42c110", "metadata": {}, "source": [ "Here is a convenient table of instance details you can use when selecting a GPU. Once you have chosen a GPU in Inference Endpoints, you can use the corresponding `instanceType` and `instanceSize`.\n", "\n", "| hw_desc | instanceType | instanceSize | vRAM |\n", "|---------------------|----------------|--------------|-------|\n", "| 1x Nvidia Tesla T4 | g4dn.xlarge | small | 16GB |\n", "| 4x Nvidia Tesla T4 | g4dn.12xlarge | large | 64GB |\n", "| 1x Nvidia A10G | g5.2xlarge | medium | 24GB |\n", "| 4x Nvidia A10G | g5.12xlarge | xxlarge | 96GB |\n", "| 1x Nvidia A100 | p4de | xlarge | 80GB |\n", "| 2x Nvidia A100 | p4de | 2xlarge | 160GB |\n", "\n", "Note: To use a multi-GPU instance you will need a sharded version of Jais. I'm not sure there is currently such a version on the Hub, but the sketch below shows one way you could create your own." ] },
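{ "cell_type": "markdown", "id": "5c1e8f2a-7d4b-4c6e-a3f1-9b2d6e4a7c33", "metadata": {}, "source": [ "This step is optional and only matters for multi-GPU instances. It is a rough sketch rather than a tested recipe: the target repo name is hypothetical, and loading the 13B weights this way needs plenty of CPU RAM." ] },
{ "cell_type": "code", "execution_count": null, "id": "d4b7a2e9-3f6c-4d1a-b8e5-1a9c3f7d2e44", "metadata": { "tags": [] }, "outputs": [], "source": [ "from transformers import AutoModelForCausalLM, AutoTokenizer\n", "\n", "MODEL = \"derek-thomas/jais-13b-chat-hf\"\n", "\n", "# Jais ships custom modeling code, so trust_remote_code is required\n", "model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=\"auto\", trust_remote_code=True)\n", "tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)\n", "\n", "# max_shard_size splits the weights across multiple SafeTensors files;\n", "# the target repo name below is hypothetical\n", "model.push_to_hub(\"jais-13b-chat-hf-sharded\", max_shard_size=\"5GB\", safe_serialization=True)\n", "tokenizer.push_to_hub(\"jais-13b-chat-hf-sharded\")" ] },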
{ "cell_type": "code", "execution_count": 6, "id": "89c7cc21-3dfe-40e6-80ff-1dcc8558859e", "metadata": { "tags": [] }, "outputs": [], "source": [ "# 1x Nvidia A100 (80GB vRAM), per the table above\n", "hw_dict = dict(\n", "    accelerator=\"gpu\",\n", "    vendor=\"aws\",\n", "    region=\"us-east-1\",\n", "    type=\"protected\",\n", "    instance_type=\"p4de\",\n", "    instance_size=\"xlarge\",\n", ")" ] },
{ "cell_type": "code", "execution_count": 7, "id": "f4267bce-8516-4f3a-b1cc-8ccd6c14a9c7", "metadata": { "tags": [] }, "outputs": [], "source": [ "tgi_env = {\n", "    # Cap on the number of tokens processed in a single prefill batch\n", "    \"MAX_BATCH_PREFILL_TOKENS\": \"2048\",\n", "    # Longest prompt (in tokens) the endpoint will accept\n", "    \"MAX_INPUT_LENGTH\": \"2000\",\n", "    # Jais ships custom modeling code\n", "    \"TRUST_REMOTE_CODE\": \"true\",\n", "    # 8-bit quantization so the model fits comfortably in vRAM\n", "    \"QUANTIZE\": \"bitsandbytes\",\n", "    # TGI loads the model from the repo mounted inside the container\n", "    \"MODEL_ID\": \"/repository\",\n", "}" ] },
{ "cell_type": "markdown", "id": "74fd83a0-fef0-4e47-8ff1-f4ba7aed131d", "metadata": {}, "source": [ "A couple of notes on my choices here:\n", "- I used `derek-thomas/jais-13b-chat-hf` because that repo has the merged SafeTensors weights, which leads to faster loading in the TGI container\n", "- I'm using the latest TGI container as of the time of writing (1.3.4)\n", "- `min_replica=0` allows [scaling to zero](https://huggingface.co/docs/inference-endpoints/autoscaling#scaling-to-0), which is great for your wallet, though think through whether it fits your use case, since a scaled-to-zero endpoint has to reload the model when traffic resumes\n", "- `max_replica` controls how far the endpoint can scale out to handle high throughput; make sure you read through the [docs](https://huggingface.co/docs/inference-endpoints/autoscaling#scaling-criteria) to understand how this scales" ] },
{ "cell_type": "code", "execution_count": 8, "id": "9e59de46-26b7-4bb9-bbad-8bba9931bde7", "metadata": { "tags": [] }, "outputs": [], "source": [ "endpoint = create_inference_endpoint(\n", "    ENDPOINT_NAME,\n", "    repository=\"derek-thomas/jais-13b-chat-hf\",\n", "    framework=\"pytorch\",\n", "    task=\"text-generation\",\n", "    **hw_dict,\n", "    min_replica=0,\n", "    max_replica=1,\n", "    namespace=namespace,\n", "    custom_image={\n", "        \"health_route\": \"/health\",\n", "        \"env\": tgi_env,\n", "        \"url\": \"ghcr.io/huggingface/text-generation-inference:1.3.4\",\n", "    },\n", ")" ] },
{ "cell_type": "markdown", "id": "96d173b2-8980-4554-9039-c62843d3fc7d", "metadata": {}, "source": [ "## Wait until it's running" ] },
{ "cell_type": "code", "execution_count": null, "id": "5f3a8bd2-753c-49a8-9452-899578beddc5", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%time\n", "endpoint.wait()" ] },
{ "cell_type": "code", "execution_count": 10, "id": "189b26f0-d404-4570-a1b9-e2a9d486c1f7", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'POSITIVE'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "endpoint.client.text_generation(\"\"\"\n", "### Instruction: What is the sentiment of the input?\n", "### Examples\n", "I wish the screen was bigger - Negative\n", "I hate the battery - Negative\n", "I love the default applications - Positive\n", "### Input\n", "I am happy with this purchase - \n", "### Response\n", "\"\"\",\n", "    do_sample=True,\n", "    repetition_penalty=1.2,\n", "    top_p=0.9,\n", "    temperature=0.3)" ] },
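{ "cell_type": "markdown", "id": "f2a6d1c8-4e7b-4b3a-9c2e-6d1f8a4b7e55", "metadata": {}, "source": [ "`endpoint.client` is an `InferenceClient` that is already pointed at our URL, but since this is a standard TGI deployment you can also reach it from any client. A minimal sketch, assuming the token saved by `login()` above has access to this protected endpoint:" ] },
{ "cell_type": "code", "execution_count": null, "id": "9b3e7a1d-5c2f-4e8b-a1d4-3e6c9f2a5b66", "metadata": { "tags": [] }, "outputs": [], "source": [ "from huggingface_hub import InferenceClient\n", "\n", "# A fresh client; with no token argument it falls back to the locally saved token\n", "client = InferenceClient(model=endpoint.url)\n", "client.text_generation(\"What is the capital of the UAE?\", max_new_tokens=24)" ] },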
{ "cell_type": "markdown", "id": "bab97c7b-7bac-4bf5-9752-b528294dadc7", "metadata": {}, "source": [ "## Pause Inference Endpoint\n", "Now that we have finished, let's pause the endpoint so we don't incur any extra charges. This also lets us analyze the cost. (If you need it again later, `endpoint.resume()` brings it back.)" ] },
{ "cell_type": "code", "execution_count": 11, "id": "540a0978-7670-4ce3-95c1-3823cc113b85", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Endpoint Status: paused\n" ] } ], "source": [ "endpoint = endpoint.pause()\n", "\n", "print(f\"Endpoint Status: {endpoint.status}\")" ] },
{ "cell_type": "markdown", "id": "41abea64-379d-49de-8d9a-355c2f4ce1ac", "metadata": {}, "source": [ "## Analyze Usage\n", "1. Go to your `dashboard_url` printed below\n", "1. Check the dashboard\n", "1. Analyze the Usage & Cost tab" ] },
{ "cell_type": "code", "execution_count": null, "id": "16815445-3079-43da-b14e-b54176a07a62", "metadata": { "tags": [] }, "outputs": [], "source": [ "dashboard_url = f'https://ui.endpoints.huggingface.co/{namespace}/endpoints/{ENDPOINT_NAME}/analytics'\n", "print(dashboard_url)" ] },
{ "cell_type": "markdown", "id": "b953d5be-2494-4ff8-be42-9daf00c99c41", "metadata": {}, "source": [ "## Delete Endpoint" ] },
{ "cell_type": "code", "execution_count": 13, "id": "c310c0f3-6f12-4d5c-838b-3a4c1f2e54ad", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Endpoint deleted successfully\n" ] } ], "source": [ "endpoint = endpoint.delete()\n", "\n", "if not endpoint:\n", "    print('Endpoint deleted successfully')\n", "else:\n", "    print('Delete the endpoint manually in the UI')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }