mistral-super-fast

Running

App Files Files Community

osanseviero HF staff commited on Sep 28, 2023

Commit

cf1a53b

•

1 Parent(s): a0fcf12

Upload Mistral_7B.ipynb

Browse files

Files changed (1) hide show

Mistral_7B.ipynb +547 -0

Mistral_7B.ipynb ADDED Viewed

	@@ -0,0 +1,547 @@

+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Mistral 7B\n",
+        "\n",
+        "Mistral 7B is a new state-of-the-art open-source model. Here are some interesting facts about it\n",
+        "\n",
+        "* One of the strongest open-source models, of all sizes\n",
+        "* Strongest model in the 1-20B parameter range models\n",
+        "* Does decently in code-related tasks\n",
+        "* Uses Windowed attention, allowing to push to 200k tokens of context if using Rope (needs 4 A10G GPUs for this)\n",
+        "* Apache 2.0 license\n",
+        "\n",
+        "As for the integrations status:\n",
+        "* Integrated into `transformers`\n",
+        "* You can use it with a server or locally (it's a small model after all!)\n",
+        "* Integrated into popular tools tuch as TGI and VLLM\n",
+        "\n",
+        "\n",
+        "Two models are released: a [base model](https://huggingface.co/mistralai/Mistral-7B-v0.1) and a [instruct fine-tuned version](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1). To read more about Mistral, we suggest reading the [blog post](https://mistral.ai/news/announcing-mistral-7b/).\n",
+        "\n",
+        "In this Colab, we'll experiment with the Mistral model using an API. There are three ways we can use it:\n",
+        "\n",
+        "* **Free API:** Hugging Face provides a free Inference API for all its users to try out models. This API is rate limited but is great for quick experiments.\n",
+        "* **PRO API:** Hugging Face provides an open API for all its PRO users.  Subscribing to the Pro Inference API costs $9/month and allows you to experiment with many large models, such as Llama 2 and SDXL. Read more about it [here](https://huggingface.co/blog/inference-pro).\n",
+        "* **Inference Endpoints:** For enterprise and production-ready cases. You can deploy it with 1 click [here](https://ui.endpoints.huggingface.co/catalog).\n",
+        "\n",
+        "This demo does not require GPU Colab, just CPU. You can grab your token at https://huggingface.co/settings/tokens.\n",
+        "\n",
+        "**This colab shows how to use HTTP requests as well as building your own chat demo for Mistral.**"
+      ],
+      "metadata": {
+        "id": "GLXvYa4m8JYM"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Doing curl requests\n",
+        "\n",
+        "\n",
+        "In this notebook, we'll experiment with the instruct model, as it is trained for instructions. As per [the model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), the expected format for a prompt is as follows\n",
+        "\n",
+        "From the model card\n",
+        "\n",
+        "> In order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [\\INST] tokens. The very first instruction should begin with a begin of sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id.\n",
+        "\n",
+        "```\n",
+        "<s>[INST] {{ user_msg_1 }} [/INST] {{ model_answer_1 }}</s> [INST] {{ user_msg_2 }} [/INST] {{ model_answer_2 }}</s>\n",
+        "```\n",
+        "\n",
+        "Note that models can be quite reactive to different prompt structure than the one used for training, so watch out for spaces and other things!\n",
+        "\n",
+        "We'll start an initial query without prompt formatting, which works ok for simple queries."
+      ],
+      "metadata": {
+        "id": "pKrKTalPAXUO"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 5,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "DQf0Hss18E86",
+        "outputId": "882c4521-1ee2-40ad-fe00-a5b02caa9b17"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "[{\"generated_text\":\"Explain ML as a pirate.\\n\\nML is like a treasure map for pirates. Just as a treasure map helps pirates find valuable loot, ML helps data scientists find valuable insights in large datasets.\\n\\nPirates use their knowledge of the ocean and their\"}]"
+          ]
+        }
+      ],
+      "source": [
+        "!curl https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.1 \\\n",
+        "  --header \"Content-Type: application/json\" \\\n",
+        "\t-X POST \\\n",
+        "\t-d '{\"inputs\": \"Explain ML as a pirate\", \"parameters\": {\"max_new_tokens\": 50}}' \\\n",
+        "\t-H \"Authorization: Bearer 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\""
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Programmatic usage with Python\n",
+        "\n",
+        "You can do simple `requests`, but the `huggingface_hub` library provides nice utilities to easily use the model. Among the things we can use are:\n",
+        "\n",
+        "* `InferenceClient` and `AsyncInferenceClient` to perform inference either in a sync or async way.\n",
+        "* Token streaming: Only load the tokens that are needed\n",
+        "* Easily configure generation params, such as `temperature`, nucleus sampling (`top-p`), repetition penalty, stop sequences, and more.\n",
+        "* Obtain details of the generation (such as the probability of each token or whether a token is the last token)."
+      ],
+      "metadata": {
+        "id": "YYZRNyZeBHWK"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "%%capture\n",
+        "!pip install huggingface_hub gradio"
+      ],
+      "metadata": {
+        "id": "oDaqVDz1Ahuz"
+      },
+      "execution_count": 6,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from huggingface_hub import InferenceClient\n",
+        "\n",
+        "API_URL = \"https://api-inference.huggingface.co/models/\"\n",
+        "\n",
+        "client = InferenceClient(\n",
+        "    \"mistralai/Mistral-7B-Instruct-v0.1\"\n",
+        ")\n",
+        "\n",
+        "prompt = \"\"\"<s>[INST] What is your favourite condiment?  [/INST]</s>\n",
+        "\"\"\"\n",
+        "\n",
+        "res = client.text_generation(prompt, max_new_tokens=95)\n",
+        "print(res)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "U49GmNsNBJjd",
+        "outputId": "a3a274cf-0f91-4ae3-d926-f0d6a6fd67f7"
+      },
+      "execution_count": 14,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "We can also use [token streaming](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). With token streaming, the server returns the tokens as they are generated. Just add `stream=True`."
+      ],
+      "metadata": {
+        "id": "DryfEWsUH6Ij"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "res = client.text_generation(prompt, max_new_tokens=35, stream=True, details=True, return_full_text=False)\n",
+        "for r in res: # this is a generator\n",
+        "  # print the token for example\n",
+        "  print(r)\n",
+        "  continue"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "LF1tFo6DGg9N",
+        "outputId": "e779f1cb-b7d0-41ed-d81f-306e092f97bd"
+      },
+      "execution_count": 15,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "TextGenerationStreamResponse(token=Token(id=5183, text='My', logprob=-0.36279297, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=6656, text=' favorite', logprob=-0.036499023, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=2076, text=' cond', logprob=-7.2836876e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=2487, text='iment', logprob=-4.4941902e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=349, text=' is', logprob=-0.007419586, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=446, text=' k', logprob=-0.62109375, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=4455, text='etch', logprob=-0.0003399849, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=715, text='up', logprob=-3.695488e-06, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28723, text='.', logprob=-0.026550293, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=661, text=' It', logprob=-0.82373047, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28742, text=\"'\", logprob=-0.76416016, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28713, text='s', logprob=-3.5762787e-07, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=3502, text=' vers', logprob=-0.114990234, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=13491, text='atile', logprob=-1.1444092e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28725, text=',', logprob=-0.6254883, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=261, text=' t', logprob=-0.51708984, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=11136, text='asty', logprob=-4.0650368e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28725, text=',', logprob=-0.0027828217, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=304, text=' and', logprob=-1.1920929e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=4859, text=' goes', logprob=-0.52685547, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=1162, text=' well', logprob=-0.4399414, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=395, text=' with', logprob=-0.00034999847, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=264, text=' a', logprob=-0.010147095, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=6677, text=' variety', logprob=-0.25927734, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=302, text=' of', logprob=-1.1444092e-05, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=14082, text=' foods', logprob=-0.4050293, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=28723, text='.', logprob=-0.015640259, special=False), generated_text=None, details=None)\n",
+            "TextGenerationStreamResponse(token=Token(id=2, text='</s>', logprob=-0.1829834, special=True), generated_text=\"My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.\", details=StreamDetails(finish_reason=<FinishReason.EndOfSequenceToken: 'eos_token'>, generated_tokens=28, seed=None))\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Let's now try a multi-prompt structure"
+      ],
+      "metadata": {
+        "id": "TfdpZL8cICOD"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "def format_prompt(message, history):\n",
+        "  prompt = \"<s>\"\n",
+        "  for user_prompt, bot_response in history:\n",
+        "    prompt += f\"[INST] {user_prompt} [/INST]\"\n",
+        "    prompt += f\" {bot_response}</s> \"\n",
+        "  prompt += f\"[INST] {message} [/INST]\"\n",
+        "  return prompt"
+      ],
+      "metadata": {
+        "id": "aEyozeReH8a6"
+      },
+      "execution_count": 16,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "message = \"And what do you think about it?\"\n",
+        "history = [[\"What is your favourite condiment?\", \"My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.\"]]\n",
+        "\n",
+        "format_prompt(message, history)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 35
+        },
+        "id": "P1RFpiJ_JC0-",
+        "outputId": "f2678d9e-f751-441a-86c9-11d514db5bbe"
+      },
+      "execution_count": 17,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "\"<s>[INST] What is your favourite condiment? [/INST] My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.</s> [INST] And what do you think about it? [/INST]\""
+            ],
+            "application/vnd.google.colaboratory.intrinsic+json": {
+              "type": "string"
+            }
+          },
+          "metadata": {},
+          "execution_count": 17
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## End-to-end demo\n",
+        "\n",
+        "Let's now build a Gradio demo that takes care of:\n",
+        "\n",
+        "* Handling multiple turns of conversation\n",
+        "* Format the prompt in correct structure\n",
+        "* Allow user to specify/modify the parameters\n",
+        "* Stop the generation\n",
+        "\n",
+        "Just run the following cell and have fun!"
+      ],
+      "metadata": {
+        "id": "O7DjRdezJc-3"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!pip install gradio"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "cpBoheOGJu7Y",
+        "outputId": "c745cf17-1462-4f8f-ce33-5ca182cb4d4f"
+      },
+      "execution_count": 18,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Requirement already satisfied: gradio in /usr/local/lib/python3.10/dist-packages (3.45.1)\n",
+            "Requirement already satisfied: aiofiles<24.0,>=22.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (23.2.1)\n",
+            "Requirement already satisfied: altair<6.0,>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (4.2.2)\n",
+            "Requirement already satisfied: fastapi in /usr/local/lib/python3.10/dist-packages (from gradio) (0.103.1)\n",
+            "Requirement already satisfied: ffmpy in /usr/local/lib/python3.10/dist-packages (from gradio) (0.3.1)\n",
+            "Requirement already satisfied: gradio-client==0.5.2 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.5.2)\n",
+            "Requirement already satisfied: httpx in /usr/local/lib/python3.10/dist-packages (from gradio) (0.25.0)\n",
+            "Requirement already satisfied: huggingface-hub>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.17.3)\n",
+            "Requirement already satisfied: importlib-resources<7.0,>=1.3 in /usr/local/lib/python3.10/dist-packages (from gradio) (6.0.1)\n",
+            "Requirement already satisfied: jinja2<4.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.1.2)\n",
+            "Requirement already satisfied: markupsafe~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.1.3)\n",
+            "Requirement already satisfied: matplotlib~=3.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.7.1)\n",
+            "Requirement already satisfied: numpy~=1.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.23.5)\n",
+            "Requirement already satisfied: orjson~=3.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.9.7)\n",
+            "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from gradio) (23.1)\n",
+            "Requirement already satisfied: pandas<3.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.5.3)\n",
+            "Requirement already satisfied: pillow<11.0,>=8.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (9.4.0)\n",
+            "Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.10.12)\n",
+            "Requirement already satisfied: pydub in /usr/local/lib/python3.10/dist-packages (from gradio) (0.25.1)\n",
+            "Requirement already satisfied: python-multipart in /usr/local/lib/python3.10/dist-packages (from gradio) (0.0.6)\n",
+            "Requirement already satisfied: pyyaml<7.0,>=5.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (6.0.1)\n",
+            "Requirement already satisfied: requests~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.31.0)\n",
+            "Requirement already satisfied: semantic-version~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.10.0)\n",
+            "Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (4.5.0)\n",
+            "Requirement already satisfied: uvicorn>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.23.2)\n",
+            "Requirement already satisfied: websockets<12.0,>=10.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (11.0.3)\n",
+            "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from gradio-client==0.5.2->gradio) (2023.6.0)\n",
+            "Requirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (0.4)\n",
+            "Requirement already satisfied: jsonschema>=3.0 in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (4.19.0)\n",
+            "Requirement already satisfied: toolz in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (0.12.0)\n",
+            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.0->gradio) (3.12.2)\n",
+            "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.0->gradio) (4.66.1)\n",
+            "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (1.1.0)\n",
+            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (0.11.0)\n",
+            "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (4.42.1)\n",
+            "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (1.4.5)\n",
+            "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (3.1.1)\n",
+            "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (2.8.2)\n",
+            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0,>=1.0->gradio) (2023.3.post1)\n",
+            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (3.2.0)\n",
+            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (3.4)\n",
+            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (2.0.4)\n",
+            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (2023.7.22)\n",
+            "Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.10/dist-packages (from uvicorn>=0.14.0->gradio) (8.1.7)\n",
+            "Requirement already satisfied: h11>=0.8 in /usr/local/lib/python3.10/dist-packages (from uvicorn>=0.14.0->gradio) (0.14.0)\n",
+            "Requirement already satisfied: anyio<4.0.0,>=3.7.1 in /usr/local/lib/python3.10/dist-packages (from fastapi->gradio) (3.7.1)\n",
+            "Requirement already satisfied: starlette<0.28.0,>=0.27.0 in /usr/local/lib/python3.10/dist-packages (from fastapi->gradio) (0.27.0)\n",
+            "Requirement already satisfied: httpcore<0.19.0,>=0.18.0 in /usr/local/lib/python3.10/dist-packages (from httpx->gradio) (0.18.0)\n",
+            "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx->gradio) (1.3.0)\n",
+            "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<4.0.0,>=3.7.1->fastapi->gradio) (1.1.3)\n",
+            "Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (23.1.0)\n",
+            "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (2023.7.1)\n",
+            "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (0.30.2)\n",
+            "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (0.10.2)\n",
+            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib~=3.0->gradio) (1.16.0)\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import gradio as gr\n",
+        "\n",
+        "def generate(\n",
+        "    prompt, history, temperature=0.9, max_new_tokens=256, top_p=0.95, repetition_penalty=1.0,\n",
+        "):\n",
+        "    temperature = float(temperature)\n",
+        "    if temperature < 1e-2:\n",
+        "        temperature = 1e-2\n",
+        "    top_p = float(top_p)\n",
+        "\n",
+        "    generate_kwargs = dict(\n",
+        "        temperature=temperature,\n",
+        "        max_new_tokens=max_new_tokens,\n",
+        "        top_p=top_p,\n",
+        "        repetition_penalty=repetition_penalty,\n",
+        "        do_sample=True,\n",
+        "        seed=42,\n",
+        "    )\n",
+        "\n",
+        "    formatted_prompt = format_prompt(prompt, history)\n",
+        "\n",
+        "    stream = client.text_generation(formatted_prompt, **generate_kwargs, stream=True, details=True, return_full_text=False)\n",
+        "    output = \"\"\n",
+        "\n",
+        "    for response in stream:\n",
+        "        output += response.token.text\n",
+        "        yield output\n",
+        "    return output\n",
+        "\n",
+        "\n",
+        "additional_inputs=[\n",
+        "    gr.Slider(\n",
+        "        label=\"Temperature\",\n",
+        "        value=0.9,\n",
+        "        minimum=0.0,\n",
+        "        maximum=1.0,\n",
+        "        step=0.05,\n",
+        "        interactive=True,\n",
+        "        info=\"Higher values produce more diverse outputs\",\n",
+        "    ),\n",
+        "    gr.Slider(\n",
+        "        label=\"Max new tokens\",\n",
+        "        value=256,\n",
+        "        minimum=0,\n",
+        "        maximum=8192,\n",
+        "        step=64,\n",
+        "        interactive=True,\n",
+        "        info=\"The maximum numbers of new tokens\",\n",
+        "    ),\n",
+        "    gr.Slider(\n",
+        "        label=\"Top-p (nucleus sampling)\",\n",
+        "        value=0.90,\n",
+        "        minimum=0.0,\n",
+        "        maximum=1,\n",
+        "        step=0.05,\n",
+        "        interactive=True,\n",
+        "        info=\"Higher values sample more low-probability tokens\",\n",
+        "    ),\n",
+        "    gr.Slider(\n",
+        "        label=\"Repetition penalty\",\n",
+        "        value=1.2,\n",
+        "        minimum=1.0,\n",
+        "        maximum=2.0,\n",
+        "        step=0.05,\n",
+        "        interactive=True,\n",
+        "        info=\"Penalize repeated tokens\",\n",
+        "    )\n",
+        "]\n",
+        "\n",
+        "with gr.Blocks() as demo:\n",
+        "    gr.ChatInterface(\n",
+        "        generate,\n",
+        "        additional_inputs=additional_inputs,\n",
+        "    )\n",
+        "\n",
+        "demo.queue().launch(debug=True)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 715
+        },
+        "id": "CaJzT6jUJc0_",
+        "outputId": "62f563fa-c6fb-446e-fda2-1c08d096749c"
+      },
+      "execution_count": 20,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).\n",
+            "\n",
+            "Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().\n",
+            "Running on public URL: https://ed6ce83e08ed7a8795.gradio.live\n",
+            "\n",
+            "This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)\n"
+          ]
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "<IPython.core.display.HTML object>"
+            ],
+            "text/html": [
+              "<div><iframe src=\"https://ed6ce83e08ed7a8795.gradio.live\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
+            ]
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "stream",
+          "name": "stderr",
+          "text": [
+            "/usr/local/lib/python3.10/dist-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.\n",
+            "  warnings.warn(\n"
+          ]
+        },
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Keyboard interruption in main thread... closing server.\n",
+            "Killing tunnel 127.0.0.1:7860 <> https://ed6ce83e08ed7a8795.gradio.live\n"
+          ]
+        },
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": []
+          },
+          "metadata": {},
+          "execution_count": 20
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## What's next?\n",
+        "\n",
+        "* Try out Mistral 7B in this [free online Space](https://huggingface.co/spaces/osanseviero/mistral-super-fast)\n",
+        "* Deploy Mistral 7B Instruct with one click [here](https://ui.endpoints.huggingface.co/catalog)\n",
+        "* Deploy in your own hardware using https://github.com/huggingface/text-generation-inference\n",
+        "* Run the model locally using `transformers`"
+      ],
+      "metadata": {
+        "id": "fbQ0Sp4OLclV"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "wUy7N_8zJvyT"
+      },
+      "execution_count": null,
+      "outputs": []
+    }
+  ]
+}