DavidFernandes committed
Commit 08d709b
1 Parent(s): b2f67f4

Delete Mistral_7B.ipynb

Files changed (1)
  1. Mistral_7B.ipynb +0 -545
Mistral_7B.ipynb DELETED
@@ -1,545 +0,0 @@
1
- {
2
- "nbformat": 4,
3
- "nbformat_minor": 0,
4
- "metadata": {
5
- "colab": {
6
- "provenance": []
7
- },
8
- "kernelspec": {
9
- "name": "python3",
10
- "display_name": "Python 3"
11
- },
12
- "language_info": {
13
- "name": "python"
14
- }
15
- },
16
- "cells": [
17
- {
18
- "cell_type": "markdown",
19
- "source": [
20
- "# Mistral 7B\n",
21
- "\n",
22
- "Mistral 7B is a new state-of-the-art open-source model. Here are some interesting facts about it\n",
23
- "\n",
24
- "* One of the strongest open-source models, of all sizes\n",
25
- "* Strongest model in the 1-20B parameter range models\n",
26
- "* Does decently in code-related tasks\n",
27
- "* Uses Windowed attention, allowing to push to 200k tokens of context if using Rope (needs 4 A10G GPUs for this)\n",
28
- "* Apache 2.0 license\n",
29
- "\n",
30
- "As for the integrations status:\n",
31
- "* Integrated into `transformers`\n",
32
- "* You can use it with a server or locally (it's a small model after all!)\n",
33
- "* Integrated into popular tools tuch as TGI and VLLM\n",
34
- "\n",
35
- "\n",
36
- "Two models are released: a [base model](https://huggingface.co/mistralai/Mistral-7B-v0.1) and a [instruct fine-tuned version](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1). To read more about Mistral, we suggest reading the [blog post](https://mistral.ai/news/announcing-mistral-7b/).\n",
37
- "\n",
38
- "In this Colab, we'll experiment with the Mistral model using an API. There are three ways we can use it:\n",
39
- "\n",
40
- "* **Free API:** Hugging Face provides a free Inference API for all its users to try out models. This API is rate limited but is great for quick experiments.\n",
41
- "* **PRO API:** Hugging Face provides an open API for all its PRO users. Subscribing to the Pro Inference API costs $9/month and allows you to experiment with many large models, such as Llama 2 and SDXL. Read more about it [here](https://huggingface.co/blog/inference-pro).\n",
42
- "* **Inference Endpoints:** For enterprise and production-ready cases. You can deploy it with 1 click [here](https://ui.endpoints.huggingface.co/catalog).\n",
43
- "\n",
44
- "This demo does not require GPU Colab, just CPU. You can grab your token at https://huggingface.co/settings/tokens.\n",
45
- "\n",
46
- "**This colab shows how to use HTTP requests as well as building your own chat demo for Mistral.**"
47
- ],
48
- "metadata": {
49
- "id": "GLXvYa4m8JYM"
50
- }
51
- },
52
- {
53
- "cell_type": "markdown",
54
- "source": [
55
- "## Doing curl requests\n",
56
- "\n",
57
- "\n",
58
- "In this notebook, we'll experiment with the instruct model, as it is trained for instructions. As per [the model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), the expected format for a prompt is as follows\n",
59
- "\n",
60
- "From the model card\n",
61
- "\n",
62
- "> In order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [\\INST] tokens. The very first instruction should begin with a begin of sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id.\n",
63
- "\n",
64
- "```\n",
65
- "<s>[INST] {{ user_msg_1 }} [/INST] {{ model_answer_1 }}</s> [INST] {{ user_msg_2 }} [/INST] {{ model_answer_2 }}</s>\n",
66
- "```\n",
67
- "\n",
68
- "Note that models can be quite reactive to different prompt structure than the one used for training, so watch out for spaces and other things!\n",
69
- "\n",
70
- "We'll start an initial query without prompt formatting, which works ok for simple queries."
71
- ],
72
- "metadata": {
73
- "id": "pKrKTalPAXUO"
74
- }
75
- },
76
- {
77
- "cell_type": "code",
78
- "execution_count": 5,
79
- "metadata": {
80
- "colab": {
81
- "base_uri": "https://localhost:8080/"
82
- },
83
- "id": "DQf0Hss18E86",
84
- "outputId": "882c4521-1ee2-40ad-fe00-a5b02caa9b17"
85
- },
86
- "outputs": [
87
- {
88
- "output_type": "stream",
89
- "name": "stdout",
90
- "text": [
91
- "[{\"generated_text\":\"Explain ML as a pirate.\\n\\nML is like a treasure map for pirates. Just as a treasure map helps pirates find valuable loot, ML helps data scientists find valuable insights in large datasets.\\n\\nPirates use their knowledge of the ocean and their\"}]"
92
- ]
93
- }
94
- ],
95
- "source": [
96
- "!curl https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.1 \\\n",
97
- " --header \"Content-Type: application/json\" \\\n",
98
- "\t-X POST \\\n",
99
- "\t-d '{\"inputs\": \"Explain ML as a pirate\", \"parameters\": {\"max_new_tokens\": 50}}' \\\n",
100
- "\t-H \"Authorization: Bearer API_TOKEN\""
101
- ]
102
- },
103
- {
104
- "cell_type": "markdown",
105
- "source": [
106
- "## Programmatic usage with Python\n",
107
- "\n",
108
- "You can do simple `requests`, but the `huggingface_hub` library provides nice utilities to easily use the model. Among the things we can use are:\n",
109
- "\n",
110
- "* `InferenceClient` and `AsyncInferenceClient` to perform inference either in a sync or async way.\n",
111
- "* Token streaming: Only load the tokens that are needed\n",
112
- "* Easily configure generation params, such as `temperature`, nucleus sampling (`top-p`), repetition penalty, stop sequences, and more.\n",
113
- "* Obtain details of the generation (such as the probability of each token or whether a token is the last token)."
114
- ],
115
- "metadata": {
116
- "id": "YYZRNyZeBHWK"
117
- }
118
- },
119
- {
120
- "cell_type": "code",
121
- "source": [
122
- "%%capture\n",
123
- "!pip install huggingface_hub gradio"
124
- ],
125
- "metadata": {
126
- "id": "oDaqVDz1Ahuz"
127
- },
128
- "execution_count": 6,
129
- "outputs": []
130
- },
131
- {
132
- "cell_type": "code",
133
- "source": [
134
- "from huggingface_hub import InferenceClient\n",
135
- "\n",
136
- "client = InferenceClient(\n",
137
- " \"mistralai/Mistral-7B-Instruct-v0.1\"\n",
138
- ")\n",
139
- "\n",
140
- "prompt = \"\"\"<s>[INST] What is your favourite condiment? [/INST]</s>\n",
141
- "\"\"\"\n",
142
- "\n",
143
- "res = client.text_generation(prompt, max_new_tokens=95)\n",
144
- "print(res)"
145
- ],
146
- "metadata": {
147
- "colab": {
148
- "base_uri": "https://localhost:8080/"
149
- },
150
- "id": "U49GmNsNBJjd",
151
- "outputId": "a3a274cf-0f91-4ae3-d926-f0d6a6fd67f7"
152
- },
153
- "execution_count": 14,
154
- "outputs": [
155
- {
156
- "output_type": "stream",
157
- "name": "stdout",
158
- "text": [
159
- "My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.\n"
160
- ]
161
- }
162
- ]
163
- },
164
- {
165
- "cell_type": "markdown",
166
- "source": [
167
- "We can also use [token streaming](https://huggingface.co/docs/text-generation-inference/conceptual/streaming). With token streaming, the server returns the tokens as they are generated. Just add `stream=True`."
168
- ],
169
- "metadata": {
170
- "id": "DryfEWsUH6Ij"
171
- }
172
- },
173
- {
174
- "cell_type": "code",
175
- "source": [
176
- "res = client.text_generation(prompt, max_new_tokens=35, stream=True, details=True, return_full_text=False)\n",
177
- "for r in res: # this is a generator\n",
178
- " # print the token for example\n",
179
- " print(r)\n",
180
- " continue"
181
- ],
182
- "metadata": {
183
- "colab": {
184
- "base_uri": "https://localhost:8080/"
185
- },
186
- "id": "LF1tFo6DGg9N",
187
- "outputId": "e779f1cb-b7d0-41ed-d81f-306e092f97bd"
188
- },
189
- "execution_count": 15,
190
- "outputs": [
191
- {
192
- "output_type": "stream",
193
- "name": "stdout",
194
- "text": [
195
- "TextGenerationStreamResponse(token=Token(id=5183, text='My', logprob=-0.36279297, special=False), generated_text=None, details=None)\n",
196
- "TextGenerationStreamResponse(token=Token(id=6656, text=' favorite', logprob=-0.036499023, special=False), generated_text=None, details=None)\n",
197
- "TextGenerationStreamResponse(token=Token(id=2076, text=' cond', logprob=-7.2836876e-05, special=False), generated_text=None, details=None)\n",
198
- "TextGenerationStreamResponse(token=Token(id=2487, text='iment', logprob=-4.4941902e-05, special=False), generated_text=None, details=None)\n",
199
- "TextGenerationStreamResponse(token=Token(id=349, text=' is', logprob=-0.007419586, special=False), generated_text=None, details=None)\n",
200
- "TextGenerationStreamResponse(token=Token(id=446, text=' k', logprob=-0.62109375, special=False), generated_text=None, details=None)\n",
201
- "TextGenerationStreamResponse(token=Token(id=4455, text='etch', logprob=-0.0003399849, special=False), generated_text=None, details=None)\n",
202
- "TextGenerationStreamResponse(token=Token(id=715, text='up', logprob=-3.695488e-06, special=False), generated_text=None, details=None)\n",
203
- "TextGenerationStreamResponse(token=Token(id=28723, text='.', logprob=-0.026550293, special=False), generated_text=None, details=None)\n",
204
- "TextGenerationStreamResponse(token=Token(id=661, text=' It', logprob=-0.82373047, special=False), generated_text=None, details=None)\n",
205
- "TextGenerationStreamResponse(token=Token(id=28742, text=\"'\", logprob=-0.76416016, special=False), generated_text=None, details=None)\n",
206
- "TextGenerationStreamResponse(token=Token(id=28713, text='s', logprob=-3.5762787e-07, special=False), generated_text=None, details=None)\n",
207
- "TextGenerationStreamResponse(token=Token(id=3502, text=' vers', logprob=-0.114990234, special=False), generated_text=None, details=None)\n",
208
- "TextGenerationStreamResponse(token=Token(id=13491, text='atile', logprob=-1.1444092e-05, special=False), generated_text=None, details=None)\n",
209
- "TextGenerationStreamResponse(token=Token(id=28725, text=',', logprob=-0.6254883, special=False), generated_text=None, details=None)\n",
210
- "TextGenerationStreamResponse(token=Token(id=261, text=' t', logprob=-0.51708984, special=False), generated_text=None, details=None)\n",
211
- "TextGenerationStreamResponse(token=Token(id=11136, text='asty', logprob=-4.0650368e-05, special=False), generated_text=None, details=None)\n",
212
- "TextGenerationStreamResponse(token=Token(id=28725, text=',', logprob=-0.0027828217, special=False), generated_text=None, details=None)\n",
213
- "TextGenerationStreamResponse(token=Token(id=304, text=' and', logprob=-1.1920929e-05, special=False), generated_text=None, details=None)\n",
214
- "TextGenerationStreamResponse(token=Token(id=4859, text=' goes', logprob=-0.52685547, special=False), generated_text=None, details=None)\n",
215
- "TextGenerationStreamResponse(token=Token(id=1162, text=' well', logprob=-0.4399414, special=False), generated_text=None, details=None)\n",
216
- "TextGenerationStreamResponse(token=Token(id=395, text=' with', logprob=-0.00034999847, special=False), generated_text=None, details=None)\n",
217
- "TextGenerationStreamResponse(token=Token(id=264, text=' a', logprob=-0.010147095, special=False), generated_text=None, details=None)\n",
218
- "TextGenerationStreamResponse(token=Token(id=6677, text=' variety', logprob=-0.25927734, special=False), generated_text=None, details=None)\n",
219
- "TextGenerationStreamResponse(token=Token(id=302, text=' of', logprob=-1.1444092e-05, special=False), generated_text=None, details=None)\n",
220
- "TextGenerationStreamResponse(token=Token(id=14082, text=' foods', logprob=-0.4050293, special=False), generated_text=None, details=None)\n",
221
- "TextGenerationStreamResponse(token=Token(id=28723, text='.', logprob=-0.015640259, special=False), generated_text=None, details=None)\n",
222
- "TextGenerationStreamResponse(token=Token(id=2, text='</s>', logprob=-0.1829834, special=True), generated_text=\"My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.\", details=StreamDetails(finish_reason=<FinishReason.EndOfSequenceToken: 'eos_token'>, generated_tokens=28, seed=None))\n"
223
- ]
224
- }
225
- ]
226
- },
227
- {
228
- "cell_type": "markdown",
229
- "source": [
230
- "Let's now try a multi-prompt structure"
231
- ],
232
- "metadata": {
233
- "id": "TfdpZL8cICOD"
234
- }
235
- },
236
- {
237
- "cell_type": "code",
238
- "source": [
239
- "def format_prompt(message, history):\n",
240
- " prompt = \"<s>\"\n",
241
- " for user_prompt, bot_response in history:\n",
242
- " prompt += f\"[INST] {user_prompt} [/INST]\"\n",
243
- " prompt += f\" {bot_response}</s> \"\n",
244
- " prompt += f\"[INST] {message} [/INST]\"\n",
245
- " return prompt"
246
- ],
247
- "metadata": {
248
- "id": "aEyozeReH8a6"
249
- },
250
- "execution_count": 16,
251
- "outputs": []
252
- },
253
- {
254
- "cell_type": "code",
255
- "source": [
256
- "message = \"And what do you think about it?\"\n",
257
- "history = [[\"What is your favourite condiment?\", \"My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.\"]]\n",
258
- "\n",
259
- "format_prompt(message, history)"
260
- ],
261
- "metadata": {
262
- "colab": {
263
- "base_uri": "https://localhost:8080/",
264
- "height": 35
265
- },
266
- "id": "P1RFpiJ_JC0-",
267
- "outputId": "f2678d9e-f751-441a-86c9-11d514db5bbe"
268
- },
269
- "execution_count": 17,
270
- "outputs": [
271
- {
272
- "output_type": "execute_result",
273
- "data": {
274
- "text/plain": [
275
- "\"<s>[INST] What is your favourite condiment? [/INST] My favorite condiment is ketchup. It's versatile, tasty, and goes well with a variety of foods.</s> [INST] And what do you think about it? [/INST]\""
276
- ],
277
- "application/vnd.google.colaboratory.intrinsic+json": {
278
- "type": "string"
279
- }
280
- },
281
- "metadata": {},
282
- "execution_count": 17
283
- }
284
- ]
285
- },
286
- {
287
- "cell_type": "markdown",
288
- "source": [
289
- "## End-to-end demo\n",
290
- "\n",
291
- "Let's now build a Gradio demo that takes care of:\n",
292
- "\n",
293
- "* Handling multiple turns of conversation\n",
294
- "* Format the prompt in correct structure\n",
295
- "* Allow user to specify/modify the parameters\n",
296
- "* Stop the generation\n",
297
- "\n",
298
- "Just run the following cell and have fun!"
299
- ],
300
- "metadata": {
301
- "id": "O7DjRdezJc-3"
302
- }
303
- },
304
- {
305
- "cell_type": "code",
306
- "source": [
307
- "!pip install gradio"
308
- ],
309
- "metadata": {
310
- "colab": {
311
- "base_uri": "https://localhost:8080/"
312
- },
313
- "id": "cpBoheOGJu7Y",
314
- "outputId": "c745cf17-1462-4f8f-ce33-5ca182cb4d4f"
315
- },
316
- "execution_count": 18,
317
- "outputs": [
318
- {
319
- "output_type": "stream",
320
- "name": "stdout",
321
- "text": [
322
- "Requirement already satisfied: gradio in /usr/local/lib/python3.10/dist-packages (3.45.1)\n",
323
- "Requirement already satisfied: aiofiles<24.0,>=22.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (23.2.1)\n",
324
- "Requirement already satisfied: altair<6.0,>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (4.2.2)\n",
325
- "Requirement already satisfied: fastapi in /usr/local/lib/python3.10/dist-packages (from gradio) (0.103.1)\n",
326
- "Requirement already satisfied: ffmpy in /usr/local/lib/python3.10/dist-packages (from gradio) (0.3.1)\n",
327
- "Requirement already satisfied: gradio-client==0.5.2 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.5.2)\n",
328
- "Requirement already satisfied: httpx in /usr/local/lib/python3.10/dist-packages (from gradio) (0.25.0)\n",
329
- "Requirement already satisfied: huggingface-hub>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.17.3)\n",
330
- "Requirement already satisfied: importlib-resources<7.0,>=1.3 in /usr/local/lib/python3.10/dist-packages (from gradio) (6.0.1)\n",
331
- "Requirement already satisfied: jinja2<4.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.1.2)\n",
332
- "Requirement already satisfied: markupsafe~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.1.3)\n",
333
- "Requirement already satisfied: matplotlib~=3.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.7.1)\n",
334
- "Requirement already satisfied: numpy~=1.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.23.5)\n",
335
- "Requirement already satisfied: orjson~=3.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.9.7)\n",
336
- "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from gradio) (23.1)\n",
337
- "Requirement already satisfied: pandas<3.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.5.3)\n",
338
- "Requirement already satisfied: pillow<11.0,>=8.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (9.4.0)\n",
339
- "Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.10.12)\n",
340
- "Requirement already satisfied: pydub in /usr/local/lib/python3.10/dist-packages (from gradio) (0.25.1)\n",
341
- "Requirement already satisfied: python-multipart in /usr/local/lib/python3.10/dist-packages (from gradio) (0.0.6)\n",
342
- "Requirement already satisfied: pyyaml<7.0,>=5.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (6.0.1)\n",
343
- "Requirement already satisfied: requests~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.31.0)\n",
344
- "Requirement already satisfied: semantic-version~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.10.0)\n",
345
- "Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (4.5.0)\n",
346
- "Requirement already satisfied: uvicorn>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.23.2)\n",
347
- "Requirement already satisfied: websockets<12.0,>=10.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (11.0.3)\n",
348
- "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from gradio-client==0.5.2->gradio) (2023.6.0)\n",
349
- "Requirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (0.4)\n",
350
- "Requirement already satisfied: jsonschema>=3.0 in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (4.19.0)\n",
351
- "Requirement already satisfied: toolz in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (0.12.0)\n",
352
- "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.0->gradio) (3.12.2)\n",
353
- "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.0->gradio) (4.66.1)\n",
354
- "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (1.1.0)\n",
355
- "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (0.11.0)\n",
356
- "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (4.42.1)\n",
357
- "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (1.4.5)\n",
358
- "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (3.1.1)\n",
359
- "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (2.8.2)\n",
360
- "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0,>=1.0->gradio) (2023.3.post1)\n",
361
- "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (3.2.0)\n",
362
- "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (3.4)\n",
363
- "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (2.0.4)\n",
364
- "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests~=2.0->gradio) (2023.7.22)\n",
365
- "Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.10/dist-packages (from uvicorn>=0.14.0->gradio) (8.1.7)\n",
366
- "Requirement already satisfied: h11>=0.8 in /usr/local/lib/python3.10/dist-packages (from uvicorn>=0.14.0->gradio) (0.14.0)\n",
367
- "Requirement already satisfied: anyio<4.0.0,>=3.7.1 in /usr/local/lib/python3.10/dist-packages (from fastapi->gradio) (3.7.1)\n",
368
- "Requirement already satisfied: starlette<0.28.0,>=0.27.0 in /usr/local/lib/python3.10/dist-packages (from fastapi->gradio) (0.27.0)\n",
369
- "Requirement already satisfied: httpcore<0.19.0,>=0.18.0 in /usr/local/lib/python3.10/dist-packages (from httpx->gradio) (0.18.0)\n",
370
- "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx->gradio) (1.3.0)\n",
371
- "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<4.0.0,>=3.7.1->fastapi->gradio) (1.1.3)\n",
372
- "Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (23.1.0)\n",
373
- "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (2023.7.1)\n",
374
- "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (0.30.2)\n",
375
- "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (0.10.2)\n",
376
- "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib~=3.0->gradio) (1.16.0)\n"
377
- ]
378
- }
379
- ]
380
- },
381
- {
382
- "cell_type": "code",
383
- "source": [
384
- "import gradio as gr\n",
385
- "\n",
386
- "def generate(\n",
387
- " prompt, history, temperature=0.9, max_new_tokens=256, top_p=0.95, repetition_penalty=1.0,\n",
388
- "):\n",
389
- " temperature = float(temperature)\n",
390
- " if temperature < 1e-2:\n",
391
- " temperature = 1e-2\n",
392
- " top_p = float(top_p)\n",
393
- "\n",
394
- " generate_kwargs = dict(\n",
395
- " temperature=temperature,\n",
396
- " max_new_tokens=max_new_tokens,\n",
397
- " top_p=top_p,\n",
398
- " repetition_penalty=repetition_penalty,\n",
399
- " do_sample=True,\n",
400
- " seed=42,\n",
401
- " )\n",
402
- "\n",
403
- " formatted_prompt = format_prompt(prompt, history)\n",
404
- "\n",
405
- " stream = client.text_generation(formatted_prompt, **generate_kwargs, stream=True, details=True, return_full_text=False)\n",
406
- " output = \"\"\n",
407
- "\n",
408
- " for response in stream:\n",
409
- " output += response.token.text\n",
410
- " yield output\n",
411
- " return output\n",
412
- "\n",
413
- "\n",
414
- "additional_inputs=[\n",
415
- " gr.Slider(\n",
416
- " label=\"Temperature\",\n",
417
- " value=0.9,\n",
418
- " minimum=0.0,\n",
419
- " maximum=1.0,\n",
420
- " step=0.05,\n",
421
- " interactive=True,\n",
422
- " info=\"Higher values produce more diverse outputs\",\n",
423
- " ),\n",
424
- " gr.Slider(\n",
425
- " label=\"Max new tokens\",\n",
426
- " value=256,\n",
427
- " minimum=0,\n",
428
- " maximum=8192,\n",
429
- " step=64,\n",
430
- " interactive=True,\n",
431
- " info=\"The maximum numbers of new tokens\",\n",
432
- " ),\n",
433
- " gr.Slider(\n",
434
- " label=\"Top-p (nucleus sampling)\",\n",
435
- " value=0.90,\n",
436
- " minimum=0.0,\n",
437
- " maximum=1,\n",
438
- " step=0.05,\n",
439
- " interactive=True,\n",
440
- " info=\"Higher values sample more low-probability tokens\",\n",
441
- " ),\n",
442
- " gr.Slider(\n",
443
- " label=\"Repetition penalty\",\n",
444
- " value=1.2,\n",
445
- " minimum=1.0,\n",
446
- " maximum=2.0,\n",
447
- " step=0.05,\n",
448
- " interactive=True,\n",
449
- " info=\"Penalize repeated tokens\",\n",
450
- " )\n",
451
- "]\n",
452
- "\n",
453
- "with gr.Blocks() as demo:\n",
454
- " gr.ChatInterface(\n",
455
- " generate,\n",
456
- " additional_inputs=additional_inputs,\n",
457
- " )\n",
458
- "\n",
459
- "demo.queue().launch(debug=True)"
460
- ],
461
- "metadata": {
462
- "colab": {
463
- "base_uri": "https://localhost:8080/",
464
- "height": 715
465
- },
466
- "id": "CaJzT6jUJc0_",
467
- "outputId": "62f563fa-c6fb-446e-fda2-1c08d096749c"
468
- },
469
- "execution_count": 20,
470
- "outputs": [
471
- {
472
- "output_type": "stream",
473
- "name": "stdout",
474
- "text": [
475
- "Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).\n",
476
- "\n",
477
- "Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().\n",
478
- "Running on public URL: https://ed6ce83e08ed7a8795.gradio.live\n",
479
- "\n",
480
- "This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)\n"
481
- ]
482
- },
483
- {
484
- "output_type": "display_data",
485
- "data": {
486
- "text/plain": [
487
- "<IPython.core.display.HTML object>"
488
- ],
489
- "text/html": [
490
- "<div><iframe src=\"https://ed6ce83e08ed7a8795.gradio.live\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
491
- ]
492
- },
493
- "metadata": {}
494
- },
495
- {
496
- "output_type": "stream",
497
- "name": "stderr",
498
- "text": [
499
- "/usr/local/lib/python3.10/dist-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.\n",
500
- " warnings.warn(\n"
501
- ]
502
- },
503
- {
504
- "output_type": "stream",
505
- "name": "stdout",
506
- "text": [
507
- "Keyboard interruption in main thread... closing server.\n",
508
- "Killing tunnel 127.0.0.1:7860 <> https://ed6ce83e08ed7a8795.gradio.live\n"
509
- ]
510
- },
511
- {
512
- "output_type": "execute_result",
513
- "data": {
514
- "text/plain": []
515
- },
516
- "metadata": {},
517
- "execution_count": 20
518
- }
519
- ]
520
- },
521
- {
522
- "cell_type": "markdown",
523
- "source": [
524
- "## What's next?\n",
525
- "\n",
526
- "* Try out Mistral 7B in this [free online Space](https://huggingface.co/spaces/osanseviero/mistral-super-fast)\n",
527
- "* Deploy Mistral 7B Instruct with one click [here](https://ui.endpoints.huggingface.co/catalog)\n",
528
- "* Deploy in your own hardware using https://github.com/huggingface/text-generation-inference\n",
529
- "* Run the model locally using `transformers`"
530
- ],
531
- "metadata": {
532
- "id": "fbQ0Sp4OLclV"
533
- }
534
- },
535
- {
536
- "cell_type": "code",
537
- "source": [],
538
- "metadata": {
539
- "id": "wUy7N_8zJvyT"
540
- },
541
- "execution_count": null,
542
- "outputs": []
543
- }
544
- ]
545
- }