{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "view-in-github" }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "id": "x_Vp8SiKM4p1" }, "source": [ "# Gradio Interface Draft\n", "\n", "The goal of this notebook is to show a draft of a comprehensive Gradio interface through which students can interact with an LLM for self-study." ] }, { "cell_type": "markdown", "metadata": { "id": "o_60X8H3NEne" }, "source": [ "## Libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "pxcqXgg2aAN7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. 
A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", "anaconda-project 0.10.1 requires ruamel-yaml, which is not installed.\n", "conda-repo-cli 1.0.4 requires pathlib, which is not installed.\n", "daal4py 2021.3.0 requires daal==2021.2.3, which is not installed.\n", "spyder 5.1.5 requires pyqt5<5.13, which is not installed.\n", "spyder 5.1.5 requires pyqtwebengine<5.13, which is not installed.\n", "cookiecutter 1.7.2 requires MarkupSafe<2.0.0, but you have markupsafe 2.1.3 which is incompatible.\n", "numba 0.54.1 requires numpy<1.21,>=1.17, but you have numpy 1.26.0 which is incompatible.\n", "pyppeteer 1.0.2 requires websockets<11.0,>=10.0, but you have websockets 11.0.3 which is incompatible.\n", "scipy 1.7.1 requires numpy<1.23.0,>=1.16.5, but you have numpy 1.26.0 which is incompatible.\n", "transformers 4.25.1 requires tokenizers!=0.11.3,<0.14,>=0.11.1, but you have tokenizers 0.14.0 which is incompatible.\u001b[0m\u001b[31m\n", "\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. 
A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0mCollecting Pillow==9.0.0\n", " Downloading Pillow-9.0.0-cp39-cp39-macosx_10_10_x86_64.whl (3.0 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.0/3.0 MB\u001b[0m \u001b[31m9.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m:00:01\u001b[0m00:01\u001b[0m\n", "\u001b[?25h\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0mInstalling collected packages: Pillow\n", " Attempting uninstall: Pillow\n", " Found existing installation: Pillow 8.4.0\n", " Uninstalling Pillow-8.4.0:\n", " Successfully uninstalled Pillow-8.4.0\n", "Successfully installed Pillow-9.0.0\n", "\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. 
Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0m" ] } ], "source": [ "# install libraries here\n", "# -q flag for \"quiet\" install\n", "!pip install -q langchain\n", "!pip install -q openai\n", "!pip install -q gradio\n", "!pip install -q unstructured\n", "!pip install -q chromadb\n", "!pip install -q tiktoken\n", "!pip install Pillow==9.0.0\n", "!pip install -q reportlab" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "pEjM1tLsMZBq" }, "outputs": [ { "ename": "ModuleNotFoundError", "evalue": "No module named 'langchain'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[2], line 6\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mtime\u001b[39;00m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mgetpass\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m getpass\n\u001b[0;32m----> 6\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocstore\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Document\n\u001b[1;32m 7\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mprompts\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m PromptTemplate\n\u001b[1;32m 8\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m TextLoader\n", "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'langchain'" ] } ], "source": [ "# import libraries here\n", "import os\n", 
"import time\n", "from getpass import getpass\n", "\n", "from langchain.docstore.document import Document\n", "from langchain.prompts import PromptTemplate\n", "from langchain.document_loaders import TextLoader\n", "from langchain.indexes import VectorstoreIndexCreator\n", "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", "from langchain.embeddings import OpenAIEmbeddings\n", "\n", "from langchain.document_loaders.unstructured import UnstructuredFileLoader\n", "from langchain.vectorstores import Chroma\n", "from langchain.chains import RetrievalQA, RetrievalQAWithSourcesChain\n", "from langchain.chains.qa_with_sources import load_qa_with_sources_chain\n", "# from langchain.chains import ConversationalRetrievalChain\n", "\n", "from langchain.llms import OpenAI\n", "from langchain.chat_models import ChatOpenAI\n", "\n", "import gradio as gr\n", "from sqlalchemy import TEXT  # TODO: why is sqlalchemy imported?\n", "\n", "import pprint\n", "\n", "import json\n", "try:\n", "    from google.colab import files  # only available when running on Colab\n", "except ImportError:\n", "    files = None  # Colab file helpers are unavailable in a local runtime\n", "from reportlab.pdfgen.canvas import Canvas" ] }, { "cell_type": "markdown", "metadata": { "id": "n0BTyPI_srMg" }, "source": [ "# Export requirements.txt (if needed)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "NOX639OA2pOh" }, "outputs": [], "source": [ "!pip freeze > requirements.txt" ] }, { "cell_type": "markdown", "metadata": { "id": "03KLZGI_a5W5" }, "source": [ "## API Keys\n", "\n", "Use these cells to load the API keys required for this notebook. The code cell below uses the `getpass` module to read the key without echoing it to the screen." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "5smcWj4DbFgy", "outputId": "0aec236a-dfc8-4afb-d97c-ff324b85bb70" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "··········\n" ] } ], "source": [ "openai_api_key = getpass()\n", "os.environ[\"OPENAI_API_KEY\"] = openai_api_key" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "fck1RVxD8xSX" }, "outputs": [], "source": [ "llm = ChatOpenAI(model_name='gpt-3.5-turbo-16k')\n", "# TODO OpenAI() or ChatOpenAI()?" ] }, { "cell_type": "markdown", "metadata": { "id": "UzaYNFFT4AwX" }, "source": [ "# Interface" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "N9i8zsMmbcd3" }, "outputs": [], "source": [ "# Module-level state shared by the interface callbacks. A bare `global`\n", "# statement at module level has no effect, so the names are initialized here.\n", "db = None  # Document-storing object (vector store / index)\n", "qa = None  # Question-answer object; retrieves from `db`\n", "srcs = []  # List of source document fragments referenced by the vector store\n", "num_sources = 100  # Maximum number of source documents that can be shown\n", "\n", "# See https://github.com/hwchase17/langchain/discussions/3786 for discussion\n", "# of which splitter to use\n", "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "fmFg2_iyvRKI" }, "outputs": [], "source": [ "### Source Display ###\n", "\n", "def format_source_reference(document, index):\n", "    \"\"\"Return an HTML element which contains the `document` info and can be\n", "    referenced by the `index`.\"\"\"\n", "    if 'source' in document.metadata:\n", "        # Keep only the file name; the directory part is not displayed\n", "        source_name = os.path.basename(document.metadata['source'])\n", "    else:\n", "        source_name = \"text box\"\n", "    return f\"
[{index+1}] {source_name}
...{document.page_content}...
[{index+1}] {source_filename}
...{document.page_content}...