{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "x_Vp8SiKM4p1"
},
"source": [
"# Gradio Interface Draft\n",
"\n",
"The goal of this notebook is to show a draft of a comprehensive Gradio interface though which students can interface with an LLM and self-study."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "o_60X8H3NEne"
},
"source": [
"## Libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "pxcqXgg2aAN7"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"anaconda-project 0.10.1 requires ruamel-yaml, which is not installed.\n",
"conda-repo-cli 1.0.4 requires pathlib, which is not installed.\n",
"daal4py 2021.3.0 requires daal==2021.2.3, which is not installed.\n",
"spyder 5.1.5 requires pyqt5<5.13, which is not installed.\n",
"spyder 5.1.5 requires pyqtwebengine<5.13, which is not installed.\n",
"cookiecutter 1.7.2 requires MarkupSafe<2.0.0, but you have markupsafe 2.1.3 which is incompatible.\n",
"numba 0.54.1 requires numpy<1.21,>=1.17, but you have numpy 1.26.0 which is incompatible.\n",
"pyppeteer 1.0.2 requires websockets<11.0,>=10.0, but you have websockets 11.0.3 which is incompatible.\n",
"scipy 1.7.1 requires numpy<1.23.0,>=1.16.5, but you have numpy 1.26.0 which is incompatible.\n",
"transformers 4.25.1 requires tokenizers!=0.11.3,<0.14,>=0.11.1, but you have tokenizers 0.14.0 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0mCollecting Pillow==9.0.0\n",
" Downloading Pillow-9.0.0-cp39-cp39-macosx_10_10_x86_64.whl (3.0 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.0/3.0 MB\u001b[0m \u001b[31m9.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m:00:01\u001b[0m00:01\u001b[0m\n",
"\u001b[?25h\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0mInstalling collected packages: Pillow\n",
" Attempting uninstall: Pillow\n",
" Found existing installation: Pillow 8.4.0\n",
" Uninstalling Pillow-8.4.0:\n",
" Successfully uninstalled Pillow-8.4.0\n",
"Successfully installed Pillow-9.0.0\n",
"\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0m"
]
}
],
"source": [
"# install libraries here\n",
"# -q flag for \"quiet\" install\n",
"!pip install -q langchain\n",
"!pip install -q openai\n",
"!pip install -q gradio\n",
"!pip install -q unstructured\n",
"!pip install -q chromadb\n",
"!pip install -q tiktoken\n",
"!pip install Pillow==9.0.0\n",
"!pip install -q reportlab"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "pEjM1tLsMZBq"
},
"outputs": [
{
"ename": "ModuleNotFoundError",
"evalue": "No module named 'langchain'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[2], line 6\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mtime\u001b[39;00m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mgetpass\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m getpass\n\u001b[0;32m----> 6\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocstore\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Document\n\u001b[1;32m 7\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mprompts\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m PromptTemplate\n\u001b[1;32m 8\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m TextLoader\n",
"\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'langchain'"
]
}
],
"source": [
"# import libraries here\n",
"import os\n",
"import time\n",
"from getpass import getpass\n",
"\n",
"from langchain.docstore.document import Document\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.document_loaders import TextLoader\n",
"from langchain.indexes import VectorstoreIndexCreator\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"\n",
"from langchain.document_loaders.unstructured import UnstructuredFileLoader\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.chains import RetrievalQA, RetrievalQAWithSourcesChain\n",
"from langchain.chains.qa_with_sources import load_qa_with_sources_chain\n",
"#from langchain.chains import ConversationalRetrievalChain\n",
"\n",
"from langchain.llms import OpenAI\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"import gradio as gr\n",
"from sqlalchemy import TEXT # TODO Why is sqlalchemy imported\n",
"\n",
"import pprint\n",
"\n",
"import json\n",
"from google.colab import files\n",
"from reportlab.pdfgen.canvas import Canvas"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "n0BTyPI_srMg"
},
"source": [
"# Export requirements.txt (if needed)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "NOX639OA2pOh"
},
"outputs": [],
"source": [
"!pip freeze > requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "03KLZGI_a5W5"
},
"source": [
"## API Keys\n",
"\n",
"Use these cells to load the API keys required for this notebook. The below code cell uses the `getpass` library."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5smcWj4DbFgy",
"outputId": "0aec236a-dfc8-4afb-d97c-ff324b85bb70"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"··········\n"
]
}
],
"source": [
"openai_api_key = getpass()\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_api_key"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "fck1RVxD8xSX"
},
"outputs": [],
"source": [
"llm = ChatOpenAI(model_name = 'gpt-3.5-turbo-16k')\n",
"# TODO OpenAI() or ChatOpenAI()?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UzaYNFFT4AwX"
},
"source": [
"# Interface"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "N9i8zsMmbcd3"
},
"outputs": [],
"source": [
"global db # Document-storing object (vector store / index)\n",
"global qa # Question-answer object; retrieves from `db`\n",
"global srcs # List of source documents fragments referenced by vector store\n",
"num_sources = 100 # Maximum number of source documents which can be shown\n",
"\n",
"srcs = []\n",
"# See https://github.com/hwchase17/langchain/discussions/3786 for discussion\n",
"# of which splitter to use\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "fmFg2_iyvRKI"
},
"outputs": [],
"source": [
"### Source Display ###\n",
"\n",
"def format_source_reference(document, index):\n",
" \"\"\"Return a HTML element which contains the `document` info and can be\n",
" referenced by the `index`.\"\"\"\n",
" if 'source' in document.metadata:\n",
" source_filepath, source_name = os.path.split(document.metadata['source'])\n",
" else:\n",
" source_name = \"text box\"\n",
" return f\"
[{index+1}] {source_name}
...{document.page_content}...
[{index+1}] {source_filename}
...{document.page_content}...