Spaces:

Khaquan
/

Burhan02data-sci-chatbot

Runtime error

File size: 14,969 Bytes

df420ed

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c96b164c",
   "metadata": {},
   "source": [
    "# Overview "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22c9cad3",
   "metadata": {},
   "source": [
    "<span style=\"color:red; font-size:18px; font-weight:bold;\">In this notebook, we will try to create a workflow between Langchain and Mixtral LLM.</br>\n",
    "We want to accomplish the following:</span>\n",
    "1. Establish a pipeline to read in a csv and pass it to our LLM. \n",
    "2. Establish a Pandas Agent in our LLM.\n",
    "3. Connect the CSV pipeline to this agent and enable a end-end EDA pipeline."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bd62bd1",
   "metadata": {},
   "source": [
    "# Setting up the Environment "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "76b3d212",
   "metadata": {},
   "outputs": [],
   "source": [
    "####################################################################################################\n",
    "import os\n",
    "import re\n",
    "import subprocess\n",
    "import sys\n",
    "from PIL import Image\n",
    "\n",
    "\n",
    "from langchain import hub\n",
    "from langchain.agents.agent_types import AgentType\n",
    "from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent\n",
    "from langchain_experimental.utilities import PythonREPL\n",
    "from langchain.agents import Tool\n",
    "from langchain.agents.format_scratchpad.openai_tools import (\n",
    "    format_to_openai_tool_messages,\n",
    ")\n",
    "from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser\n",
    "\n",
    "from langchain.agents import AgentExecutor\n",
    "\n",
    "from langchain.agents import create_structured_chat_agent\n",
    "from langchain.memory import ConversationBufferWindowMemory\n",
    "\n",
    "from langchain.output_parsers import PandasDataFrameOutputParser\n",
    "from langchain.prompts import PromptTemplate\n",
    "\n",
    "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
    "\n",
    "from langchain_community.chat_models import ChatAnyscale\n",
    "import pandas as pd \n",
    "import matplotlib.pyplot as plt \n",
    "import numpy as np\n",
    "\n",
    "plt.style.use('ggplot')\n",
    "####################################################################################################"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "9da52e1f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# insert your API key here\n",
    "os.environ[\"ANYSCALE_API_KEY\"] = \"esecret_8btufnh3615vnbpd924s1t3q7p\"\n",
    "memory_key = \"history\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "edcd6ca7",
   "metadata": {},
   "source": [
    "# Using out of the box agents\n",
    "<span style=\"font-size:16px;color:red;\"> Over here we simply try to run the Pandas Dataframe agent provided by langchain out of the box. It uses a CSV parser and has just one tool, the __PythonAstREPLTool__ which helps it execute python commands in a shell and display output. </span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64a086f2",
   "metadata": {},
   "source": [
    "# DECLARE YOUR LLM HERE "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "083bf9ab",
   "metadata": {},
   "source": [
    "# Building my own DF chain\n",
    "<span style=\"font-size:16px;color:red\"> The above agent does not work well or mistral out of the box. The mistral LLM is unable to use the tools properly. It perhaps needs much more hand-held prompts.</span>\n",
    "\n",
    "### Post Implementation Observation:\n",
    "<span style=\"font-size:16px;color:red\"> We were able to observe the following:</br>\n",
    "    1. The agent now works very well almost always returning the desired output.</br>\n",
    "    2. The agent is unable to output plots. </span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5caa356",
   "metadata": {},
   "source": [
    "# Building an Agent "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "44c07305",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\n",
    "            \"system\",\n",
    "            f\"\"\"You are Data Analysis assistant. Your job is to use your tools to answer a user query in the best\\\n",
    "            manner possible. Your job is to respond to user queries by generating a python code file.\\\n",
    "            Make sure to include all necessary imports.\\\n",
    "            In case you make plots, make sure to label the axes and add a good title too. \\\n",
    "            You must save any plots in the 'graphs' folder as png only.\\\n",
    "            Provide no explanation for your code.\\\n",
    "            Read the data from 'df.csv'.\\\n",
    "            ** Enclose all your code between triple backticks ``` **\\\n",
    "            RECTIFY ANY ERRORS FROM THE PREVIOUS RUNS.\n",
    "            \"\"\",\n",
    "        ),\n",
    "        (\"user\", \"Dataframe named df: {df}\\nQuery: {input}\\nTools:{tools}\"),\n",
    "        MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "00f1ebc8",
   "metadata": {},
   "outputs": [],
   "source": [
    "python_repl = PythonREPL()\n",
    "\n",
    "repl_tool = Tool(\n",
    "    name=\"python_repl\",\n",
    "    description=\"\"\"A Python shell. Shell can dislay charts too. Use this to execute python commands.\\\n",
    "    You have access to all libraries in python including but not limited to sklearn, pandas, numpy,\\\n",
    "    matplotlib.pyplot, seaborn etc. Input should be a valid python command. If the user has not explicitly\\\n",
    "    asked you to plot the results, always print the final output using print(...)\"\"\",\n",
    "    func=python_repl.run,\n",
    ")\n",
    "\n",
    "if 'code' not in os.listdir():\n",
    "    os.mkdir('code')\n",
    "    \n",
    "tools = [repl_tool]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "dd3b5930",
   "metadata": {},
   "outputs": [],
   "source": [
    "def delete_png_files(dir_path):\n",
    "    for filename in os.listdir(dir_path):\n",
    "        if filename.endswith(\".png\"):\n",
    "            os.remove(os.path.join(dir_path, filename))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "da4b2061",
   "metadata": {},
   "outputs": [],
   "source": [
    "def run_code(code):\n",
    "    \n",
    "    with open(f'{os.getcwd()}/code/code.py', 'w') as file:\n",
    "        file.write(code)\n",
    "        \n",
    "    try:\n",
    "        print(\"Running code ...\\n\")\n",
    "        result = subprocess.run([sys.executable, 'code/code.py'], capture_output=True, text=True, check=True, timeout=20)\n",
    "        return result.stdout, False\n",
    "    \n",
    "    except subprocess.CalledProcessError as e:\n",
    "        print(\"Error in code!\\n\")\n",
    "        return e.stdout + e.stderr, True\n",
    "    \n",
    "    except subprocess.TimeoutExpired:\n",
    "        return \"Execution timed out.\", True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "b82adfda",
   "metadata": {},
   "outputs": [],
   "source": [
    "def infer_EDA(user_input:str = ''):    \n",
    "    agent = (\n",
    "            {\n",
    "                \"input\": lambda x: x[\"input\"],\n",
    "                \"tools\": lambda x:x['tools'],\n",
    "                \"df\": lambda x:x['df'],\n",
    "                \"agent_scratchpad\": lambda x: format_to_openai_tool_messages(\n",
    "                    x[\"intermediate_steps\"]\n",
    "                )\n",
    "            }\n",
    "            | prompt\n",
    "            | llm\n",
    "            | OpenAIToolsAgentOutputParser()\n",
    "        )\n",
    "\n",
    "    EDA_executor = AgentExecutor(agent=agent, tools=tools, df = 'df.csv', verbose=True)\n",
    "    \n",
    "    # Running Inference\n",
    "    error_flag = True\n",
    "    image_path = f'{os.getcwd()}/graphs'\n",
    "    \n",
    "    delete_png_files(image_path)\n",
    "    res = None\n",
    "    \n",
    "    while error_flag:\n",
    "        result = list(EDA_executor.stream({\"input\": user_input, \n",
    "                                     \"df\":pd.read_csv('df.csv'),\n",
    "                                     \"tools\":tools}))\n",
    "\n",
    "        # need to extract the code\n",
    "        pattern = r\"```python\\n(.*?)\\n```\"\n",
    "        matches = re.findall(pattern, result[0]['output'], re.DOTALL)\n",
    "        code = \"import matplotlib\\nmatplotlib.use('Agg')\\n\"\n",
    "        code += \"\\n\".join(matches)\n",
    "\n",
    "        # execute the code\n",
    "        res, error_flag = run_code(code)\n",
    "    \n",
    "    image = None\n",
    "    \n",
    "    for file in os.listdir(image_path):\n",
    "        if file.endswith(\".png\"):\n",
    "            image = np.array(Image.open(f'{image_path}/{file}'))\n",
    "            break\n",
    "    \n",
    "    final_val = result[-1]['output'] + f'\\n{res}'\n",
    "    \n",
    "    return final_val , image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "1735f7c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "def run_agent(data_path:str = '', provider:str = 'Mistral', agent_type:str = 'EDA', query:str = '', layers:str ='', temp:float = 0.1):\n",
    "    \n",
    "    df = pd.read_csv(data_path)\n",
    "    df.to_csv('./df.csv', index=False)\n",
    "    \n",
    "    if provider.lower() == 'mistral':\n",
    "        \n",
    "        llm = ChatAnyscale(model_name='mistralai/Mixtral-8x7B-Instruct-v0.1', temperature=temp)\n",
    "        \n",
    "        if agent_type == 'Data Explorer':\n",
    "            EDA_response, EDA_image = infer_EDA(user_input=query)\n",
    "            return EDA_response, EDA_image\n",
    "        \n",
    "    return None"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2224b34",
   "metadata": {},
   "source": [
    "# Gradio Demo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "471a2947",
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running on local URL:  http://127.0.0.1:7860\n",
      "\n",
      "Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><iframe src=\"http://127.0.0.1:7860/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m ```python\n",
      "import pandas as pd\n",
      "import numpy as np\n",
      "\n",
      "# Read the data from 'df.csv'\n",
      "df = pd.read_csv('df.csv')\n",
      "\n",
      "# Calculate the number of null values in each column\n",
      "null_counts = df.isnull().sum()\n",
      "\n",
      "# Calculate the total number of null values in the dataframe\n",
      "total_null_count = df.isnull().sum().sum()\n",
      "\n",
      "# Print the distribution of null values in the dataframe\n",
      "print(\"Null Value Distribution:\")\n",
      "print(null_counts)\n",
      "print(f\"Total Null Count: {total_null_count}\")\n",
      "```\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "Running code ...\n",
      "\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m ```python\n",
      "import pandas as pd\n",
      "import matplotlib.pyplot as plt\n",
      "\n",
      "# Read the data from 'df.csv'\n",
      "df = pd.read_csv('df.csv')\n",
      "\n",
      "# Calculate the number of null values in each column\n",
      "null_counts = df.isnull().sum()\n",
      "\n",
      "# Prepare data for plotting\n",
      "data = null_counts.sort_values(ascending=False)\n",
      "\n",
      "# Create a bar plot of null values distribution\n",
      "plt.figure(figsize=(10, 6))\n",
      "plt.bar(data.index, data.values)\n",
      "plt.title('Distribution of Null Values in the Dataframe')\n",
      "plt.xlabel('Column Names')\n",
      "plt.ylabel('Number of Null Values')\n",
      "plt.xticks(rotation=45)\n",
      "plt.savefig('graphs/null_values_distribution.png')\n",
      "\n",
      "# Print the null values distribution dataframe\n",
      "print(data)\n",
      "```\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "Running code ...\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import gradio as gr\n",
    "\n",
    "demo = gr.Interface (\n",
    "    run_agent,\n",
    "    [\n",
    "        gr.UploadButton(label=\"Upload your CSV!\"),\n",
    "        gr.Radio([\"Mistral\",\"GPT\"],label=\"Select Your LLM\"),\n",
    "        gr.Radio([\"Data Explorer\", \"Hypothesis Tester\", \"Basic Inference\", \"Super Inference\"], label=\"Choose Your Agent\"),\n",
    "        gr.Text(label=\"Query\", info=\"Your input to the Agent\"),\n",
    "        gr.Text(label=\"Architecture\", info=\"Specify the layer by layer architecture only for Super Inference Agent\"),\n",
    "        gr.Slider(label=\"Model Temperature\", info=\"Slide right to make your model more creative!\")\n",
    "    ],\n",
    "    [\n",
    "        gr.Text(label=\"Updated JSON Object\", info=\"This the JSON Object with the changed component.\"),\n",
    "        gr.Image(label=\"Graph\")\n",
    "    ]\n",
    ")\n",
    "demo.launch(share=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d351a2c2",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}