{ "cells": [ { "cell_type": "markdown", "id": "c96b164c", "metadata": {}, "source": [ "# Overview " ] }, { "cell_type": "markdown", "id": "22c9cad3", "metadata": {}, "source": [ "In this notebook, we build a workflow between LangChain and the Mixtral LLM.\n", "We want to accomplish the following:\n", "1. Establish a pipeline to read in a CSV and pass it to our LLM. \n", "2. Establish a basic inference agent.\n", "3. Test the basic inference agent on a few tasks." ] }, { "cell_type": "markdown", "id": "3bd62bd1", "metadata": {}, "source": [ "# Setting up the Environment " ] }, { "cell_type": "code", "execution_count": 2, "id": "76b3d212", "metadata": {}, "outputs": [], "source": [ "####################################################################################################\n", "import os\n", "import re\n", "import pandas as pd \n", "import matplotlib.pyplot as plt \n", "import numpy as np\n", "import openai\n", "\n", "# LangChain imports used below (paths assume the langchain, langchain-community\n", "# and langchain-experimental packages are installed)\n", "from langchain.agents import AgentExecutor, Tool\n", "from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages\n", "from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser\n", "from langchain_community.chat_models import ChatAnyscale\n", "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", "from langchain_experimental.agents import create_pandas_dataframe_agent\n", "from langchain_experimental.utilities import PythonREPL\n", "\n", "plt.style.use('ggplot')\n", "####################################################################################################" ] }, { "cell_type": "code", "execution_count": 3, "id": "9da52e1f", "metadata": {}, "outputs": [], "source": [ "# read your API key from the environment rather than hardcoding it\n", "# (assumes the ANYSCALE_API_KEY environment variable is set)\n", "client = openai.OpenAI(\n", "    base_url = \"https://api.endpoints.anyscale.com/v1\",\n", "    api_key = os.environ[\"ANYSCALE_API_KEY\"]\n", ")" ] }, { "cell_type": "markdown", "id": "edcd6ca7", "metadata": {}, "source": [ "# Reading the Dataframe and saving it \n", "We save the dataframe to a CSV file in the current working directory because we will iteratively update it between inferences if needed. 
This way the model always has the most up-to-date version of the dataframe at its disposal." ] }, { "cell_type": "code", "execution_count": 4, "id": "45f55df2", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('../data/train.csv')\n", "df.to_csv('./df.csv', index=False)" ] }, { "cell_type": "markdown", "id": "64a086f2", "metadata": {}, "source": [ "# DECLARE YOUR LLM HERE " ] }, { "cell_type": "code", "execution_count": 4, "id": "3f686305", "metadata": {}, "outputs": [], "source": [ "llm = ChatAnyscale(model_name='mistralai/Mixtral-8x7B-Instruct-v0.1', temperature=0)\n", "agent = create_pandas_dataframe_agent(llm, df, verbose=True)" ] }, { "cell_type": "markdown", "id": "4b1d8f8f", "metadata": {}, "source": [ "# REPL Tool \n", "This tool enables our LLM to execute Python code so that we can display the results to the end user." ] }, { "cell_type": "code", "execution_count": 5, "id": "23045c5d", "metadata": {}, "outputs": [], "source": [ "python_repl = PythonREPL()\n", "\n", "repl_tool = Tool(\n", "    name=\"python_repl\",\n", "    description=\"\"\"A Python shell. The shell can display charts too. Use this to execute python commands.\\\n", "    You have access to all libraries in python including but not limited to sklearn, pandas, numpy,\\\n", "    matplotlib.pyplot, seaborn etc. Input should be a valid python command. If the user has not explicitly\\\n", "    asked you to plot the results, always print the final output using print(...)\"\"\",\n", "    func=python_repl.run,\n", ")\n", "\n", "tools = [repl_tool]" ] }, { "cell_type": "markdown", "id": "c5caa356", "metadata": {}, "source": [ "# Building an Agent " ] }, { "cell_type": "code", "execution_count": 6, "id": "44c07305", "metadata": {}, "outputs": [], "source": [ "prompt = ChatPromptTemplate.from_messages(\n", "    [\n", "        (\n", "            \"system\",\n", "            \"\"\"You are a Machine Learning Inference agent. 
Your job is to use your tools to answer a user query\\\n", "            in the best manner possible.\\\n", "            Provide no explanation for your code. Enclose all your code between triple backticks ``` \"\"\",\n", "        ),\n", "        (\"user\", \"Dataframe named df:\\n{df}\\nQuery: {input}\\nList of Tools: {tools}\"),\n", "        MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n", "    ]\n", ")" ] }, { "cell_type": "code", "execution_count": 7, "id": "67dae2d6", "metadata": {}, "outputs": [], "source": [ "agent = (\n", "    {\n", "        \"input\": lambda x: x[\"input\"],\n", "        \"tools\": lambda x: x[\"tools\"],\n", "        \"df\": lambda x: x[\"df\"],\n", "        \"agent_scratchpad\": lambda x: format_to_openai_tool_messages(\n", "            x[\"intermediate_steps\"]\n", "        ),\n", "    }\n", "    | prompt\n", "    | llm\n", "    | OpenAIToolsAgentOutputParser()\n", ")" ] }, { "cell_type": "code", "execution_count": 8, "id": "45129995", "metadata": {}, "outputs": [], "source": [ "agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)" ] }, { "cell_type": "code", "execution_count": 9, "id": "9602539f", "metadata": {}, "outputs": [], "source": [ "# building an execution chain\n", "\n", "def infer(user_input):\n", "    # fetch the response from the agent, passing it the latest saved dataframe\n", "    result = list(agent_executor.stream({\"input\": user_input,\n", "                                         \"df\": pd.read_csv('df.csv'), \"tools\": tools}))\n", "\n", "    # extract the code from the final streamed chunk, which carries the agent's output;\n", "    # the language tag after the opening backticks is treated as optional\n", "    pattern = r\"```(?:python)?\\n(.*?)\\n```\"\n", "    matches = re.findall(pattern, result[-1]['output'], re.DOTALL)\n", "    code = \"\\n\".join(matches)\n", "\n", "    # execute the code in the notebook's global namespace so that changes to df persist\n", "    exec(code, globals())\n", "\n", "    # save the (possibly updated) dataframe for the next inference\n", "    if isinstance(globals().get('df'), pd.DataFrame):\n", "        globals()['df'].to_csv('./df.csv', index=False)\n", "    return None" ] }, { "cell_type": "markdown", "id": "2b119c18", "metadata": {}, "source": [ "# Testing the Agent" ] }, { "cell_type": "markdown", "id": "786a9a3e", "metadata": {}, "source": [ "### First we perform some data cleaning. 
Observe how each infer call updates the df and passes it onwards." ] }, { "cell_type": "code", "execution_count": 10, "id": "fbef0fff", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |\n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |\n", "