agents-course
/

notebooks

Model card Files Files and versions Community

File size: 11,142 Bytes

{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# Agent\n",
    "\n",
    "In this notebook, **we're going to build a simple agent using using LangGraph**.\n",
    "\n",
    "This notebook is part of the <a href=\"https://www.hf.co/learn/agents-course\">Hugging Face Agents Course</a>, a free course from beginner to expert, where you learn to build Agents.\n",
    "\n",
    "![Agents course share](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png)\n",
    "\n",
    "As seen in the Unit 1, an agent needs 3 steps as introduced in the ReAct architecture :\n",
    "[ReAct](https://react-lm.github.io/), a general agent architecture.\n",
    "\n",
    "* `act` - let the model call specific tools\n",
    "* `observe` - pass the tool output back to the model\n",
    "* `reason` - let the model reason about the tool output to decide what to do next (e.g., call another tool or just respond directly)\n",
    "\n",
    "\n",
    "![Agent](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/Agent.png)"
   ],
   "id": "89791f21c171372a"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": "%pip install -q -U langchain_openai langchain_core langgraph",
   "id": "bef6c5514bd263ce"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "import os\n",
    "\n",
    "# Please setp your own key.\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-xxxxxx\""
   ],
   "id": "61d0ed53b26fa5c6"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "import base64\n",
    "from langchain_core.messages import HumanMessage\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "vision_llm = ChatOpenAI(model=\"gpt-4o\")\n",
    "\n",
    "\n",
    "def extract_text(img_path: str) -> str:\n",
    "    \"\"\"\n",
    "    Extract text from an image file using a multimodal model.\n",
    "\n",
    "    Args:\n",
    "        img_path: A local image file path (strings).\n",
    "\n",
    "    Returns:\n",
    "        A single string containing the concatenated text extracted from each image.\n",
    "    \"\"\"\n",
    "    all_text = \"\"\n",
    "    try:\n",
    "\n",
    "        # Read image and encode as base64\n",
    "        with open(img_path, \"rb\") as image_file:\n",
    "            image_bytes = image_file.read()\n",
    "\n",
    "        image_base64 = base64.b64encode(image_bytes).decode(\"utf-8\")\n",
    "\n",
    "        # Prepare the prompt including the base64 image data\n",
    "        message = [\n",
    "            HumanMessage(\n",
    "                content=[\n",
    "                    {\n",
    "                        \"type\": \"text\",\n",
    "                        \"text\": (\n",
    "                            \"Extract all the text from this image. \"\n",
    "                            \"Return only the extracted text, no explanations.\"\n",
    "                        ),\n",
    "                    },\n",
    "                    {\n",
    "                        \"type\": \"image_url\",\n",
    "                        \"image_url\": {\n",
    "                            \"url\": f\"data:image/png;base64,{image_base64}\"\n",
    "                        },\n",
    "                    },\n",
    "                ]\n",
    "            )\n",
    "        ]\n",
    "\n",
    "        # Call the vision-capable model\n",
    "        response = vision_llm.invoke(message)\n",
    "\n",
    "        # Append extracted text\n",
    "        all_text += response.content + \"\\n\\n\"\n",
    "\n",
    "        return all_text.strip()\n",
    "    except Exception as e:\n",
    "        # You can choose whether to raise or just return an empty string / error message\n",
    "        error_msg = f\"Error extracting text: {str(e)}\"\n",
    "        print(error_msg)\n",
    "        return \"\"\n",
    "\n",
    "\n",
    "llm = ChatOpenAI(model=\"gpt-4o\")\n",
    "\n",
    "\n",
    "def divide(a: int, b: int) -> float:\n",
    "    \"\"\"Divide a and b.\"\"\"\n",
    "    return a / b\n",
    "\n",
    "\n",
    "tools = [\n",
    "    divide,\n",
    "    extract_text\n",
    "]\n",
    "llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=False)"
   ],
   "id": "a4a8bf0d5ac25a37"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "Let's create our LLM and prompt it with the overall desired agent behavior.",
   "id": "3e7c17a2e155014e"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "from typing import TypedDict, Annotated, Optional\n",
    "from langchain_core.messages import AnyMessage\n",
    "from langgraph.graph.message import add_messages\n",
    "\n",
    "\n",
    "class AgentState(TypedDict):\n",
    "    # The input document\n",
    "    input_file: Optional[str]  # Contains file path, type (PNG)\n",
    "    messages: Annotated[list[AnyMessage], add_messages]"
   ],
   "id": "f31250bc1f61da81"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "from langchain_core.messages import HumanMessage, SystemMessage\n",
    "from langchain_core.utils.function_calling import convert_to_openai_tool\n",
    "\n",
    "\n",
    "def assistant(state: AgentState):\n",
    "    # System message\n",
    "    textual_description_of_tool = \"\"\"\n",
    "extract_text(img_path: str) -> str:\n",
    "    Extract text from an image file using a multimodal model.\n",
    "\n",
    "    Args:\n",
    "        img_path: A local image file path (strings).\n",
    "\n",
    "    Returns:\n",
    "        A single string containing the concatenated text extracted from each image.\n",
    "divide(a: int, b: int) -> float:\n",
    "    Divide a and b\n",
    "\"\"\"\n",
    "    image = state[\"input_file\"]\n",
    "    sys_msg = SystemMessage(content=f\"You are an helpful agent that can analyse some images and run some computatio without provided tools :\\n{textual_description_of_tool} \\n You have access to some otpional images. Currently the loaded images is : {image}\")\n",
    "\n",
    "    return {\"messages\": [llm_with_tools.invoke([sys_msg] + state[\"messages\"])], \"input_file\": state[\"input_file\"]}"
   ],
   "id": "3c4a736f9e55afa9"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "We define a `tools` node with our list of tools.\n",
    "\n",
    "The `assistant` node is just our model with bound tools.\n",
    "\n",
    "We create a graph with `assistant` and `tools` nodes.\n",
    "\n",
    "We add `tools_condition` edge, which routes to `End` or to `tools` based on  whether the `assistant` calls a tool.\n",
    "\n",
    "Now, we add one new step:\n",
    "\n",
    "We connect the `tools` node *back* to the `assistant`, forming a loop.\n",
    "\n",
    "* After the `assistant` node executes, `tools_condition` checks if the model's output is a tool call.\n",
    "* If it is a tool call, the flow is directed to the `tools` node.\n",
    "* The `tools` node connects back to `assistant`.\n",
    "* This loop continues as long as the model decides to call tools.\n",
    "* If the model response is not a tool call, the flow is directed to END, terminating the process."
   ],
   "id": "6f1efedd943d8b1d"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "from langgraph.graph import START, StateGraph\n",
    "from langgraph.prebuilt import ToolNode, tools_condition\n",
    "from IPython.display import Image, display\n",
    "\n",
    "# Graph\n",
    "builder = StateGraph(AgentState)\n",
    "\n",
    "# Define nodes: these do the work\n",
    "builder.add_node(\"assistant\", assistant)\n",
    "builder.add_node(\"tools\", ToolNode(tools))\n",
    "\n",
    "# Define edges: these determine how the control flow moves\n",
    "builder.add_edge(START, \"assistant\")\n",
    "builder.add_conditional_edges(\n",
    "    \"assistant\",\n",
    "    # If the latest message (result) from assistant is a tool call -> tools_condition routes to tools\n",
    "    # If the latest message (result) from assistant is a not a tool call -> tools_condition routes to END\n",
    "    tools_condition,\n",
    ")\n",
    "builder.add_edge(\"tools\", \"assistant\")\n",
    "react_graph = builder.compile()\n",
    "\n",
    "# Show\n",
    "display(Image(react_graph.get_graph(xray=True).draw_mermaid_png()))"
   ],
   "id": "e013061de784638a"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "messages = [HumanMessage(content=\"Divide 6790 by 5\")]\n",
    "\n",
    "messages = react_graph.invoke({\"messages\": messages, \"input_file\": None})"
   ],
   "id": "d3b0ba5be1a54aad"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "for m in messages['messages']:\n",
    "    m.pretty_print()"
   ],
   "id": "55eb0f1afd096731"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Training program\n",
    "MR Wayne left a note with his training program for the week. I came up with a recipe for dinner leaft in a note.\n",
    "\n",
    "you can find the document [HERE](https://huggingface.co/datasets/agents-course/course-images/blob/main/en/unit2/LangGraph/Batman_training_and_meals.png), so download it and upload it in the local folder.\n",
    "\n",
    "![Training](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/Batman_training_and_meals.png)"
   ],
   "id": "e0062c1b99cb4779"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "messages = [HumanMessage(content=\"According the note provided by MR wayne in the provided images. What's the list of items I should buy for the dinner menu ?\")]\n",
    "\n",
    "messages = react_graph.invoke({\"messages\": messages, \"input_file\": \"Batman_training_and_meals.png\"})"
   ],
   "id": "2e166ebba82cfd2a"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "for m in messages['messages']:\n",
    "    m.pretty_print()"
   ],
   "id": "5bfd67af70b7dcf3"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": "",
   "id": "8cd664ab5ee5450e"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}