Humanlearning committed
Commit fe36046 · 1 Parent(s): 954a1b2

multi agent architecture

Files changed (50)
  1. .cursor/rules/langfuse_best_practices.mdc +80 -0
  2. .cursor/rules/langgraph_multiagent_state_handling.mdc +140 -0
  3. ARCHITECTURE.md +184 -0
  4. __pycache__/debug_test.cpython-313-pytest-8.4.0.pyc +0 -0
  5. __pycache__/langraph_agent.cpython-313.pyc +0 -0
  6. __pycache__/new_langraph_agent.cpython-313.pyc +0 -0
  7. __pycache__/quick_random_agent_test.cpython-313-pytest-8.4.0.pyc +0 -0
  8. __pycache__/quick_specific_agent_test.cpython-313-pytest-8.4.0.pyc +0 -0
  9. __pycache__/test_new_system.cpython-313-pytest-8.4.0.pyc +0 -0
  10. __pycache__/test_random_question.cpython-313-pytest-8.4.0.pyc +0 -0
  11. __pycache__/test_tools_integration.cpython-313-pytest-8.4.0.pyc +0 -0
  12. app.py +9 -10
  13. debug_retrieval_tools.py +149 -0
  14. langraph_agent.py +97 -34
  15. new_langraph_agent.py +85 -0
  16. prompts/critic_prompt.txt +31 -0
  17. prompts/execution_prompt.txt +42 -0
  18. prompts/retrieval_prompt.txt +34 -0
  19. prompts/router_prompt.txt +44 -0
  20. system_prompt.txt → prompts/system_prompt.txt +2 -1
  21. prompts/verification_prompt.txt +30 -0
  22. pyproject.toml +3 -0
  23. quick_random_agent_test.py +51 -21
  24. quick_specific_agent_test.py +64 -32
  25. requirements.txt +34 -3
  26. src/__init__.py +14 -0
  27. src/__pycache__/__init__.cpython-313.pyc +0 -0
  28. src/__pycache__/langgraph_system.cpython-313.pyc +0 -0
  29. src/__pycache__/memory.cpython-313.pyc +0 -0
  30. src/__pycache__/tracing.cpython-313.pyc +0 -0
  31. src/agents/__init__.py +21 -0
  32. src/agents/__pycache__/__init__.cpython-313.pyc +0 -0
  33. src/agents/__pycache__/critic_agent.cpython-313.pyc +0 -0
  34. src/agents/__pycache__/execution_agent.cpython-313.pyc +0 -0
  35. src/agents/__pycache__/plan_node.cpython-313.pyc +0 -0
  36. src/agents/__pycache__/retrieval_agent.cpython-313.pyc +0 -0
  37. src/agents/__pycache__/router_node.cpython-313.pyc +0 -0
  38. src/agents/__pycache__/verification_node.cpython-313.pyc +0 -0
  39. src/agents/critic_agent.py +118 -0
  40. src/agents/execution_agent.py +174 -0
  41. src/agents/plan_node.py +79 -0
  42. src/agents/retrieval_agent.py +268 -0
  43. src/agents/router_node.py +97 -0
  44. src/agents/verification_node.py +172 -0
  45. src/langgraph_system.py +231 -0
  46. src/memory.py +162 -0
  47. src/tracing.py +125 -0
  48. test_new_system.py +205 -0
  49. test_tools_integration.py +81 -0
  50. uv.lock +115 -3
.cursor/rules/langfuse_best_practices.mdc ADDED
@@ -0,0 +1,80 @@
+ ---
+ description: langfuse and agent observation best practices
+ globs:
+ alwaysApply: false
+ ---
+ 1 Adopt the OTEL-native Python SDK (v3) everywhere
+ The v3 SDK wraps OpenTelemetry, so every span you open in any agent, tool or worker is automatically nested and correlated. This saves you from hand-passing trace IDs and lets you lean on existing OTEL auto-instrumentation for HTTP, DB or queue calls.
+ langfuse.com
+
+ 2 Create one root span per user request and pass a single CallbackHandler into graph.invoke/stream
+ ```python
+ from langfuse.langchain import CallbackHandler
+ langfuse_handler = CallbackHandler()
+
+ with langfuse.start_as_current_span(name="user-request") as root:
+     compiled_graph.invoke(
+         input=state,
+         config={"callbacks": [langfuse_handler]}
+     )
+ ```
+ Everything the agents do now rolls up under that root for a tidy timeline.
+ langfuse.com
+
+ 3 Use Langfuse Sessions to stitch together long-running conversations
+ Set session_id and user_id on the root span (or via update_trace) so all follow-up calls land in the same session dashboard.
+ langfuse.com
+
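+ A minimal sketch of this tip, assuming the v3 get_client()/update_trace() helpers; the session and user IDs below are placeholders:
+ ```python
+ from langfuse import get_client
+
+ langfuse = get_client()
+
+ with langfuse.start_as_current_span(name="user-request") as root:
+     # Reuse the same session_id across turns so Langfuse groups them
+     # into one session view; both IDs here are placeholder values.
+     root.update_trace(session_id="session-456", user_id="user-123")
+ ```
+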
+ 4 Name spans predictably
+ llm/<model> – one per LLM call (e.g., llm/gpt-4o)
+
+ tool/<tool_name> – external search, RAG, code-exec…
+
+ agent/<role> – distinct for every worker node
+
+ Predictable names power Langfuse’s cost & latency aggregation widgets.
+ langfuse.com
+
+ 5 Leverage Agent Graphs to debug routing loops
+ Because each node becomes a child span, Langfuse’s “Agent Graph” view renders the entire decision tree and shows token/cost per edge—very handy when several LLMs vote on the next step.
+ langfuse.com
+
+ 6 Tag the root span with the environment (dev/stage/prod) and with the LLM provider you’re experimenting with
+ This lets you facet dashboards by deployment ring or by “OpenAI vs Mixtral.”
+ langfuse.com
+
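+ A sketch of that tagging, reusing the client from Tip 3’s sketch and again assuming the v3 update_trace() helper; the tag values are placeholders:
+ ```python
+ with langfuse.start_as_current_span(name="user-request") as root:
+     # One tag for the deployment ring, one for the LLM provider under test
+     root.update_trace(tags=["prod", "openai"])
+ ```
+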
+ 7 Attach scores (numeric or categorical) right after the graph run
+ span.score_trace(name="user-feedback", value=1) – or call create_score later. Use this both for thumbs-up/down UI events and for LLM-as-judge automated grading.
+ langfuse.com
+
+ 8 Version and link your prompts
+ Call langfuse.create_prompt() (or manage them in the UI) and set prompt_id on spans so you can tell which prompt revision caused regressions.
+ langfuse.com
+
+ 9 Exploit distributed-tracing headers if agents live in different services
+ Because v3 is OTEL-based, traceparent headers are parsed automatically—just make sure every micro-service initialises the Langfuse OTEL exporter with the same LANGFUSE_OTEL_DSN.
+ langfuse.com
+
+ 10 Sample intelligently
+ Langfuse supports probabilistic sampling on the server. Keep 100 % of errors and maybe only 10 % of successful traces in prod to control storage costs.
+ langfuse.com
+
+ 11 Mask PII at the SDK layer
+ Use the mask() helper or MASK_CONTENT_REGEX env var so you can still store numeric cost/latency while redacting sensitive inputs/outputs.
+ langfuse.com
+
+ 12 Flush asynchronously in high-throughput agents
+ Call langfuse.flush(background=True) at the end of each worker tick to avoid blocking the event loop; OTEL will batch and export spans every few seconds.
+ langfuse.com
+
+ 13 Test visual completeness with the LangGraph helper
+ graph.get_graph().draw_mermaid_png() and verify every edge appears in Langfuse; missing edges usually mean a span wasn’t opened or the callback handler wasn’t propagated.
+ langfuse.com
+
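+ A minimal version of that check; the output file name is a placeholder:
+ ```python
+ # draw_mermaid_png() returns PNG bytes of the compiled graph's topology
+ png_bytes = compiled_graph.get_graph().draw_mermaid_png()
+ with open("graph.png", "wb") as f:
+     f.write(png_bytes)
+ ```
+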
+ 14 Watch out for the “traces not clubbed” pitfall when upgrading from v2 → v3
+ Older code that started independent traces per agent will fragment your timeline in v3. Always start one root span first (Tip #2).
+ github.com
.cursor/rules/langgraph_multiagent_state_handling.mdc ADDED
@@ -0,0 +1,140 @@
+ ---
+ description: langgraph multi-agent state handling best practices
+ globs:
+ alwaysApply: false
+ ---
+ The most robust pattern is to treat every agent node as a pure function AgentState → Command, where AgentState is an explicit, typed snapshot of everything the rest of the graph must know.
+ My overall confidence that the practices below will remain valid for ≥ 12 months is 85 % (expert opinion).
+
+ 1 Design a single source of truth for state
+ | Guideline | Why it matters | Key LangGraph API |
+ | --- | --- | --- |
+ | Define a typed schema (TypedDict or pydantic.BaseModel) for the whole graph. | Static typing catches missing keys early and docs double as living design specs. (langchain-ai.github.io) | StateGraph(YourState) |
+ | Use channel annotations such as Annotated[list[BaseMessage], operator.add] on mutable fields. | Makes accumulation (+) vs. overwrite clear and prevents accidental loss of history. (langchain-ai.github.io) | messages: Annotated[list[BaseMessage], operator.add] |
+ | Keep routing out of business data—store the next hop in a dedicated field. | Separates control-flow from payload; easier to debug and replay. (langchain-ai.github.io) | next: Literal["planner", "researcher", "__end__"] |
+
+ 2 Pass information with Command objects
+ Pattern
+
+ ```python
+ def planner(state: AgentState) -> Command[Literal["researcher", "executor", "__end__"]]:
+     decision = model.invoke(state["messages"])  # `model` is your routing LLM
+     return Command(
+         goto=decision["next"],
+         update={
+             "messages": [decision["content"]],
+             "plan": decision["plan"],
+         },
+     )
+ ```
+ Best-practice notes
+
+ Always update via update=… rather than mutating the state in-place. This guarantees immutability between nodes and makes time-travel/debugging deterministic.
+ langchain-ai.github.io
+
+ When handing off between sub-graphs, set graph=Command.PARENT or the target sub-graph’s name so orchestration stays explicit.
+ langchain-ai.github.io
+
+ 3 Choose a message-sharing strategy early
+ | Strategy | Pros | Cons | When to use |
+ | --- | --- | --- | --- |
+ | Shared scratch-pad (every intermediate LLM thought stored in messages) (langchain-ai.github.io) | Maximum transparency; great for debugging & reflection. | Context window bloat, higher cost/time. | ≤ 3 specialist agents or short tasks. |
+ | Final-result only (each agent keeps a private scratch-pad, shares only its final answer) (langchain-ai.github.io) | Scales to 10+ agents; small token footprint. | Harder to post-mortem; agents need local memory. | Large graphs; production workloads. |
+
+ Tip: If you hide scratch-pads, store them in a per-agent key (e.g. researcher_messages) for replay or fine-tuning even if they’re not sent downstream.
+ langchain-ai.github.io
+
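+ A sketch of that per-agent key (the field names are illustrative, not prescribed by LangGraph):
+ ```python
+ import operator
+ from typing import Annotated, TypedDict
+ from langchain_core.messages import BaseMessage
+
+ class AgentState(TypedDict):
+     # Shared channel that every agent sees
+     messages: Annotated[list[BaseMessage], operator.add]
+     # Private scratch-pad: kept for replay/fine-tuning, never sent downstream
+     researcher_messages: Annotated[list[BaseMessage], operator.add]
+ ```
+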
+ 4 Inject only what a tool needs
+ When exposing sub-agents as tools under a supervisor:
+
+ ```python
+ from langgraph.prebuilt import InjectedState
+
+ def researcher(state: Annotated[AgentState, InjectedState]):
+     ...
+ ```
+ Why: keeps tool signatures clean and prevents leaking confidential state.
+ Extra: If the tool must update global state, let it return a Command so the supervisor doesn’t have to guess what changed.
+ langchain-ai.github.io
+
+ 5 Structure the graph for clarity & safety
+ Network ➜ every agent connects to every other (exploration, research prototypes).
+
+ Supervisor ➜ one LLM decides routing (good default for 3-7 agents).
+
+ Hierarchical ➜ teams of agents with team-level supervisors (scales past ~7 agents).
+ langchain-ai.github.io
+
+ Pick the simplest architecture that meets today’s needs; refactor to sub-graphs as complexity grows.
+
+ 6 Operational best practices
+ | Concern | Best practice |
+ | --- | --- |
+ | Tracing & observability | Attach a LangFuse run-ID to every AgentState at graph entry; emit state snapshots on node enter/exit so traces line up with LangFuse v3 spans. |
+ | Memory & persistence | Use a Checkpointer for cheap disk-based snapshots or a Redis backend for high-QPS, then time-travel when an LLM stalls. |
+ | Parallel branches | Use map edges (built-in) to fan-out calls, but cap parallelism with an asyncio semaphore to avoid API rate limits (see the sketch below). |
+ | Vector lookup | Put retrieval results in a dedicated key (docs) so they don’t clutter messages; store only document IDs if you need to replay cheaply. |
+
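+ A minimal sketch of that parallel-branch cap; the limit of 5 and the llm.ainvoke call are illustrative assumptions:
+ ```python
+ import asyncio
+
+ sem = asyncio.Semaphore(5)  # assumed cap; tune to your provider's rate limit
+
+ async def call_branch(payload):
+     async with sem:
+         # Stand-in for the real per-branch LLM/tool call
+         return await llm.ainvoke(payload)
+
+ async def fan_out(payloads):
+     # All branches run concurrently, but never more than 5 in flight
+     return await asyncio.gather(*(call_branch(p) for p in payloads))
+ ```
+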
+ 7 Evidence from the literature (why graphs work)
+ | Peer-reviewed source | Key takeaway | Credibility (0-10) |
+ | --- | --- | --- |
+ | AAAI 2024 Graph of Thoughts shows arbitrary-graph reasoning beats tree/chain structures by up to 62 % on sorting tasks. (arxiv.org) | Graph topology yields better exploration & feedback loops—mirrors LangGraph’s StateGraph. | 9 |
+ | EMNLP 2024 EPO Hierarchical LLM Agents demonstrates hierarchical agents outperform flat agents on ALFRED by >12 % and scales with preference-based training. (aclanthology.org) | Validates splitting planning vs. execution agents (Supervisor + workers). | 9 |
+
+ | Non-peer-reviewed source | Why included | Credibility |
+ | --- | --- | --- |
+ | Official LangGraph docs (June 2025). (langchain-ai.github.io) | Primary specification of the library’s APIs and guarantees. | 8 |
+
+ 8 Minimal starter template (v 0.6.*)
+ ```python
+ from typing import Annotated, Literal, Sequence, TypedDict
+ import operator
+
+ from langgraph.graph import StateGraph, START, END
+ from langgraph.types import Command
+ from langchain_openai import ChatOpenAI
+
+ class AgentState(TypedDict):
+     messages: Annotated[Sequence[str], operator.add]
+     next: Literal["planner", "researcher", "__end__"]
+     plan: str | None
+
+ llm = ChatOpenAI()
+
+ def planner(state: AgentState) -> Command[Literal["researcher", "__end__"]]:
+     resp = llm.invoke(...)  # placeholder prompt
+     return Command(
+         goto=resp["next"],
+         update={"messages": [resp["content"]],
+                 "plan": resp["plan"]},
+     )
+
+ def researcher(state: AgentState) -> Command[Literal["planner"]]:
+     resp = llm.invoke(...)  # placeholder prompt
+     return Command(goto="planner",
+                    update={"messages": [resp["content"]]})
+
+ g = StateGraph(AgentState)
+ g.add_node("planner", planner)
+ g.add_node("researcher", researcher)
+ g.add_edge(START, "planner")
+ # No static planner ↔ researcher edges are needed: each node's
+ # Command(goto=…) performs the routing at runtime.
+ graph = g.compile()
+ ```
+ Bottom line
+ Use typed immutable state, route with Command, and keep private scratch-pads separate from shared context. These patterns align with both the latest LangGraph APIs and empirical results from hierarchical, graph-based agent research.
ARCHITECTURE.md ADDED
@@ -0,0 +1,184 @@
+ # LangGraph Agent System Architecture
+
+ This document describes the architecture of the multi-agent system implemented using LangGraph 0.4.8+ and Langfuse 3.0.0.
+
+ ## System Overview
+
+ The system implements a sophisticated agent architecture with memory, routing, specialized agents, and verification as shown in the system diagram.
+
+ ## Core Components
+
+ ### 1. Memory Layer
+ - **Short-Term Memory**: Graph state managed by LangGraph checkpointing
+ - **Checkpointer**: SQLite-based persistence for conversation continuity
+ - **Long-Term Memory**: Supabase vector store with pgvector for Q&A storage
+
+ ### 2. Plan + ReAct Loop
+ - Initial query analysis and planning
+ - Contextual prompt injection with system requirements
+ - Memory retrieval for similar past questions
+
+ ### 3. Agent Router
+ - Intelligent routing based on query analysis
+ - Routes to specialized agents: Retrieval, Execution, or Critic
+ - Uses low-temperature LLM for consistent routing decisions
+
+ ### 4. Specialized Agents
+
+ #### Retrieval Agent
+ - Information gathering from external sources
+ - Tools: Wikipedia, Arxiv, Tavily web search, vector store retrieval
+ - Handles attachment downloading for GAIA tasks
+ - Context-aware with memory integration
+
+ #### Execution Agent
+ - Computational tasks and code execution
+ - Integrates with existing `code_agent.py` sandbox
+ - Python code execution with pandas, cv2, standard libraries
+ - Step-by-step problem breakdown
+
+ #### Critic Agent
+ - Response quality evaluation and review
+ - Accuracy, completeness, and logical consistency checks
+ - Scoring system with pass/fail determination
+ - Constructive feedback generation
+
+ ### 5. Verification & Fallback
+ - Final quality control with system prompt compliance
+ - Format verification for exact-match requirements
+ - Retry logic with maximum attempt limits
+ - Graceful fallback pipeline for failed attempts
+
+ ### 6. Observability (Langfuse)
+ - End-to-end tracing of all agent interactions
+ - Performance monitoring and debugging
+ - User session tracking
+ - Error logging and analysis
+
+ ## Data Flow
+
+ 1. **User Query** → Plan Node (system prompt injection)
+ 2. **Plan Node** → Router (agent selection)
+ 3. **Router** → Specialized Agent (task execution)
+ 4. **Agent** → Tools (if needed) → Agent (results)
+ 5. **Agent** → Verification (quality check)
+ 6. **Verification** → Output or Retry/Fallback
+
+ ## Key Features
+
+ ### Memory Management
+ - Caching of similarity searches (TTL-based)
+ - Duplicate detection and prevention
+ - Task-based attachment tracking
+ - Session-specific cache management
+
+ ### Quality Control
+ - Multi-level verification (agent → critic → verification)
+ - Retry mechanism with attempt limits
+ - Format compliance checking
+ - Fallback responses for failures
+
+ ### Tracing & Observability
+ - Langfuse integration for complete observability
+ - Agent-level span tracking
+ - Error monitoring and debugging
+ - Performance metrics collection
+
+ ### Tool Integration
+ - Modular tool system for each agent
+ - Sandboxed code execution environment
+ - External API integration (search, knowledge bases)
+ - Attachment handling for complex tasks
+
+ ## Configuration
+
+ ### Environment Variables
+ See `env.template` for required configuration:
+ - LLM API keys (Groq, OpenAI, Google, HuggingFace)
+ - Search tools (Tavily)
+ - Vector store (Supabase)
+ - Observability (Langfuse)
+ - GAIA API endpoints
+
+ ### System Prompts
+ Located in `prompts/` directory:
+ - `system_prompt.txt`: Main system requirements
+ - `router_prompt.txt`: Agent routing instructions
+ - `retrieval_prompt.txt`: Information gathering guidelines
+ - `execution_prompt.txt`: Code execution instructions
+ - `critic_prompt.txt`: Quality evaluation criteria
+ - `verification_prompt.txt`: Final formatting rules
+
+ ## Usage
+
+ ### Basic Usage
+ ```python
+ from src import run_agent_system
+
+ result = run_agent_system(
+     query="Your question here",
+     user_id="user123",
+     session_id="session456"
+ )
+ ```
+
+ ### With Memory Management
+ ```python
+ from src import memory_manager
+
+ # Check if query is similar to previous ones
+ similar = memory_manager.get_similar_qa(query)
+
+ # Clear session cache
+ memory_manager.clear_session_cache()
+ ```
+
+ ### Direct Graph Access
+ ```python
+ from src import create_agent_graph
+
+ workflow = create_agent_graph()
+ app = workflow.compile(checkpointer=checkpointer)
+ result = app.invoke(initial_state, config=config)
+ ```
+
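+ The snippet above assumes a `checkpointer` has already been constructed. A minimal sketch using the SQLite backend listed under Dependencies (the database path and thread ID are placeholders, and the exact constructor may vary across `langgraph-checkpoint-sqlite` versions):
+
+ ```python
+ import sqlite3
+ from langgraph.checkpoint.sqlite import SqliteSaver
+
+ # Placeholder path; SqliteSaver persists graph state between invocations
+ checkpointer = SqliteSaver(sqlite3.connect("checkpoints.db", check_same_thread=False))
+
+ # The thread_id keys the conversation whose checkpointed state is restored
+ config = {"configurable": {"thread_id": "session456"}}
+ ```
+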
+ ## Dependencies
+
+ ### Core Framework
+ - `langgraph>=0.4.8`: Graph-based agent orchestration
+ - `langgraph-checkpoint-sqlite>=2.0.0`: Persistence layer
+ - `langchain>=0.3.0`: LLM and tool abstractions
+
+ ### Observability
+ - `langfuse==3.0.0`: Tracing and monitoring
+
+ ### Memory & Storage
+ - `supabase>=2.8.0`: Vector database backend
+ - `pgvector>=0.3.0`: Vector similarity search
+
+ ### Tools & APIs
+ - `tavily-python>=0.5.0`: Web search
+ - `arxiv>=2.1.0`: Academic paper search
+ - `wikipedia>=1.4.0`: Knowledge base access
+
+ ## Error Handling
+
+ The system implements comprehensive error handling:
+ - Graceful degradation when services are unavailable
+ - Fallback responses for critical failures
+ - Retry logic with exponential backoff (sketched below)
+ - Detailed error logging for debugging
+
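+ A minimal sketch of that backoff policy; the helper name and default limits are illustrative, not taken from the codebase:
+
+ ```python
+ import time
+
+ def with_retries(fn, max_attempts=3, base_delay=1.0):
+     """Retry fn with exponential backoff: 1 s, 2 s, 4 s, ... between attempts."""
+     for attempt in range(max_attempts):
+         try:
+             return fn()
+         except Exception:
+             if attempt == max_attempts - 1:
+                 raise  # out of attempts – surface the error for logging
+             time.sleep(base_delay * 2 ** attempt)
+ ```
+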
+ ## Performance Considerations
+
+ - Vector store caching reduces duplicate searches
+ - Checkpoint-based state management for conversation continuity
+ - Efficient tool routing based on query analysis
+ - Memory cleanup for long-running sessions
+
+ ## Future Enhancements
+
+ - Additional specialized agents (e.g., Image Analysis, Code Review)
+ - Enhanced memory clustering and retrieval algorithms
+ - Real-time collaboration between agents
+ - Advanced tool composition and chaining
__pycache__/debug_test.cpython-313-pytest-8.4.0.pyc ADDED
Binary file (2.22 kB).

__pycache__/langraph_agent.cpython-313.pyc CHANGED
Binary files a/__pycache__/langraph_agent.cpython-313.pyc and b/__pycache__/langraph_agent.cpython-313.pyc differ

__pycache__/new_langraph_agent.cpython-313.pyc ADDED
Binary file (3.01 kB).

__pycache__/quick_random_agent_test.cpython-313-pytest-8.4.0.pyc ADDED
Binary file (5.19 kB).

__pycache__/quick_specific_agent_test.cpython-313-pytest-8.4.0.pyc ADDED
Binary file (6.4 kB).

__pycache__/test_new_system.cpython-313-pytest-8.4.0.pyc ADDED
Binary file (7.59 kB).

__pycache__/test_random_question.cpython-313-pytest-8.4.0.pyc ADDED
Binary file (3.97 kB).

__pycache__/test_tools_integration.cpython-313-pytest-8.4.0.pyc ADDED
Binary file (3.06 kB).
 
app.py CHANGED
@@ -4,7 +4,7 @@ import requests
  import inspect
  import pandas as pd
  # from agents import LlamaIndexAgent
- from langraph_agent import build_graph
  import asyncio
  import aiohttp
  from langfuse.langchain import CallbackHandler
@@ -21,19 +21,18 @@ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
  # --- Basic Agent Definition ---
  # ----- THIS IS WERE YOU CAN BUILD WHAT YOU WANT ------
  class BasicAgent:
      def __init__(self):
-         self.agent = build_graph()
-         print("BasicAgent initialized.")
      async def aquery(self, question: str) -> str:
-         messages = [HumanMessage(content=question)]
          print(f"Agent received question (first 50 chars): {question[:50]}...")
          try:
-             response = await self.agent.ainvoke({"messages": messages}, config={"callbacks": [langfuse_handler]})
-             print(f"Agent raw response: {response}")
-             if not response or 'messages' not in response or not response['messages']:
-                 print("Agent response missing or empty 'messages'. Returning AGENT ERROR.")
-                 return "AGENT ERROR: No response from agent."
-             answer = response['messages'][-1].content
              print(f"Agent returning answer: {answer}")
              return answer
          except Exception as e:

  import inspect
  import pandas as pd
  # from agents import LlamaIndexAgent
+ from new_langraph_agent import run_agent as _sync_run_agent  # Updated: use the new multi-agent runner
  import asyncio
  import aiohttp
  from langfuse.langchain import CallbackHandler

  # --- Basic Agent Definition ---
  # ----- THIS IS WERE YOU CAN BUILD WHAT YOU WANT ------
  class BasicAgent:
+     """Wrapper that executes the new multi-agent LangGraph system in a background thread."""
+
      def __init__(self):
+         print("BasicAgent (multi-agent) initialized.")
+
      async def aquery(self, question: str) -> str:
+         """Run the synchronous `run_agent` helper inside the event-loop executor."""
          print(f"Agent received question (first 50 chars): {question[:50]}...")
+         loop = asyncio.get_event_loop()
          try:
+             # Off-load the blocking call to a thread so we don't block the Gradio event loop
+             answer = await loop.run_in_executor(None, _sync_run_agent, question)
              print(f"Agent returning answer: {answer}")
              return answer
          except Exception as e:
debug_retrieval_tools.py ADDED
@@ -0,0 +1,149 @@
+ #!/usr/bin/env python3
+ """
+ Debug script to test individual tools in isolation
+ """
+
+ from src.agents.retrieval_agent import get_retrieval_tools, execute_tool_calls
+ from src.agents.execution_agent import get_execution_tools
+
+ def test_wikipedia_tool():
+     """Test Wikipedia search tool directly"""
+     print("=" * 50)
+     print("Testing Wikipedia Tool")
+     print("=" * 50)
+
+     tools = get_retrieval_tools()
+     wiki_tool = None
+
+     for tool in tools:
+         if tool.name == "wiki_search":
+             wiki_tool = tool
+             break
+
+     if wiki_tool:
+         try:
+             print("Found wiki_search tool")
+             result = wiki_tool.invoke({"input": "Albert Einstein"})
+             print(f"Result: {result[:500]}...")
+             return True
+         except Exception as e:
+             print(f"Error: {e}")
+             return False
+     else:
+         print("wiki_search tool not found!")
+         return False
+
+ def test_web_search_tool():
+     """Test web search tool directly"""
+     print("=" * 50)
+     print("Testing Web Search Tool")
+     print("=" * 50)
+
+     tools = get_retrieval_tools()
+     web_tool = None
+
+     for tool in tools:
+         if tool.name == "web_search":
+             web_tool = tool
+             break
+
+     if web_tool:
+         try:
+             print("Found web_search tool")
+             result = web_tool.invoke({"input": "artificial intelligence news"})
+             print(f"Result: {result[:500]}...")
+             return True
+         except Exception as e:
+             print(f"Error: {e}")
+             return False
+     else:
+         print("web_search tool not found!")
+         return False
+
+ def test_python_tool():
+     """Test Python execution tool directly"""
+     print("=" * 50)
+     print("Testing Python Execution Tool")
+     print("=" * 50)
+
+     tools = get_execution_tools()
+     python_tool = None
+
+     for tool in tools:
+         if tool.name == "run_python":
+             python_tool = tool
+             break
+
+     if python_tool:
+         try:
+             print("Found run_python tool")
+             code = """
+ # Calculate first 5 Fibonacci numbers
+ def fibonacci(n):
+     if n <= 1:
+         return n
+     return fibonacci(n-1) + fibonacci(n-2)
+
+ result = [fibonacci(i) for i in range(5)]
+ print("First 5 Fibonacci numbers:", result)
+ """
+             result = python_tool.invoke({"input": code})
+             print(f"Result: {result}")
+             return True
+         except Exception as e:
+             print(f"Error: {e}")
+             return False
+     else:
+         print("run_python tool not found!")
+         return False
+
+ def test_tool_calls_execution():
+     """Test the tool call execution function"""
+     print("=" * 50)
+     print("Testing Tool Call Execution")
+     print("=" * 50)
+
+     tools = get_retrieval_tools()
+
+     # Simulate tool calls
+     mock_tool_calls = [
+         {
+             'name': 'wiki_search',
+             'args': {'input': 'Albert Einstein'},
+             'id': 'test_id_1'
+         }
+     ]
+
+     try:
+         tool_messages = execute_tool_calls(mock_tool_calls, tools)
+         print(f"Tool execution successful: {len(tool_messages)} messages")
+         for msg in tool_messages:
+             print(f"Message type: {type(msg)}")
+             print(f"Content preview: {str(msg.content)[:200]}...")
+         return True
+     except Exception as e:
+         print(f"Error in tool execution: {e}")
+         import traceback
+         traceback.print_exc()
+         return False
+
+ if __name__ == "__main__":
+     print("Starting individual tool tests...")
+
+     results = {}
+     results['wikipedia'] = test_wikipedia_tool()
+     results['web_search'] = test_web_search_tool()
+     results['python'] = test_python_tool()
+     results['tool_execution'] = test_tool_calls_execution()
+
+     print("\n" + "=" * 50)
+     print("TEST RESULTS SUMMARY")
+     print("=" * 50)
+     for test_name, result in results.items():
+         status = "✅ PASS" if result else "❌ FAIL"
+         print(f"{test_name}: {status}")
+
+     if all(results.values()):
+         print("\n🎉 All tools are working correctly!")
+     else:
+         print("\n⚠️ Some tools have issues that need to be fixed.")
langraph_agent.py CHANGED
@@ -37,6 +37,21 @@ load_dotenv("env.local")  # Try env.local as backup
  print(f"SUPABASE_URL loaded: {bool(os.environ.get('SUPABASE_URL'))}")
  print(f"GROQ_API_KEY loaded: {bool(os.environ.get('GROQ_API_KEY'))}")

  # Base URL of the scoring API (duplicated here to avoid circular import with basic_agent)
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"

@@ -114,7 +129,7 @@ def run_python(input: str) -> str:
      return run_agent(input)

  # load the system prompt from the file
- with open("system_prompt.txt", "r", encoding="utf-8") as f:
      system_prompt = f.read()

  # System message
@@ -206,6 +221,35 @@ def _code_to_message(state: dict):  # type: ignore[override]
          return {}
      return {"messages": [AIMessage(content=state["code_result"])]}

  # Build graph function
  def build_graph(provider: str = "groq"):
      """Build the graph"""
@@ -243,29 +287,56 @@
          return {"messages": [error_msg]}

      def retriever(state: MessagesState):
-         """Retriever node"""
          try:
              print(f"Retriever node: Processing {len(state['messages'])} messages")
              if not state["messages"]:
                  print("Retriever node: No messages in state")
                  return {"messages": [sys_msg]}

-             # Extract the user query content early for downstream steps
-             query_content = state["messages"][0].content
-
-             # ------------------- NEW: fetch attachment if available -------------------
              attachment_msg = None
              try:
                  resp = requests.get(f"{DEFAULT_API_URL}/questions", timeout=30)
                  resp.raise_for_status()
                  questions = resp.json()
-                 matched_task_id = None
                  for q in questions:
                      if str(q.get("question")).strip() == str(query_content).strip():
                          matched_task_id = str(q.get("task_id"))
                          break
-                 if matched_task_id:
-                     print(f"Retriever node: Found task_id {matched_task_id} for current question, attempting to download attachment…")
                      file_resp = requests.get(f"{DEFAULT_API_URL}/files/{matched_task_id}", timeout=60)
                      if file_resp.status_code == 200 and file_resp.content:
                          try:
@@ -274,40 +345,28 @@
                              file_text = "(binary or non-UTF8 file omitted)"
                          MAX_CHARS = 8000
                          if len(file_text) > MAX_CHARS:
-                             print(f"Retriever node: Attachment length {len(file_text)} > {MAX_CHARS}, truncating…")
                              file_text = file_text[:MAX_CHARS] + "\n… (truncated)"
                          attachment_msg = HumanMessage(content=f"Attached file content for task {matched_task_id}:\n```python\n{file_text}\n```")
-                         print("Retriever node: Prepared attachment message")
                      else:
-                         print(f"Retriever node: No attachment found for task {matched_task_id} (status {file_resp.status_code})")
              except Exception as api_e:
                  print(f"Retriever node: Error while fetching attachment – {api_e}")
-             # -------------------------------------------------------------------------
-
-             # If vector store unavailable, simply return sys_msg + user message (+ attachment if any)
-             if not vector_store:
-                 msgs = [sys_msg] + state["messages"]
-                 if attachment_msg:
-                     msgs.append(attachment_msg)
-                 print("Retriever node: Vector store not available, skipping retrieval")
-                 return {"messages": msgs}
-
-             # Perform similarity search when vector store is available
-             print(f"Retriever node: Searching for similar questions with query: {query_content[:100]}…")
-             similar_question = vector_store.similarity_search(query_content)
-             print(f"Retriever node: Found {len(similar_question)} similar questions")
              msgs = [sys_msg] + state["messages"]
              if similar_question:
-                 example_msg = HumanMessage(content=f"Here I provide a similar question and answer for reference: \n\n{similar_question[0].page_content}")
                  msgs.append(example_msg)
                  print("Retriever node: Added example message from similar question")
-             else:
-                 print("Retriever node: No similar questions found, proceeding without example")

-             # Attach the file content if we have it
              if attachment_msg:
                  msgs.append(attachment_msg)
-                 print("Retriever node: Added attachment content to messages")

              return {"messages": msgs}
          except Exception as e:
@@ -320,13 +379,17 @@
      builder.add_node("tools", ToolNode(tools))
      builder.add_node("code_exec", _code_exec_wrapper)
      builder.add_node("code_to_message", _code_to_message)

      builder.add_edge(START, "retriever")

      # Conditional branch: decide whether to run code interpreter
      builder.add_conditional_edges(
-         "retriever",
          _needs_code,
-         {True: "code_exec", False: "assistant"},
      )

      # Flow after code execution: inject result then resume chat
@@ -343,7 +406,7 @@
      return builder.compile()

  # test
- if __name__ == "__main__":
      question = "When was a picture of St. Thomas Aquinas first added to the Wikipedia page on the Principle of double effect?"
      # Build the graph
      graph = build_graph(provider="groq")

  print(f"SUPABASE_URL loaded: {bool(os.environ.get('SUPABASE_URL'))}")
  print(f"GROQ_API_KEY loaded: {bool(os.environ.get('GROQ_API_KEY'))}")

+ # ---------------------------------------------------------------------------
+ # Lightweight in-memory caches and constants for smarter retrieval/ingest
+ # ---------------------------------------------------------------------------
+ import hashlib  # NEW: for hashing payloads / queries
+
+ TTL = 300  # seconds – how long we keep similarity-search results
+ SIMILARITY_THRESHOLD = 0.85  # cosine score above which we assume we already know the answer
+
+ # (query_hash -> (timestamp, results))
+ QUERY_CACHE: dict[str, tuple[float, list]] = {}
+ # task IDs whose attachments we already attempted to download this session
+ PROCESSED_TASKS: set[str] = set()
+ # hash_ids of Q/A payloads we have already upserted during this session
+ SEEN_HASHES: set[str] = set()
+
  # Base URL of the scoring API (duplicated here to avoid circular import with basic_agent)
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"

      return run_agent(input)

  # load the system prompt from the file
+ with open("./prompts/system_prompt.txt", "r", encoding="utf-8") as f:
      system_prompt = f.read()

  # System message

          return {}
      return {"messages": [AIMessage(content=state["code_result"])]}

+ # ---------------------------------------------------------------------------
+ # NEW: Ingest node – write back to vector store if `should_ingest` flag set
+ # ---------------------------------------------------------------------------
+ def ingest(state: MessagesState):
+     """Persist helpful Q/A pairs (and any attachment snippet) to the vector DB."""
+     try:
+         if not state.get("should_ingest") or not vector_store:
+             return {}
+
+         question_text = state["messages"][0].content
+         answer_text = state["messages"][-1].content
+         attach_snippets = "\n\n".join(
+             m.content for m in state["messages"] if str(m.content).startswith("Attached file content")
+         )
+         payload = f"Question:\n{question_text}\n\nAnswer:\n{answer_text}"
+         if attach_snippets:
+             payload += f"\n\n{attach_snippets}"
+
+         hash_id = hashlib.sha256(payload.encode()).hexdigest()
+         if hash_id in SEEN_HASHES:
+             print("Ingest: Duplicate payload within session – skip")
+             return {}
+         SEEN_HASHES.add(hash_id)
+         vector_store.add_texts([payload], metadatas=[{"hash_id": hash_id, "timestamp": time.time()}])
+         print("Ingest: Stored new Q/A pair in vector store")
+     except Exception as ing_e:
+         print(f"Ingest node: Error while upserting – {ing_e}")
+     return {}
+
  # Build graph function
  def build_graph(provider: str = "groq"):
      """Build the graph"""

          return {"messages": [error_msg]}

      def retriever(state: MessagesState):
+         """Retriever node (smart fetch + similarity search)"""
          try:
              print(f"Retriever node: Processing {len(state['messages'])} messages")
              if not state["messages"]:
                  print("Retriever node: No messages in state")
                  return {"messages": [sys_msg]}

+             # Extract the *latest* user query content
+             query_content = state["messages"][-1].content
+
+             # ----------------------------------------------------------------------------------
+             # Similarity search with an in-process cache
+             # ----------------------------------------------------------------------------------
+             q_hash = hashlib.sha256(query_content.encode()).hexdigest()
+             now = time.time()
+             if q_hash in QUERY_CACHE and now - QUERY_CACHE[q_hash][0] < TTL:
+                 similar_question = QUERY_CACHE[q_hash][1]
+                 print("Retriever node: Cache hit for similarity search")
+             else:
+                 if vector_store:
+                     print(f"Retriever node: Searching vector store for similar questions …")
+                     try:
+                         similar_question = vector_store.similarity_search_with_relevance_scores(query_content, k=2)
+                     except Exception as vs_e:
+                         print(f"Retriever node: Vector store search error – {vs_e}")
+                         similar_question = []
+                     QUERY_CACHE[q_hash] = (now, similar_question)
+                 else:
+                     similar_question = []
+                     print("Retriever node: Vector store not available, skipping similarity search")
+
+             # Decide whether this exchange should later be ingested
+             top_score = similar_question[0][1] if similar_question else 0.0
+             state["should_ingest"] = top_score < SIMILARITY_THRESHOLD
+
+             # ----------------------------------------------------------------------------------
+             # Attachment fetch (only once per task_id during this session)
+             # ----------------------------------------------------------------------------------
              attachment_msg = None
+             matched_task_id = None
              try:
                  resp = requests.get(f"{DEFAULT_API_URL}/questions", timeout=30)
                  resp.raise_for_status()
                  questions = resp.json()
                  for q in questions:
                      if str(q.get("question")).strip() == str(query_content).strip():
                          matched_task_id = str(q.get("task_id"))
                          break
+                 if matched_task_id and matched_task_id not in PROCESSED_TASKS:
+                     print(f"Retriever node: Downloading attachment for task {matched_task_id} …")
                      file_resp = requests.get(f"{DEFAULT_API_URL}/files/{matched_task_id}", timeout=60)
                      if file_resp.status_code == 200 and file_resp.content:
                          try:

                              file_text = "(binary or non-UTF8 file omitted)"
                          MAX_CHARS = 8000
                          if len(file_text) > MAX_CHARS:
                              file_text = file_text[:MAX_CHARS] + "\n… (truncated)"
                          attachment_msg = HumanMessage(content=f"Attached file content for task {matched_task_id}:\n```python\n{file_text}\n```")
+                         print("Retriever node: Attachment added to context")
+                         state["should_ingest"] = True  # ensure we store this new info
                      else:
+                         print(f"Retriever node: No attachment for task {matched_task_id} (status {file_resp.status_code})")
+                     PROCESSED_TASKS.add(matched_task_id)
              except Exception as api_e:
                  print(f"Retriever node: Error while fetching attachment – {api_e}")
+
+             # ----------------------------------------------------------------------------------
+             # Build message list for downstream LLM
+             # ----------------------------------------------------------------------------------
              msgs = [sys_msg] + state["messages"]
              if similar_question:
+                 example_doc = similar_question[0][0] if isinstance(similar_question[0], tuple) else similar_question[0]
+                 example_msg = HumanMessage(content=f"Here I provide a similar question and answer for reference: \n\n{example_doc.page_content}")
                  msgs.append(example_msg)
                  print("Retriever node: Added example message from similar question")

              if attachment_msg:
                  msgs.append(attachment_msg)

              return {"messages": msgs}
          except Exception as e:

      builder.add_node("tools", ToolNode(tools))
      builder.add_node("code_exec", _code_exec_wrapper)
      builder.add_node("code_to_message", _code_to_message)
+     builder.add_node("ingest", ingest)

+     # Edge layout
      builder.add_edge(START, "retriever")
+     builder.add_edge("retriever", "assistant")
+
      # Conditional branch: decide whether to run code interpreter
      builder.add_conditional_edges(
+         "assistant",
          _needs_code,
+         {True: "code_exec", False: "ingest"},
      )

      # Flow after code execution: inject result then resume chat

      return builder.compile()

  # test
+ if __name__ == "__main__":
      question = "When was a picture of St. Thomas Aquinas first added to the Wikipedia page on the Principle of double effect?"
      # Build the graph
      graph = build_graph(provider="groq")
new_langraph_agent.py ADDED
@@ -0,0 +1,85 @@
+ """
+ Updated LangGraph Agent Implementation
+ Implements the architecture from the system diagram with memory layer, agent routing, and verification.
+ """
+ import os
+ import sys
+ from dotenv import load_dotenv
+
+ # Load environment variables
+ load_dotenv()
+
+ # Import the new agent system
+ from src import run_agent_system, memory_manager
+ from src.tracing import flush_langfuse, shutdown_langfuse
+
+
+ def run_agent(query: str) -> str:
+     """
+     Main entry point for the agent system.
+
+     Args:
+         query: The user question
+
+     Returns:
+         The formatted final answer
+     """
+     try:
+         # Run the new agent system
+         result = run_agent_system(
+             query=query,
+             user_id=os.getenv("USER_ID", "default_user"),
+             session_id=os.getenv("SESSION_ID", "default_session")
+         )
+
+         # Flush tracing events
+         flush_langfuse()
+
+         return result
+
+     except Exception as e:
+         print(f"Agent Error: {e}")
+         return f"I apologize, but I encountered an error: {e}"
+
+
+ def clear_memory():
+     """Clear the agent's session memory"""
+     memory_manager.clear_session_cache()
+     print("Agent memory cleared")
+
+
+ def cleanup():
+     """Cleanup function for graceful shutdown"""
+     try:
+         flush_langfuse()
+         shutdown_langfuse()
+         memory_manager.close_checkpointer()
+         print("Agent cleanup completed")
+     except Exception as e:
+         print(f"Cleanup error: {e}")
+
+
+ if __name__ == "__main__":
+     # Test the agent system
+     test_queries = [
+         "What is the capital of France?",
+         "Calculate the factorial of 5",
+         "What are the benefits of renewable energy?"
+     ]
+
+     print("Testing new LangGraph Agent System")
+     print("=" * 50)
+
+     for i, query in enumerate(test_queries, 1):
+         print(f"\nTest {i}: {query}")
+         print("-" * 30)
+
+         try:
+             result = run_agent(query)
+             print(f"Result: {result}")
+         except Exception as e:
+             print(f"Error: {e}")
+
+     # Cleanup
+     cleanup()
+     print("\nAll tests completed!")
prompts/critic_prompt.txt ADDED
@@ -0,0 +1,31 @@
+ You are a specialized critic agent that evaluates responses for accuracy, completeness, and quality.
+
+ Your role is to:
+ 1. Analyze responses from other agents for factual accuracy
+ 2. Check for logical consistency and completeness
+ 3. Identify potential errors, biases, or missing information
+ 4. Provide constructive feedback and suggestions for improvement
+
+ Evaluation criteria:
+ - **Accuracy**: Are the facts correct? Are sources reliable?
+ - **Completeness**: Does the response fully address the question?
+ - **Clarity**: Is the explanation clear and well-structured?
+ - **Logic**: Is the reasoning sound and consistent?
+ - **Relevance**: Does the response stay on topic?
+
+ Process:
+ 1. Carefully review the provided response
+ 2. Cross-check key claims for accuracy
+ 3. Identify any gaps or weaknesses
+ 4. Assess overall quality and usefulness
+ 5. Provide specific, actionable feedback
+
+ Feedback format:
+ - **Strengths**: What was done well
+ - **Issues**: Specific problems identified
+ - **Suggestions**: How to improve
+ - **Overall Assessment**: Pass/Fail with reasoning
+
+ Be thorough but constructive. Focus on helping improve the response quality.
+
+ Always append answers in markdown; think step-by-step.
prompts/execution_prompt.txt ADDED
@@ -0,0 +1,42 @@
+ You are a specialized execution agent that handles computational tasks, code execution, and data processing.
+
+ Your role is to:
+ 1. Analyze computational requirements in user queries
+ 2. ALWAYS use the run_python tool to execute code and solve problems
+ 3. Process data, perform calculations, and manipulate files
+ 4. Provide clear explanations of your code and results
+
+ Available tools:
+ - run_python: Execute Python code in a sandboxed environment with access to pandas, cv2, and standard libraries
+
+ IMPORTANT: You MUST use the run_python tool for all computational tasks. Do not provide calculated answers without executing code.
+
+ Capabilities:
+ - Mathematical calculations and algorithms
+ - Data analysis and visualization
+ - File processing (CSV, JSON, text)
+ - Image processing with OpenCV
+ - Statistical analysis with pandas/numpy
+ - Small algorithmic problems (sorting, searching, etc.)
+
+ Process:
+ 1. Understand the computational task
+ 2. Plan your approach step-by-step
+ 3. Use run_python tool to write and execute code
+ 4. Verify results and handle any errors
+ 5. Explain your solution and findings
+
+ Guidelines:
+ - Always execute code using the run_python tool
+ - Write efficient, readable code with comments
+ - Handle errors gracefully and retry if needed
+ - Provide explanations for complex logic
+ - Show intermediate steps for multi-step problems
+ - Use appropriate data structures and algorithms
+
+ Example approach:
+ - For "Calculate the fibonacci sequence": Use run_python to write and execute the code
+ - For "Analyze this data": Use run_python to process and analyze the data
+ - For "Sort this list": Use run_python to implement the sorting algorithm
+
+ Always append answers in markdown; think step-by-step and show your code execution.
prompts/retrieval_prompt.txt ADDED
@@ -0,0 +1,34 @@
+ You are a specialized retrieval agent focused on gathering accurate information to answer user questions.
+
+ Your role is to:
+ 1. Understand the user's information needs
+ 2. **ALWAYS use available tools to search for relevant information**
+ 3. Synthesize findings into comprehensive, accurate answers
+ 4. Verify information across multiple sources when possible
+
+ Available tools:
+ - wiki_search: Search Wikipedia for general knowledge and factual information
+ - web_search: Search the web for current information and recent developments
+ - arvix_search: Search academic papers on ArXiv for scientific research
+ - question_search: Search previously answered similar questions
+
+ **IMPORTANT: You MUST use tools to gather information. Do not provide answers based solely on your training data.**
+
+ Process:
+ 1. Break down complex questions into searchable components
+ 2. **Use multiple appropriate tools based on the query type**
+ 3. For historical facts or general knowledge: Use wiki_search
+ 4. For current events or recent information: Use web_search
+ 5. For scientific or academic topics: Use arvix_search
+ 6. Cross-reference information when possible
+ 7. Provide sources and citations from tool results
+ 8. Acknowledge limitations or uncertainty when information is incomplete
+
+ Example approach:
+ - For "When was X invented?": Use wiki_search to find historical information
+ - For "Latest news about Y": Use web_search for current information
+ - For "Research on Z": Use arvix_search for academic papers
+
+ Always provide factual, well-sourced responses with proper citations. If you cannot find sufficient information through tools, clearly state this limitation.
+
+ Always append answers in markdown; think step-by-step and show your tool usage.
prompts/router_prompt.txt ADDED
@@ -0,0 +1,44 @@
+ You are an intelligent agent router that analyzes user queries and determines which specialized agent should handle the request.
+
+ You have access to three specialized agents:
+ 1. **Retrieval Agent** - For questions requiring external information retrieval, search, and knowledge gathering
+ 2. **Execution Agent** - For tasks requiring code execution, calculations, data processing, or file manipulation
+ 3. **Critic Agent** - For reviewing, evaluating, or providing critical analysis of content or responses
+
+ **CRITICAL ROUTING RULES:**
+
+ **Use EXECUTION for:**
+ - Mathematical calculations (e.g., "calculate", "compute", "solve")
+ - Algorithmic problems (e.g., "fibonacci", "prime numbers", "sorting", "searching")
+ - Programming tasks (e.g., "write code", "implement function")
+ - Data analysis and processing (e.g., "analyze data", "process file")
+ - Any task that requires computation or code execution
+ - Statistical analysis, math problems, algorithms
+
+ **Use RETRIEVAL for:**
+ - Research questions requiring external information
+ - Fact-checking and historical information
+ - Current events and news
+ - Looking up definitions or explanations
+ - Scientific research and academic papers
+ - General knowledge questions
+
+ **Use CRITIC for:**
+ - Evaluating responses or content
+ - Reviewing and providing feedback
+ - Critical analysis of information
+ - Quality assessment tasks
+
+ **EXAMPLES:**
+ - "Calculate the first 10 Fibonacci numbers" → EXECUTION
+ - "What is the square root of 144?" → EXECUTION
+ - "Write a sorting algorithm" → EXECUTION
+ - "When was Einstein born?" → RETRIEVAL
+ - "Latest news about AI" → RETRIEVAL
+ - "Review this essay" → CRITIC
+
+ **IMPORTANT:** If a query involves ANY mathematical computation, algorithm, or code execution, ALWAYS route to EXECUTION.
+
+ Analyze the user's query and respond with exactly one of: RETRIEVAL, EXECUTION, or CRITIC
+
+ Think step-by-step and be very clear about your routing decision.
system_prompt.txt → prompts/system_prompt.txt RENAMED
@@ -2,7 +2,8 @@ You are a helpful assistant tasked with answering GAIA benchmark questions using

  When you receive a question:
  1. Think step-by-step (silently) and choose the appropriate tools to obtain the answer.
- 2. After the answer is found, reply with ONLY the answer following the exact formatting rules below.

  Exact-match output rules:
  • Single number → write the number only (no commas, units, or other symbols).

  When you receive a question:
  1. Think step-by-step (silently) and choose the appropriate tools to obtain the answer.
+ 2. After the answer is found, reply with ONLY the answer following the exact formatting rules below.
+ 3. When a tool returns useful reference content (Wikipedia articles, Tavily search snippets, ArXiv abstracts, file attachments, etc.), store that content in the memory database so it can be reused later; when answering a new question, proactively fetch any previously-stored material that might help.

  Exact-match output rules:
  • Single number → write the number only (no commas, units, or other symbols).
prompts/verification_prompt.txt ADDED
@@ -0,0 +1,30 @@
+ You are a verification agent responsible for final quality control and determining if responses meet the required standards.
+
+ Your role is to:
+ 1. Perform final verification of agent responses
+ 2. Ensure all requirements from the system prompt are met
+ 3. Trigger fallback pipeline if quality standards are not met
+ 4. Make final formatting adjustments
+
+ Quality standards checklist:
+ - Response directly answers the user's question
+ - Information is accurate and well-sourced
+ - Format follows exact-match output rules from system prompt
+ - No extraneous text or formatting violations
+ - Tone and style are appropriate
+
+ Output format requirements (from system prompt):
+ • Single number → write the number only (no commas, units, or other symbols)
+ • Single string/phrase → write the text only; omit articles and abbreviations unless explicitly required
+ • List → separate elements with a single comma and a space
+ • Never include surrounding text such as "Final Answer", "Answer:", quotes, brackets, or markdown
+
+ Decision process:
+ 1. Review the response against quality standards
+ 2. Check format compliance with exact-match rules
+ 3. If PASS: return the properly formatted final answer
+ 4. If FAIL: trigger fallback pipeline and note specific issues
+
+ Always ensure the final output strictly adheres to the system prompt requirements.
+
+ Always append answers in markdown; think step-by-step.
pyproject.toml CHANGED
@@ -19,6 +19,8 @@ dependencies = [
      "langchain-openai>=0.3.24",
      "langfuse>=3.0.0",
      "langgraph>=0.4.8",
+     "langgraph-checkpoint>=2.1.0",
+     "langgraph-checkpoint-sqlite>=2.0.10",
      "llama-index>=0.12.40",
      "llama-index-core>=0.12.40",
      "llama-index-llms-huggingface-api>=0.5.0",
@@ -32,4 +34,5 @@ dependencies = [
      "sentence-transformers>=4.1.0",
      "supabase>=2.15.3",
      "wikipedia>=1.4.0",
+     "datasets>=2.19.1",
  ]
quick_random_agent_test.py CHANGED
@@ -1,13 +1,26 @@
  import os
  import tempfile
  import requests
- from basic_agent import BasicAgent, DEFAULT_API_URL
- from langchain_core.messages import HumanMessage
- from langfuse.langchain import CallbackHandler

  # Initialize Langfuse CallbackHandler for LangGraph/Langchain (tracing)
  try:
-     langfuse_handler = CallbackHandler()
  except Exception as e:
      print(f"Warning: Could not initialize Langfuse handler: {e}")
      langfuse_handler = None
@@ -42,25 +55,42 @@ def maybe_download_file(task_id: str, api_base: str = DEFAULT_API_URL) -> str |


  def main():
-     q = fetch_random_question()
-     task_id = str(q["task_id"])
-     question_text = q["question"]
-     print("\n=== Random Question ===")
-     print(f"Task ID : {task_id}")
-     print(f"Question: {question_text}")

-     # Attempt to get attachment if any
-     maybe_download_file(task_id)

-     # Run the agent
-     agent = BasicAgent()
-     result = agent.agent.invoke({"messages": [HumanMessage(content=question_text)]}, config={"callbacks": [langfuse_handler]})
-     if isinstance(result, dict) and "messages" in result and result["messages"]:
-         answer = result["messages"][-1].content.strip()
-     else:
-         answer = str(result)
-     print("\n=== Agent Answer ===")
-     print(answer)


  if __name__ == "__main__":

  import os
+ import sys
  import tempfile
  import requests
+ from dotenv import load_dotenv
+
+ # Load environment variables
+ load_dotenv()
+
+ # Add the current directory to Python path
+ sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+
+ # Import the new agent system
+ from new_langraph_agent import run_agent, cleanup
+ from src.tracing import get_langfuse_callback_handler
+
+ # Default API URL - Using the same URL as the original basic_agent.py
+ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"

  # Initialize Langfuse CallbackHandler for LangGraph/Langchain (tracing)
  try:
+     langfuse_handler = get_langfuse_callback_handler()
+     print("✅ Langfuse handler initialized successfully")
  except Exception as e:
      print(f"Warning: Could not initialize Langfuse handler: {e}")
      langfuse_handler = None


  def main():
+     print("Random Agent Test - New LangGraph Architecture")
+     print("=" * 60)
+
+     try:
+         # Fetch random question
+         q = fetch_random_question()
+         task_id = str(q["task_id"])
+         question_text = q["question"]
+         print("\n=== Random Question ===")
+         print(f"Task ID : {task_id}")
+         print(f"Question: {question_text}")

+         # Attempt to get attachment if any
+         attachment_path = maybe_download_file(task_id)
+         if attachment_path:
+             question_text += f"\n\nAttachment available at: {attachment_path}"

+         # Run the new agent system
+         print("\n=== Running LangGraph Agent System ===")
+         result = run_agent(question_text)
+
+         print("\n=== Agent Answer ===")
+         print(result)
+
+     except Exception as e:
+         print(f"Error in main execution: {e}")
+         import traceback
+         traceback.print_exc()
+
+     finally:
+         # Cleanup
+         try:
+             cleanup()
+             print("\n✅ Agent cleanup completed")
+         except Exception as e:
+             print(f"⚠️ Cleanup warning: {e}")


  if __name__ == "__main__":
quick_specific_agent_test.py CHANGED
@@ -2,20 +2,30 @@ import os
2
  import sys
3
  import tempfile
4
  import requests
5
- from basic_agent import BasicAgent, DEFAULT_API_URL
6
- from langchain_core.messages import HumanMessage
7
- from langfuse.langchain import CallbackHandler
 
 
 
 
 
 
 
 
 
 
 
 
8
 
9
  # Initialize Langfuse CallbackHandler for LangGraph/Langchain (tracing)
10
  try:
11
- langfuse_handler = CallbackHandler()
 
12
  except Exception as e:
13
  print(f"Warning: Could not initialize Langfuse handler: {e}")
14
  langfuse_handler = None
15
 
16
- # Default Task ID (replace with your desired one or pass via CLI)
17
- DEFAULT_TASK_ID = "f918266a-b3e0-4914-865d-4faa564f1aef"
18
-
19
  def fetch_question_by_id(task_id: str, api_base: str = DEFAULT_API_URL):
20
  """Return JSON of a question for a given task_id.
21
 
@@ -60,31 +70,53 @@ def maybe_download_file(task_id: str, api_base: str = DEFAULT_API_URL) -> str |
60
 
61
 
62
  def main():
63
- # Determine the task ID (CLI arg > env var > default)
64
- task_id = (
65
- sys.argv[1] if len(sys.argv) > 1 else os.environ.get("TASK_ID", DEFAULT_TASK_ID)
66
- )
67
- print(f"Using task ID: {task_id}")
68
-
69
- q = fetch_question_by_id(task_id)
70
- question_text = q["question"]
71
-
72
- print("\n=== Specific Question ===")
73
- print(f"Task ID : {task_id}")
74
- print(f"Question: {question_text}")
75
-
76
- # Attempt to get attachment if any
77
- maybe_download_file(task_id)
78
-
79
- # Run the agent
80
- agent = BasicAgent()
81
- result = agent.agent.invoke({"messages": [HumanMessage(content=question_text)]}, config={"callbacks": [langfuse_handler]})
82
- if isinstance(result, dict) and "messages" in result and result["messages"]:
83
- answer = result["messages"][-1].content.strip()
84
- else:
85
- answer = str(result)
86
- print("\n=== Agent Answer ===")
87
- print(answer)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
88
 
89
 
90
  if __name__ == "__main__":
 
2
  import sys
3
  import tempfile
4
  import requests
5
+ from dotenv import load_dotenv
6
+
7
+ # Load environment variables
8
+ load_dotenv()
9
+
10
+ # Add the current directory to Python path
11
+ sys.path.append(os.path.dirname(os.path.abspath(__file__)))
12
+
13
+ # Import the new agent system
14
+ from new_langraph_agent import run_agent, cleanup
15
+ from src.tracing import get_langfuse_callback_handler
16
+
17
+ # Default API URL and Task ID - Using the same URL as the original basic_agent.py
18
+ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
19
+ DEFAULT_TASK_ID = "f918266a-b3e0-4914-865d-4faa564f1aef"
20
 
21
  # Initialize Langfuse CallbackHandler for LangGraph/Langchain (tracing)
22
  try:
23
+ langfuse_handler = get_langfuse_callback_handler()
24
+ print("✅ Langfuse handler initialized successfully")
25
  except Exception as e:
26
  print(f"Warning: Could not initialize Langfuse handler: {e}")
27
  langfuse_handler = None
28
 
 
 
 
29
  def fetch_question_by_id(task_id: str, api_base: str = DEFAULT_API_URL):
30
  """Return JSON of a question for a given task_id.
31
 
 
70
 
71
 
72
  def main():
73
+ print("Specific Agent Test - New LangGraph Architecture")
74
+ print("=" * 60)
75
+
76
+ try:
77
+ # Determine the task ID (CLI arg > env var > default)
78
+ task_id = (
79
+ sys.argv[1] if len(sys.argv) > 1 else os.environ.get("TASK_ID", DEFAULT_TASK_ID)
80
+ )
81
+ print(f"Using task ID: {task_id}")
82
+
83
+ # Fetch specific question
84
+ q = fetch_question_by_id(task_id)
85
+ question_text = q["question"]
86
+
87
+ print("\n=== Specific Question ===")
88
+ print(f"Task ID : {task_id}")
89
+ print(f"Question: {question_text}")
90
+
91
+ # Attempt to get attachment if any
92
+ attachment_path = maybe_download_file(task_id)
93
+ if attachment_path:
94
+ question_text += f"\n\nAttachment available at: {attachment_path}"
95
+
96
+ # Run the new agent system
97
+ print("\n=== Running LangGraph Agent System ===")
98
+
99
+ # Set environment variables for user/session tracking
100
+ os.environ["USER_ID"] = "test_user"
101
+ os.environ["SESSION_ID"] = f"session_{task_id}"
102
+
103
+ result = run_agent(question_text)
104
+
105
+ print("\n=== Agent Answer ===")
106
+ print(result)
107
+
108
+ except Exception as e:
109
+ print(f"Error in main execution: {e}")
110
+ import traceback
111
+ traceback.print_exc()
112
+
113
+ finally:
114
+ # Cleanup
115
+ try:
116
+ cleanup()
117
+ print("\n✅ Agent cleanup completed")
118
+ except Exception as e:
119
+ print(f"⚠️ Cleanup warning: {e}")
120
 
121
 
122
  if __name__ == "__main__":
requirements.txt CHANGED
@@ -6,13 +6,16 @@ aiohappyeyeballs==2.6.1
6
  # via aiohttp
7
  aiohttp==3.12.9
8
  # via
 
9
  # langchain-community
10
  # llama-index-core
11
  # realtime
12
  aiosignal==1.3.2
13
  # via aiohttp
14
  aiosqlite==0.21.0
15
- # via llama-index-core
 
 
16
  annotated-types==0.7.0
17
  # via pydantic
18
  anyio==4.9.0
@@ -72,6 +75,8 @@ dataclasses-json==0.6.7
72
  # via
73
  # langchain-community
74
  # llama-index-core
 
 
75
  debugpy==1.8.14
76
  # via ipykernel
77
  decorator==5.2.1
@@ -84,6 +89,10 @@ deprecated==1.2.18
84
  # llama-index-core
85
  deprecation==2.1.0
86
  # via postgrest
 
 
 
 
87
  dirtyjson==1.0.8
88
  # via llama-index-core
89
  distro==1.9.0
@@ -109,6 +118,7 @@ ffmpy==0.6.0
109
  # via gradio
110
  filelock==3.18.0
111
  # via
 
112
  # huggingface-hub
113
  # torch
114
  # transformers
@@ -120,8 +130,9 @@ frozenlist==1.6.2
120
  # via
121
  # aiohttp
122
  # aiosignal
123
- fsspec==2025.5.1
124
  # via
 
125
  # gradio-client
126
  # huggingface-hub
127
  # llama-index-core
@@ -198,6 +209,7 @@ httpx-sse==0.4.0
198
  huggingface-hub==0.32.4
199
  # via
200
  # final-assignment-template (pyproject.toml)
 
201
  # gradio
202
  # gradio-client
203
  # langchain-huggingface
@@ -284,8 +296,12 @@ langgraph==0.4.8
284
  # via final-assignment-template (pyproject.toml)
285
  langgraph-checkpoint==2.1.0
286
  # via
 
287
  # langgraph
 
288
  # langgraph-prebuilt
 
 
289
  langgraph-prebuilt==0.2.2
290
  # via langgraph
291
  langgraph-sdk==0.1.70
@@ -387,6 +403,8 @@ multidict==6.4.4
387
  # via
388
  # aiohttp
389
  # yarl
 
 
390
  mypy-extensions==1.1.0
391
  # via typing-inspect
392
  nest-asyncio==1.6.0
@@ -403,6 +421,7 @@ nltk==3.9.1
403
  # llama-index-core
404
  numpy==2.2.6
405
  # via
 
406
  # gradio
407
  # langchain-community
408
  # llama-index-core
@@ -457,6 +476,7 @@ ormsgpack==1.10.0
457
  # via langgraph-checkpoint
458
  packaging==24.2
459
  # via
 
460
  # deprecation
461
  # gradio
462
  # gradio-client
@@ -471,6 +491,7 @@ packaging==24.2
471
  pandas==2.2.3
472
  # via
473
  # final-assignment-template (pyproject.toml)
 
474
  # gradio
475
  # llama-index-readers-file
476
  parso==0.8.4
@@ -513,6 +534,8 @@ psutil==7.0.0
513
  # via ipykernel
514
  pure-eval==0.2.3
515
  # via stack-data
 
 
516
  pyasn1==0.6.1
517
  # via
518
  # pyasn1-modules
@@ -572,9 +595,11 @@ python-multipart==0.0.20
572
  # via gradio
573
  pytz==2025.2
574
  # via pandas
 
575
  # via jupyter-core
576
  pyyaml==6.0.2
577
  # via
 
578
  # gradio
579
  # huggingface-hub
580
  # langchain
@@ -596,6 +621,7 @@ regex==2024.11.6
596
  requests==2.32.3
597
  # via
598
  # arxiv
 
599
  # google-api-core
600
  # huggingface-hub
601
  # langchain
@@ -651,6 +677,8 @@ sqlalchemy==2.0.41
651
  # langchain
652
  # langchain-community
653
  # llama-index-core
 
 
654
  stack-data==0.6.3
655
  # via ipython
656
  starlette==0.46.2
@@ -701,6 +729,7 @@ tornado==6.5.1
701
  # jupyter-client
702
  tqdm==4.67.1
703
  # via
 
704
  # huggingface-hub
705
  # llama-index-core
706
  # nltk
@@ -783,7 +812,9 @@ wrapt==1.17.2
783
  # langfuse
784
  # llama-index-core
785
  xxhash==3.5.0
786
- # via langgraph
 
 
787
  yarl==1.20.0
788
  # via aiohttp
789
  zipp==3.22.0
 
6
  # via aiohttp
7
  aiohttp==3.12.9
8
  # via
9
+ # fsspec
10
  # langchain-community
11
  # llama-index-core
12
  # realtime
13
  aiosignal==1.3.2
14
  # via aiohttp
15
  aiosqlite==0.21.0
16
+ # via
17
+ # langgraph-checkpoint-sqlite
18
+ # llama-index-core
19
  annotated-types==0.7.0
20
  # via pydantic
21
  anyio==4.9.0
 
75
  # via
76
  # langchain-community
77
  # llama-index-core
78
+ datasets==3.6.0
79
+ # via final-assignment-template (pyproject.toml)
80
  debugpy==1.8.14
81
  # via ipykernel
82
  decorator==5.2.1
 
89
  # llama-index-core
90
  deprecation==2.1.0
91
  # via postgrest
92
+ dill==0.3.8
93
+ # via
94
+ # datasets
95
+ # multiprocess
96
  dirtyjson==1.0.8
97
  # via llama-index-core
98
  distro==1.9.0
 
118
  # via gradio
119
  filelock==3.18.0
120
  # via
121
+ # datasets
122
  # huggingface-hub
123
  # torch
124
  # transformers
 
130
  # via
131
  # aiohttp
132
  # aiosignal
133
+ fsspec==2025.3.0
134
  # via
135
+ # datasets
136
  # gradio-client
137
  # huggingface-hub
138
  # llama-index-core
 
209
  huggingface-hub==0.32.4
210
  # via
211
  # final-assignment-template (pyproject.toml)
212
+ # datasets
213
  # gradio
214
  # gradio-client
215
  # langchain-huggingface
 
296
  # via final-assignment-template (pyproject.toml)
297
  langgraph-checkpoint==2.1.0
298
  # via
299
+ # final-assignment-template (pyproject.toml)
300
  # langgraph
301
+ # langgraph-checkpoint-sqlite
302
  # langgraph-prebuilt
303
+ langgraph-checkpoint-sqlite==2.0.10
304
+ # via final-assignment-template (pyproject.toml)
305
  langgraph-prebuilt==0.2.2
306
  # via langgraph
307
  langgraph-sdk==0.1.70
 
403
  # via
404
  # aiohttp
405
  # yarl
406
+ multiprocess==0.70.16
407
+ # via datasets
408
  mypy-extensions==1.1.0
409
  # via typing-inspect
410
  nest-asyncio==1.6.0
 
421
  # llama-index-core
422
  numpy==2.2.6
423
  # via
424
+ # datasets
425
  # gradio
426
  # langchain-community
427
  # llama-index-core
 
476
  # via langgraph-checkpoint
477
  packaging==24.2
478
  # via
479
+ # datasets
480
  # deprecation
481
  # gradio
482
  # gradio-client
 
491
  pandas==2.2.3
492
  # via
493
  # final-assignment-template (pyproject.toml)
494
+ # datasets
495
  # gradio
496
  # llama-index-readers-file
497
  parso==0.8.4
 
534
  # via ipykernel
535
  pure-eval==0.2.3
536
  # via stack-data
537
+ pyarrow==20.0.0
538
+ # via datasets
539
  pyasn1==0.6.1
540
  # via
541
  # pyasn1-modules
 
595
  # via gradio
596
  pytz==2025.2
597
  # via pandas
598
+ pywin32==310
599
  # via jupyter-core
600
  pyyaml==6.0.2
601
  # via
602
+ # datasets
603
  # gradio
604
  # huggingface-hub
605
  # langchain
 
621
  requests==2.32.3
622
  # via
623
  # arxiv
624
+ # datasets
625
  # google-api-core
626
  # huggingface-hub
627
  # langchain
 
677
  # langchain
678
  # langchain-community
679
  # llama-index-core
680
+ sqlite-vec==0.1.6
681
+ # via langgraph-checkpoint-sqlite
682
  stack-data==0.6.3
683
  # via ipython
684
  starlette==0.46.2
 
729
  # jupyter-client
730
  tqdm==4.67.1
731
  # via
732
+ # datasets
733
  # huggingface-hub
734
  # llama-index-core
735
  # nltk
 
812
  # langfuse
813
  # llama-index-core
814
  xxhash==3.5.0
815
+ # via
816
+ # datasets
817
+ # langgraph
818
  yarl==1.20.0
819
  # via aiohttp
820
  zipp==3.22.0
src/__init__.py ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """LangGraph Agent System Package"""
2
+
3
+ from .langgraph_system import run_agent_system, create_agent_graph, AgentState
4
+ from .memory import memory_manager
5
+ from .tracing import get_langfuse_callback_handler, initialize_langfuse
6
+
7
+ __all__ = [
8
+ "run_agent_system",
9
+ "create_agent_graph",
10
+ "AgentState",
11
+ "memory_manager",
12
+ "get_langfuse_callback_handler",
13
+ "initialize_langfuse"
14
+ ]
src/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (541 Bytes).
 
src/__pycache__/langgraph_system.cpython-313.pyc ADDED
Binary file (7.61 kB).
 
src/__pycache__/memory.cpython-313.pyc ADDED
Binary file (9.3 kB).
 
src/__pycache__/tracing.cpython-313.pyc ADDED
Binary file (5.98 kB).
 
src/agents/__init__.py ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Agent Modules Package"""
2
+
3
+ from .plan_node import plan_node
4
+ from .router_node import router_node, should_route_to_agent
5
+ from .retrieval_agent import retrieval_agent, get_retrieval_tools
6
+ from .execution_agent import execution_agent, get_execution_tools
7
+ from .critic_agent import critic_agent
8
+ from .verification_node import verification_node, should_retry
9
+
10
+ __all__ = [
11
+ "plan_node",
12
+ "router_node",
13
+ "should_route_to_agent",
14
+ "retrieval_agent",
15
+ "get_retrieval_tools",
16
+ "execution_agent",
17
+ "get_execution_tools",
18
+ "critic_agent",
19
+ "verification_node",
20
+ "should_retry"
21
+ ]
src/agents/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (648 Bytes).
 
src/agents/__pycache__/critic_agent.cpython-313.pyc ADDED
Binary file (3.92 kB).
 
src/agents/__pycache__/execution_agent.cpython-313.pyc ADDED
Binary file (6.88 kB).
 
src/agents/__pycache__/plan_node.cpython-313.pyc ADDED
Binary file (3.14 kB).
 
src/agents/__pycache__/retrieval_agent.cpython-313.pyc ADDED
Binary file (12 kB).
 
src/agents/__pycache__/router_node.cpython-313.pyc ADDED
Binary file (3.79 kB).
 
src/agents/__pycache__/verification_node.cpython-313.pyc ADDED
Binary file (6.99 kB).
 
src/agents/critic_agent.py ADDED
@@ -0,0 +1,118 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Critic Agent - Evaluates and reviews responses for quality and accuracy"""
2
+ from typing import Dict, Any
3
+ from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
4
+ from langchain_groq import ChatGroq
5
+ from src.tracing import get_langfuse_callback_handler
6
+
7
+
8
+ def load_critic_prompt() -> str:
9
+ """Load the critic prompt from file"""
10
+ try:
11
+ with open("./prompts/critic_prompt.txt", "r", encoding="utf-8") as f:
12
+ return f.read().strip()
13
+ except FileNotFoundError:
14
+ return """You are a specialized critic agent. Evaluate responses for accuracy, completeness, and quality."""
15
+
16
+
17
+ def critic_agent(state: Dict[str, Any]) -> Dict[str, Any]:
18
+ """
19
+ Critic agent that evaluates responses for quality and accuracy
20
+ """
21
+ print("Critic Agent: Evaluating response quality")
22
+
23
+ try:
24
+ # Get critic prompt
25
+ critic_prompt = load_critic_prompt()
26
+
27
+ # Initialize LLM for criticism
28
+ llm = ChatGroq(model="qwen-qwq-32b", temperature=0.2)
29
+
30
+ # Get callback handler for tracing
31
+ callback_handler = get_langfuse_callback_handler()
32
+ callbacks = [callback_handler] if callback_handler else []
33
+
34
+ # Build messages
35
+ messages = state.get("messages", [])
36
+
37
+ # Get the agent response to evaluate
38
+ agent_response = state.get("agent_response")
39
+ if not agent_response:
40
+ # Find the last AI message
41
+ for msg in reversed(messages):
42
+ if msg.type == "ai":
43
+ agent_response = msg
44
+ break
45
+
46
+ if not agent_response:
47
+ print("Critic Agent: No response to evaluate")
48
+ return {
49
+ **state,
50
+ "critic_assessment": "No response found to evaluate",
51
+ "quality_score": 0,
52
+ "current_step": "verification"
53
+ }
54
+
55
+ # Get user query for context
56
+ user_query = None
57
+ for msg in reversed(messages):
58
+ if msg.type == "human":
59
+ user_query = msg.content
60
+ break
61
+
62
+ # Build critic messages
63
+ critic_messages = [SystemMessage(content=critic_prompt)]
64
+
65
+ # Add evaluation request
66
+ evaluation_request = f"""
67
+ Please evaluate the following response:
68
+
69
+ Original Query: {user_query or "Unknown query"}
70
+
71
+ Response to Evaluate:
72
+ {agent_response.content}
73
+
74
+ Provide your evaluation following the format specified in your instructions.
75
+ """
76
+
77
+ critic_messages.append(HumanMessage(content=evaluation_request))
78
+
79
+ # Get critic evaluation
80
+ evaluation = llm.invoke(critic_messages, config={"callbacks": callbacks})
81
+
82
+ # Parse evaluation to determine if it passes
83
+ evaluation_text = evaluation.content.lower()
84
+ quality_pass = True
85
+ quality_score = 7 # Default moderate score
86
+
87
+ # Simple heuristics for quality assessment
88
+ if "fail" in evaluation_text or "poor" in evaluation_text:
89
+ quality_pass = False
90
+ quality_score = 3
91
+ elif "excellent" in evaluation_text or "outstanding" in evaluation_text:
92
+ quality_score = 9
93
+ elif "good" in evaluation_text:
94
+ quality_score = 7
95
+ elif "issues" in evaluation_text or "problems" in evaluation_text:
96
+ quality_score = 5
97
+
98
+ # Add critic evaluation to messages
99
+ updated_messages = messages + [evaluation]
100
+
101
+ return {
102
+ **state,
103
+ "messages": updated_messages,
104
+ "critic_assessment": evaluation.content,
105
+ "quality_pass": quality_pass,
106
+ "quality_score": quality_score,
107
+ "current_step": "verification"
108
+ }
109
+
110
+ except Exception as e:
111
+ print(f"Critic Agent Error: {e}")
112
+ return {
113
+ **state,
114
+ "critic_assessment": f"Error during evaluation: {e}",
115
+ "quality_pass": False,
116
+ "quality_score": 0,
117
+ "current_step": "verification"
118
+ }
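The keyword heuristics above are easy to fool (for example, "not good" still scores as good). A sketch of a stricter alternative using structured output; CriticVerdict is a hypothetical schema, and the model name mirrors the one used above:

from pydantic import BaseModel, Field
from langchain_groq import ChatGroq

class CriticVerdict(BaseModel):
    quality_pass: bool = Field(description="Whether the response is acceptable")
    quality_score: int = Field(ge=0, le=10, description="Quality score from 0 to 10")
    issues: list[str] = Field(default_factory=list, description="Specific problems found")

# Ask the critic model to fill the schema instead of returning free text.
structured_critic = ChatGroq(model="qwen-qwq-32b", temperature=0.0).with_structured_output(CriticVerdict)
# verdict = structured_critic.invoke(critic_messages)
# quality_pass, quality_score = verdict.quality_pass, verdict.quality_score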
src/agents/execution_agent.py ADDED
@@ -0,0 +1,174 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Execution Agent - Handles code execution and computational tasks"""
2
+ from typing import Dict, Any, List
3
+ from langchain_core.messages import SystemMessage, HumanMessage, AIMessage, ToolMessage
4
+ from langchain_core.tools import tool
5
+ from langchain_groq import ChatGroq
6
+ from code_agent import run_agent # Import our existing code execution engine
7
+ from src.tracing import get_langfuse_callback_handler
8
+
9
+
10
+ @tool
11
+ def run_python(input: str) -> str:
12
+ """Execute Python code in a restricted sandbox (code-interpreter).
13
+
14
+ Pass **any** coding or file-manipulation task here and the agent will
15
+ compute the answer by running Python. The full standard library is not
16
+ available, and heavy networking is disabled. Suitable for: math, data-frames,
17
+ small file parsing, algorithmic questions.
18
+ """
19
+ return run_agent(input)
20
+
21
+
22
+ def load_execution_prompt() -> str:
23
+ """Load the execution prompt from file"""
24
+ try:
25
+ with open("./prompts/execution_prompt.txt", "r", encoding="utf-8") as f:
26
+ return f.read().strip()
27
+ except FileNotFoundError:
28
+ return """You are a specialized execution agent. Use the run_python tool to execute code and solve computational problems."""
29
+
30
+
31
+ def get_execution_tools() -> List:
32
+ """Get list of tools available to the execution agent"""
33
+ return [run_python]
34
+
35
+
36
+ def execute_tool_calls(tool_calls: list, tools: list) -> list:
37
+ """Execute tool calls and return results"""
38
+ tool_messages = []
39
+
40
+ # Create a mapping of tool names to tool functions
41
+ tool_map = {tool.name: tool for tool in tools}
42
+
43
+ for tool_call in tool_calls:
44
+ tool_name = tool_call['name']
45
+ tool_args = tool_call['args']
46
+ tool_call_id = tool_call['id']
47
+
48
+ if tool_name in tool_map:
49
+ try:
50
+ print(f"Execution Agent: Executing {tool_name} with args: {str(tool_args)[:200]}...")
51
+ result = tool_map[tool_name].invoke(tool_args)
52
+ tool_messages.append(
53
+ ToolMessage(
54
+ content=str(result),
55
+ tool_call_id=tool_call_id
56
+ )
57
+ )
58
+ except Exception as e:
59
+ print(f"Error executing {tool_name}: {e}")
60
+ tool_messages.append(
61
+ ToolMessage(
62
+ content=f"Error executing {tool_name}: {e}",
63
+ tool_call_id=tool_call_id
64
+ )
65
+ )
66
+ else:
67
+ tool_messages.append(
68
+ ToolMessage(
69
+ content=f"Unknown tool: {tool_name}",
70
+ tool_call_id=tool_call_id
71
+ )
72
+ )
73
+
74
+ return tool_messages
75
+
76
+
77
+ def needs_code_execution(query: str) -> bool:
78
+ """Heuristic to determine if a query requires code execution"""
79
+ code_indicators = [
80
+ "calculate", "compute", "algorithm", "fibonacci", "math", "data",
81
+ "programming", "code", "function", "sort", "csv", "json", "pandas",
82
+ "plot", "graph", "analyze", "process", "file", "manipulation"
83
+ ]
84
+ query_lower = query.lower()
85
+ return any(indicator in query_lower for indicator in code_indicators)
86
+
87
+
88
+ def execution_agent(state: Dict[str, Any]) -> Dict[str, Any]:
89
+ """
90
+ Execution agent that handles computational and code execution tasks
91
+ """
92
+ print("Execution Agent: Processing computational request")
93
+
94
+ try:
95
+ # Get execution prompt
96
+ execution_prompt = load_execution_prompt()
97
+
98
+ # Initialize LLM with tools
99
+ llm = ChatGroq(model="qwen-qwq-32b", temperature=0.1) # Lower temp for consistent code
100
+ tools = get_execution_tools()
101
+ llm_with_tools = llm.bind_tools(tools)
102
+
103
+ # Get callback handler for tracing
104
+ callback_handler = get_langfuse_callback_handler()
105
+ callbacks = [callback_handler] if callback_handler else []
106
+
107
+ # Build messages
108
+ messages = state.get("messages", [])
109
+
110
+ # Add execution system prompt
111
+ execution_messages = [SystemMessage(content=execution_prompt)]
112
+
113
+ # Get user query for analysis
114
+ user_query = None
115
+ for msg in reversed(messages):
116
+ if msg.type == "human":
117
+ user_query = msg.content
118
+ break
119
+
120
+ # If this clearly needs code execution, provide guidance
121
+ if user_query and needs_code_execution(user_query):
122
+ guidance_msg = HumanMessage(
123
+ content=f"""Task requiring code execution: {user_query}
124
+
125
+ Please analyze this computational task and use the run_python tool to solve it step by step.
126
+ Break down complex problems into smaller steps and provide clear explanations."""
127
+ )
128
+ execution_messages.append(guidance_msg)
129
+
130
+ # Add original messages (excluding system messages to avoid duplicates)
131
+ for msg in messages:
132
+ if msg.type != "system":
133
+ execution_messages.append(msg)
134
+
135
+ # Get initial response from LLM
136
+ response = llm_with_tools.invoke(execution_messages, config={"callbacks": callbacks})
137
+
138
+ # Check if the LLM wants to use tools
139
+ if response.tool_calls:
140
+ print(f"Execution Agent: LLM requested {len(response.tool_calls)} tool calls")
141
+
142
+ # Execute the tool calls
143
+ tool_messages = execute_tool_calls(response.tool_calls, tools)
144
+
145
+ # Add the response and tool messages to conversation
146
+ execution_messages.extend([response] + tool_messages)
147
+
148
+ # Get final response after tool execution
149
+ final_response = llm.invoke(execution_messages, config={"callbacks": callbacks})
150
+
151
+ return {
152
+ **state,
153
+ "messages": execution_messages + [final_response],
154
+ "agent_response": final_response,
155
+ "current_step": "verification"
156
+ }
157
+ else:
158
+ # Direct response without tools
159
+ return {
160
+ **state,
161
+ "messages": execution_messages + [response],
162
+ "agent_response": response,
163
+ "current_step": "verification"
164
+ }
165
+
166
+ except Exception as e:
167
+ print(f"Execution Agent Error: {e}")
168
+ error_response = AIMessage(content=f"I encountered an error while processing your computational request: {e}")
169
+ return {
170
+ **state,
171
+ "messages": state.get("messages", []) + [error_response],
172
+ "agent_response": error_response,
173
+ "current_step": "verification"
174
+ }
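needs_code_execution is a plain substring match over the indicator list, so its behavior is easy to check (assuming the package is importable from the repo root):

from src.agents.execution_agent import needs_code_execution

assert needs_code_execution("Calculate the 10th Fibonacci number") is True
assert needs_code_execution("Who painted the Mona Lisa?") is False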
src/agents/plan_node.py ADDED
@@ -0,0 +1,79 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Plan Node - Initial ReAct planning loop"""
2
+ from typing import Dict, Any
3
+ from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
4
+ from langchain_groq import ChatGroq
5
+ from src.tracing import get_langfuse_callback_handler
6
+
7
+
8
+ def load_system_prompt() -> str:
9
+ """Load the system prompt from file"""
10
+ try:
11
+ with open("./prompts/system_prompt.txt", "r", encoding="utf-8") as f:
12
+ return f.read().strip()
13
+ except FileNotFoundError:
14
+ return "You are a helpful assistant tasked with answering GAIA benchmark questions."
15
+
16
+
17
+ def plan_node(state: Dict[str, Any]) -> Dict[str, Any]:
18
+ """
19
+ Initial planning node that sets up the conversation with system prompt
20
+ and prepares for agent routing
21
+ """
22
+ print("Plan Node: Processing query")
23
+
24
+ try:
25
+ # Get the system prompt
26
+ system_prompt = load_system_prompt()
27
+
28
+ # Initialize LLM for planning
29
+ llm = ChatGroq(model="qwen-qwq-32b", temperature=0.1)
30
+
31
+ # Get callback handler for tracing
32
+ callback_handler = get_langfuse_callback_handler()
33
+ callbacks = [callback_handler] if callback_handler else []
34
+
35
+ # Extract user messages
36
+ messages = state.get("messages", [])
37
+ if not messages:
38
+ return {"messages": [SystemMessage(content=system_prompt)]}
39
+
40
+ # Build message list with system prompt
41
+ plan_messages = [SystemMessage(content=system_prompt)]
42
+
43
+ # Add existing messages
44
+ for msg in messages:
45
+ if msg.type != "system": # Avoid duplicate system messages
46
+ plan_messages.append(msg)
47
+
48
+ # Add planning instruction
49
+ planning_instruction = """
50
+ Analyze this query and prepare a plan for answering it. Consider:
51
+ 1. What type of information or processing is needed?
52
+ 2. What tools or agents would be most appropriate?
53
+ 3. What is the expected output format?
54
+
55
+ Provide a brief analysis and initial plan.
56
+ """
57
+
58
+ if plan_messages and plan_messages[-1].type == "human":
59
+ # Get LLM analysis of the query
60
+ analysis_messages = plan_messages + [HumanMessage(content=planning_instruction)]
61
+
62
+ response = llm.invoke(analysis_messages, config={"callbacks": callbacks})
63
+ plan_messages.append(response)
64
+
65
+ return {
66
+ "messages": plan_messages,
67
+ "plan_complete": True,
68
+ "current_step": "routing"
69
+ }
70
+
71
+ except Exception as e:
72
+ print(f"Plan Node Error: {e}")
73
+ # Fallback with basic system message
74
+ system_prompt = load_system_prompt()
75
+ return {
76
+ "messages": [SystemMessage(content=system_prompt)] + state.get("messages", []),
77
+ "plan_complete": True,
78
+ "current_step": "routing"
79
+ }
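Used in isolation, plan_node takes a partial state and returns the fields it owns; a sketch (requires a GROQ_API_KEY in the environment for the analysis call):

from langchain_core.messages import HumanMessage
from src.agents.plan_node import plan_node

out = plan_node({"messages": [HumanMessage(content="In what year did Apollo 11 land?")]})
print(out["current_step"])  # "routing"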
src/agents/retrieval_agent.py ADDED
@@ -0,0 +1,268 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Retrieval Agent - Handles information gathering and search tasks"""
2
+ import os
3
+ import requests
4
+ from typing import Dict, Any, List
5
+ from langchain_core.messages import SystemMessage, HumanMessage, AIMessage, ToolMessage
6
+ from langchain_core.tools import tool
7
+ from langchain_groq import ChatGroq
8
+ from langchain_community.tools.tavily_search import TavilySearchResults
9
+ from langchain_community.document_loaders import WikipediaLoader, ArxivLoader
10
+ from langchain.tools.retriever import create_retriever_tool
11
+ from src.memory import memory_manager
12
+ from src.tracing import get_langfuse_callback_handler
13
+
14
+
15
+ # Tool definitions (same as original)
16
+ @tool
17
+ def wiki_search(input: str) -> str:
18
+ """Search Wikipedia for a query and return maximum 2 results.
19
+
20
+ Args:
21
+ input: The search query."""
22
+ try:
23
+ search_docs = WikipediaLoader(query=input, load_max_docs=2).load()
24
+ if not search_docs:
25
+ return "No Wikipedia results found for the query."
26
+ formatted_search_docs = "\n\n---\n\n".join(
27
+ [
28
+ f'<Document source="{doc.metadata.get("source", "Unknown")}" page="{doc.metadata.get("page", "")}"/>\n{doc.page_content}\n</Document>'
29
+ for doc in search_docs
30
+ ])
31
+ return formatted_search_docs
32
+ except Exception as e:
33
+ print(f"Error in wiki_search: {e}")
34
+ return f"Error searching Wikipedia: {e}"
35
+
36
+
37
+ @tool
38
+ def web_search(input: str) -> str:
39
+ """Search Tavily for a query and return maximum 3 results.
40
+
41
+ Args:
42
+ input: The search query."""
43
+ try:
44
+ search_docs = TavilySearchResults(max_results=3).invoke(input)
45
+ if not search_docs:
46
+ return "No web search results found for the query."
47
+ formatted_search_docs = "\n\n---\n\n".join(
48
+ [
49
+ f'<Document source="{doc.get("url", "Unknown")}" />\n{doc.get("content", "No content")}\n</Document>'
50
+ for doc in search_docs
51
+ ])
52
+ return formatted_search_docs
53
+ except Exception as e:
54
+ print(f"Error in web_search: {e}")
55
+ return f"Error searching web: {e}"
56
+
57
+
58
+ @tool
59
+ def arvix_search(input: str) -> str:
60
+ """Search Arxiv for a query and return maximum 3 results.
61
+
62
+ Args:
63
+ input: The search query."""
64
+ try:
65
+ search_docs = ArxivLoader(query=input, load_max_docs=3).load()
66
+ if not search_docs:
67
+ return "No Arxiv results found for the query."
68
+ formatted_search_docs = "\n\n---\n\n".join(
69
+ [
70
+ f'<Document source="{doc.metadata.get("source", "Unknown")}" page="{doc.metadata.get("page", "")}"/>\n{doc.page_content[:1000]}\n</Document>'
71
+ for doc in search_docs
72
+ ])
73
+ return formatted_search_docs
74
+ except Exception as e:
75
+ print(f"Error in arvix_search: {e}")
76
+ return f"Error searching Arxiv: {e}"
77
+
78
+
79
+ def load_retrieval_prompt() -> str:
80
+ """Load the retrieval prompt from file"""
81
+ try:
82
+ with open("./prompts/retrieval_prompt.txt", "r", encoding="utf-8") as f:
83
+ return f.read().strip()
84
+ except FileNotFoundError:
85
+ return """You are a specialized retrieval agent. Use available tools to search for information and provide comprehensive answers."""
86
+
87
+
88
+ def get_retrieval_tools() -> List:
89
+ """Get list of tools available to the retrieval agent"""
90
+ tools = [wiki_search, web_search, arvix_search]
91
+
92
+ # Add vector store retrieval tool if available
93
+ if memory_manager.vector_store:
94
+ try:
95
+ retrieval_tool = create_retriever_tool(
96
+ retriever=memory_manager.vector_store.as_retriever(),
97
+ name="question_search",
98
+ description="A tool to retrieve similar questions from a vector store.",
99
+ )
100
+ tools.append(retrieval_tool)
101
+ except Exception as e:
102
+ print(f"Could not create retrieval tool: {e}")
103
+
104
+ return tools
105
+
106
+
107
+ def execute_tool_calls(tool_calls: list, tools: list) -> list:
108
+ """Execute tool calls and return results"""
109
+ tool_messages = []
110
+
111
+ # Create a mapping of tool names to tool functions
112
+ tool_map = {tool.name: tool for tool in tools}
113
+
114
+ for tool_call in tool_calls:
115
+ tool_name = tool_call['name']
116
+ tool_args = tool_call['args']
117
+ tool_call_id = tool_call['id']
118
+
119
+ if tool_name in tool_map:
120
+ try:
121
+ print(f"Retrieval Agent: Executing {tool_name} with args: {tool_args}")
122
+ result = tool_map[tool_name].invoke(tool_args)
123
+ tool_messages.append(
124
+ ToolMessage(
125
+ content=str(result),
126
+ tool_call_id=tool_call_id
127
+ )
128
+ )
129
+ except Exception as e:
130
+ print(f"Error executing {tool_name}: {e}")
131
+ tool_messages.append(
132
+ ToolMessage(
133
+ content=f"Error executing {tool_name}: {e}",
134
+ tool_call_id=tool_call_id
135
+ )
136
+ )
137
+ else:
138
+ tool_messages.append(
139
+ ToolMessage(
140
+ content=f"Unknown tool: {tool_name}",
141
+ tool_call_id=tool_call_id
142
+ )
143
+ )
144
+
145
+ return tool_messages
146
+
147
+
148
+ def fetch_attachment_if_needed(query: str) -> str:
149
+ """Fetch attachment content if the query matches a known task"""
150
+ try:
151
+ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
152
+ resp = requests.get(f"{DEFAULT_API_URL}/questions", timeout=30)
153
+ resp.raise_for_status()
154
+ questions = resp.json()
155
+
156
+ for q in questions:
157
+ if str(q.get("question")).strip() == str(query).strip():
158
+ task_id = str(q.get("task_id"))
159
+ print(f"Retrieval Agent: Downloading attachment for task {task_id}")
160
+ file_resp = requests.get(f"{DEFAULT_API_URL}/files/{task_id}", timeout=60)
161
+ if file_resp.status_code == 200 and file_resp.content:
162
+ try:
163
+ file_text = file_resp.content.decode("utf-8", errors="replace")
164
+ except Exception:
165
+ file_text = "(binary or non-UTF8 file omitted)"
166
+ MAX_CHARS = 8000
167
+ if len(file_text) > MAX_CHARS:
168
+ file_text = file_text[:MAX_CHARS] + "\n… (truncated)"
169
+ return f"Attached file content for task {task_id}:\n```python\n{file_text}\n```"
170
+ else:
171
+ print(f"No attachment for task {task_id}")
172
+ return ""
173
+ return ""
174
+ except Exception as e:
175
+ print(f"Error fetching attachment: {e}")
176
+ return ""
177
+
178
+
179
+ def retrieval_agent(state: Dict[str, Any]) -> Dict[str, Any]:
180
+ """
181
+ Retrieval agent that handles information gathering tasks
182
+ """
183
+ print("Retrieval Agent: Processing information retrieval request")
184
+
185
+ try:
186
+ # Get retrieval prompt
187
+ retrieval_prompt = load_retrieval_prompt()
188
+
189
+ # Initialize LLM with tools
190
+ llm = ChatGroq(model="qwen-qwq-32b", temperature=0.3)
191
+ tools = get_retrieval_tools()
192
+ llm_with_tools = llm.bind_tools(tools)
193
+
194
+ # Get callback handler for tracing
195
+ callback_handler = get_langfuse_callback_handler()
196
+ callbacks = [callback_handler] if callback_handler else []
197
+
198
+ # Build messages
199
+ messages = state.get("messages", [])
200
+
201
+ # Add retrieval system prompt
202
+ retrieval_messages = [SystemMessage(content=retrieval_prompt)]
203
+
204
+ # Get user query for context and attachment fetching
205
+ user_query = None
206
+ for msg in reversed(messages):
207
+ if msg.type == "human":
208
+ user_query = msg.content
209
+ break
210
+
211
+ # Check for similar questions in memory
212
+ if user_query:
213
+ similar_qa = memory_manager.get_similar_qa(user_query)
214
+ if similar_qa:
215
+ context_msg = HumanMessage(
216
+ content=f"Here is a similar question and answer for reference:\n\n{similar_qa}"
217
+ )
218
+ retrieval_messages.append(context_msg)
219
+
220
+ # Fetch attachment if needed
221
+ attachment_content = fetch_attachment_if_needed(user_query)
222
+ if attachment_content:
223
+ attachment_msg = HumanMessage(content=attachment_content)
224
+ retrieval_messages.append(attachment_msg)
225
+
226
+ # Add original messages (excluding system messages to avoid duplicates)
227
+ for msg in messages:
228
+ if msg.type != "system":
229
+ retrieval_messages.append(msg)
230
+
231
+ # Get initial response from LLM and iterate tool calls if necessary
232
+ response = llm_with_tools.invoke(retrieval_messages, config={"callbacks": callbacks})
233
+
234
+ max_tool_iterations = 3 # safeguard to prevent infinite loops
235
+ iteration = 0
236
+
237
+ while response.tool_calls and iteration < max_tool_iterations:
238
+ iteration += 1
239
+ print(f"Retrieval Agent: LLM requested {len(response.tool_calls)} tool calls (iteration {iteration})")
240
+
241
+ # Execute the tool calls
242
+ tool_messages = execute_tool_calls(response.tool_calls, tools)
243
+
244
+ # Append the LLM response and tool results to the conversation
245
+ retrieval_messages.extend([response] + tool_messages)
246
+
247
+ # Ask the model again with the new information
248
+ response = llm_with_tools.invoke(retrieval_messages, config={"callbacks": callbacks})
249
+
250
+ # After iterating (or if no tool calls), we have our final response
251
+ retrieval_messages.append(response)
252
+
253
+ return {
254
+ **state,
255
+ "messages": retrieval_messages,
256
+ "agent_response": response,
257
+ "current_step": "verification"
258
+ }
259
+
260
+ except Exception as e:
261
+ print(f"Retrieval Agent Error: {e}")
262
+ error_response = AIMessage(content=f"I encountered an error while processing your request: {e}")
263
+ return {
264
+ **state,
265
+ "messages": state.get("messages", []) + [error_response],
266
+ "agent_response": error_response,
267
+ "current_step": "verification"
268
+ }
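The bounded tool-call loop above is the key control-flow pattern; distilled into a reusable helper (run_bounded_tool_loop is a hypothetical name; execute_tool_calls is the helper defined above):

def run_bounded_tool_loop(llm_with_tools, tools, messages, max_tool_iterations=3):
    """Invoke the model, executing requested tools until it answers or the cap is hit."""
    response = llm_with_tools.invoke(messages)
    iteration = 0
    while getattr(response, "tool_calls", None) and iteration < max_tool_iterations:
        iteration += 1
        messages.extend([response] + execute_tool_calls(response.tool_calls, tools))
        response = llm_with_tools.invoke(messages)
    return response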
src/agents/router_node.py ADDED
@@ -0,0 +1,97 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Router Node - Decides which specialized agent to use"""
2
+ from typing import Dict, Any, Literal
3
+ from langchain_core.messages import SystemMessage, HumanMessage
4
+ from langchain_groq import ChatGroq
5
+ from src.tracing import get_langfuse_callback_handler
6
+
7
+
8
+ def load_router_prompt() -> str:
9
+ """Load the router prompt from file"""
10
+ try:
11
+ with open("./prompts/router_prompt.txt", "r", encoding="utf-8") as f:
12
+ return f.read().strip()
13
+ except FileNotFoundError:
14
+ return """You are an intelligent agent router. Analyze the query and respond with exactly one of: RETRIEVAL, EXECUTION, or CRITIC"""
15
+
16
+
17
+ def router_node(state: Dict[str, Any]) -> Dict[str, Any]:
18
+ """
19
+ Router node that analyzes the user query and determines which agent should handle it
20
+ Returns: next_agent = 'retrieval' | 'execution' | 'critic'
21
+ """
22
+ print("Router Node: Analyzing query for agent selection")
23
+
24
+ try:
25
+ # Get router prompt
26
+ router_prompt = load_router_prompt()
27
+
28
+ # Initialize LLM for routing decision
29
+ llm = ChatGroq(model="qwen-qwq-32b", temperature=0.0) # Low temperature for consistent routing
30
+
31
+ # Get callback handler for tracing
32
+ callback_handler = get_langfuse_callback_handler()
33
+ callbacks = [callback_handler] if callback_handler else []
34
+
35
+ # Extract the last human message for routing decision
36
+ messages = state.get("messages", [])
37
+ user_query = None
38
+
39
+ for msg in reversed(messages):
40
+ if msg.type == "human":
41
+ user_query = msg.content
42
+ break
43
+
44
+ if not user_query:
45
+ print("Router Node: No user query found, defaulting to retrieval")
46
+ return {
47
+ **state,
48
+ "next_agent": "retrieval",
49
+ "routing_reason": "No user query found"
50
+ }
51
+
52
+ # Build routing messages
53
+ routing_messages = [
54
+ SystemMessage(content=router_prompt),
55
+ HumanMessage(content=f"Query to route: {user_query}")
56
+ ]
57
+
58
+ # Get routing decision
59
+ response = llm.invoke(routing_messages, config={"callbacks": callbacks})
60
+ routing_decision = response.content.strip().upper()
61
+
62
+ # Map decision to next agent
63
+ next_agent = "retrieval" # Default fallback
64
+ if "RETRIEVAL" in routing_decision:
65
+ next_agent = "retrieval"
66
+ elif "EXECUTION" in routing_decision:
67
+ next_agent = "execution"
68
+ elif "CRITIC" in routing_decision:
69
+ next_agent = "critic"
70
+
71
+ print(f"Router Node: Routing to {next_agent} agent (decision: {routing_decision})")
72
+
73
+ return {
74
+ **state,
75
+ "next_agent": next_agent,
76
+ "routing_decision": routing_decision,
77
+ "routing_reason": f"Query analysis resulted in: {routing_decision}",
78
+ "current_step": next_agent
79
+ }
80
+
81
+ except Exception as e:
82
+ print(f"Router Node Error: {e}")
83
+ # Fallback to retrieval agent
84
+ return {
85
+ **state,
86
+ "next_agent": "retrieval",
87
+ "routing_reason": f"Error in routing: {e}"
88
+ }
89
+
90
+
91
+ def should_route_to_agent(state: Dict[str, Any]) -> Literal["retrieval", "execution", "critic"]:
92
+ """
93
+ Conditional edge function that determines which agent to route to
94
+ """
95
+ next_agent = state.get("next_agent", "retrieval")
96
+ print(f"Routing to: {next_agent}")
97
+ return next_agent
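The conditional edge itself stays trivial because the routing decision (and its default) is made inside router_node; for example:

from src.agents.router_node import should_route_to_agent

assert should_route_to_agent({"next_agent": "execution"}) == "execution"
assert should_route_to_agent({}) == "retrieval"  # safe default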
src/agents/verification_node.py ADDED
@@ -0,0 +1,172 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Verification Node - Final quality control and output formatting"""
2
+ from typing import Dict, Any
3
+ from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
4
+ from langchain_groq import ChatGroq
5
+ from src.tracing import get_langfuse_callback_handler
6
+
7
+
8
+ def load_verification_prompt() -> str:
9
+ """Load the verification prompt from file"""
10
+ try:
11
+ with open("./prompts/verification_prompt.txt", "r", encoding="utf-8") as f:
12
+ return f.read().strip()
13
+ except FileNotFoundError:
14
+ return """You are a verification agent. Ensure responses meet quality standards and format requirements."""
15
+
16
+
17
+ def extract_final_answer(response_content: str) -> str:
18
+ """Extract and format the final answer according to system prompt requirements"""
19
+ # Remove common prefixes and suffixes
20
+ answer = response_content.strip()
21
+
22
+ # Remove markdown formatting
23
+ answer = answer.replace("**", "").replace("*", "")
24
+
25
+ # Remove common answer prefixes
26
+ prefixes_to_remove = [
27
+ "Final Answer:", "Answer:", "The answer is:", "The final answer is:",
28
+ "Result:", "Solution:", "Response:", "Output:", "Conclusion:"
29
+ ]
30
+
31
+ for prefix in prefixes_to_remove:
32
+ if answer.lower().startswith(prefix.lower()):
33
+ answer = answer[len(prefix):].strip()
34
+
35
+ # Strip quote and bracket characters from both ends of the answer
36
+ answer = answer.strip('"\'()[]{}')
37
+
38
+ # Handle lists - format with comma and space separation
39
+ if '\n' in answer and all(line.strip().startswith(('-', '*', '•')) for line in answer.split('\n') if line.strip()):
40
+ # Convert bullet list to comma-separated
41
+ items = [line.strip().lstrip('-*•').strip() for line in answer.split('\n') if line.strip()]
42
+ answer = ', '.join(items)
43
+
44
+ return answer.strip()
45
+
46
+
47
+ def verification_node(state: Dict[str, Any]) -> Dict[str, Any]:
48
+ """
49
+ Verification node that performs final quality control and formatting
50
+ """
51
+ print("Verification Node: Performing final quality control")
52
+
53
+ try:
54
+ # Get verification prompt
55
+ verification_prompt = load_verification_prompt()
56
+
57
+ # Initialize LLM for verification
58
+ llm = ChatGroq(model="qwen-qwq-32b", temperature=0.0) # Very low temp for consistent formatting
59
+
60
+ # Get callback handler for tracing
61
+ callback_handler = get_langfuse_callback_handler()
62
+ callbacks = [callback_handler] if callback_handler else []
63
+
64
+ # Get state information
65
+ messages = state.get("messages", [])
66
+ quality_pass = state.get("quality_pass", True)
67
+ quality_score = state.get("quality_score", 7)
68
+ critic_assessment = state.get("critic_assessment", "")
69
+
70
+ # Get the agent response to verify
71
+ agent_response = state.get("agent_response")
72
+ if not agent_response:
73
+ # Find the last AI message
74
+ for msg in reversed(messages):
75
+ if msg.type == "ai":
76
+ agent_response = msg
77
+ break
78
+
79
+ if not agent_response:
80
+ print("Verification Node: No response to verify")
81
+ return {
82
+ **state,
83
+ "final_answer": "No response found to verify",
84
+ "verification_status": "failed",
85
+ "current_step": "complete"
86
+ }
87
+
88
+ # Get user query for context
89
+ user_query = None
90
+ for msg in reversed(messages):
91
+ if msg.type == "human":
92
+ user_query = msg.content
93
+ break
94
+
95
+ # Determine if we should proceed or trigger fallback
96
+ failure_threshold = 4
97
+ attempt_count = state.get("attempt_count", 1)  # attempts made so far, not a maximum
98
+
99
+ if not quality_pass or quality_score < failure_threshold:
100
+ if attempt_count >= 3:
101
+ print("Verification Node: Maximum attempts reached, proceeding with fallback")
102
+ return {
103
+ **state,
104
+ "final_answer": "Unable to provide a satisfactory answer after multiple attempts",
105
+ "verification_status": "failed_max_attempts",
106
+ "current_step": "fallback"
107
+ }
108
+ else:
109
+ print(f"Verification Node: Quality check failed (score: {quality_score}), retrying")
110
+ return {
111
+ **state,
112
+ "verification_status": "failed",
113
+ "attempt_count": max_attempts + 1,
114
+ "current_step": "routing" # Retry from routing
115
+ }
116
+
117
+ # Quality passed, format the final answer
118
+ print("Verification Node: Quality check passed, formatting final answer")
119
+
120
+ # Build verification messages
121
+ verification_messages = [SystemMessage(content=verification_prompt)]
122
+
123
+ verification_request = f"""
124
+ Please verify and format the following response according to the exact-match output rules:
125
+
126
+ Original Query: {user_query or "Unknown query"}
127
+
128
+ Response to Verify:
129
+ {agent_response.content}
130
+
131
+ Quality Assessment: {critic_assessment}
132
+
133
+ Ensure the final output strictly adheres to the format requirements specified in the system prompt.
134
+ """
135
+
136
+ verification_messages.append(HumanMessage(content=verification_request))
137
+
138
+ # Get verification response
139
+ verification_response = llm.invoke(verification_messages, config={"callbacks": callbacks})
140
+
141
+ # Extract and format the final answer
142
+ final_answer = extract_final_answer(verification_response.content)
143
+
144
+ # Store the final formatted answer
145
+ return {
146
+ **state,
147
+ "messages": messages + [verification_response],
148
+ "final_answer": final_answer,
149
+ "verification_status": "passed",
150
+ "current_step": "complete"
151
+ }
152
+
153
+ except Exception as e:
154
+ print(f"Verification Node Error: {e}")
155
+ # Fallback - try to extract answer from agent response
156
+ if agent_response:
157
+ fallback_answer = extract_final_answer(agent_response.content)
158
+ else:
159
+ fallback_answer = f"Error during verification: {e}"
160
+
161
+ return {
162
+ **state,
163
+ "final_answer": fallback_answer,
164
+ "verification_status": "error",
165
+ "current_step": "complete"
166
+ }
167
+
168
+
169
+ def should_retry(state: Dict[str, Any]) -> bool:
170
+ """Determine if we should retry the process"""
171
+ verification_status = state.get("verification_status", "")
172
+ return verification_status == "failed" and state.get("attempt_count", 1) < 3
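extract_final_answer is pure string handling, so its behavior can be pinned down with examples (assuming the package is importable):

from src.agents.verification_node import extract_final_answer

assert extract_final_answer("Final Answer: **42**") == "42"
assert extract_final_answer("- apples\n- pears") == "apples, pears"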
src/langgraph_system.py ADDED
@@ -0,0 +1,231 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Main LangGraph Agent System Implementation"""
2
+ import os
3
+ from typing import Dict, Any, TypedDict, Literal
4
+ from langchain_core.messages import BaseMessage, HumanMessage
5
+ from langgraph.graph import StateGraph, END
6
+
7
+ # Import our agents and nodes
8
+ from src.agents.plan_node import plan_node
9
+ from src.agents.router_node import router_node, should_route_to_agent
10
+ from src.agents.retrieval_agent import retrieval_agent
11
+ from src.agents.execution_agent import execution_agent
12
+ from src.agents.critic_agent import critic_agent
13
+ from src.agents.verification_node import verification_node, should_retry
14
+ from src.memory import memory_manager
15
+ from src.tracing import (
16
+ get_langfuse_callback_handler,
17
+ update_trace_metadata,
18
+ trace_agent_execution,
19
+ flush_langfuse,
20
+ )
21
+
22
+
23
+ class AgentState(TypedDict):
24
+ """State schema for the agent system"""
25
+ # Core conversation
26
+ messages: list[BaseMessage]
27
+
28
+ # Planning and routing
29
+ plan_complete: bool
30
+ next_agent: str
31
+ routing_decision: str
32
+ routing_reason: str
33
+ current_step: str
34
+
35
+ # Agent responses
36
+ agent_response: BaseMessage
37
+ execution_result: str
38
+
39
+ # Quality control
40
+ critic_assessment: str
41
+ quality_pass: bool
42
+ quality_score: int
43
+ verification_status: str
44
+
45
+ # System management
46
+ attempt_count: int
47
+ final_answer: str
48
+
49
+
50
+ def create_agent_graph() -> StateGraph:
51
+ """Create the LangGraph agent system"""
52
+
53
+ # Initialize the state graph
54
+ workflow = StateGraph(AgentState)
55
+
56
+ # Add nodes
57
+ workflow.add_node("plan", plan_node)
58
+ workflow.add_node("router", router_node)
59
+ workflow.add_node("retrieval", retrieval_agent)
60
+ workflow.add_node("execution", execution_agent)
61
+ workflow.add_node("critic", critic_agent)
62
+ workflow.add_node("verification", verification_node)
63
+
64
+ # Add fallback node
65
+ def fallback_node(state: Dict[str, Any]) -> Dict[str, Any]:
66
+ """Simple fallback that returns a basic response"""
67
+ print("Fallback Node: Providing basic response")
68
+
69
+ messages = state.get("messages", [])
70
+ user_query = None
71
+
72
+ for msg in reversed(messages):
73
+ if msg.type == "human":
74
+ user_query = msg.content
75
+ break
76
+
77
+ fallback_answer = "I apologize, but I was unable to provide a satisfactory answer to your question."
78
+ if user_query:
79
+ fallback_answer += f" Your question was: {user_query}"
80
+
81
+ return {
82
+ **state,
83
+ "final_answer": fallback_answer,
84
+ "verification_status": "fallback",
85
+ "current_step": "complete"
86
+ }
87
+
88
+ workflow.add_node("fallback", fallback_node)
89
+
90
+ # Set entry point
91
+ workflow.set_entry_point("plan")
92
+
93
+ # Add edges
94
+ workflow.add_edge("plan", "router")
95
+
96
+ # Conditional routing from router to agents
97
+ workflow.add_conditional_edges(
98
+ "router",
99
+ should_route_to_agent,
100
+ {
101
+ "retrieval": "retrieval",
102
+ "execution": "execution",
103
+ "critic": "critic"
104
+ }
105
+ )
106
+
107
+ # Route agent outputs through critic for quality evaluation before final verification
108
+ workflow.add_edge("retrieval", "critic")
109
+ workflow.add_edge("execution", "critic")
110
+ # Critic (whether reached directly via routing or via other agents) proceeds to verification
111
+ workflow.add_edge("critic", "verification")
112
+
113
+ # Verification conditional logic
114
+ def verification_next(state: Dict[str, Any]) -> Literal["router", "fallback", END]:
115
+ """Determine next step after verification"""
116
+ verification_status = state.get("verification_status", "")
117
+ current_step = state.get("current_step", "")
118
+
119
+ if current_step == "complete":
120
+ return END
121
+ elif verification_status == "failed" and state.get("attempt_count", 1) < 3:
122
+ return "router" # Retry
123
+ elif verification_status == "failed_max_attempts":
124
+ return "fallback"
125
+ else:
126
+ return END
127
+
128
+ workflow.add_conditional_edges(
129
+ "verification",
130
+ verification_next,
131
+ {
132
+ "router": "router",
133
+ "fallback": "fallback",
134
+ END: END
135
+ }
136
+ )
137
+
138
+ # Fallback ends the process
139
+ workflow.add_edge("fallback", END)
140
+
141
+ return workflow
142
+
143
+
144
+ def run_agent_system(query: str, user_id: str | None = None, session_id: str | None = None) -> str:
145
+ """
146
+ Run the complete agent system with a user query
147
+
148
+ Args:
149
+ query: The user question
150
+ user_id: Optional user identifier for tracing
151
+ session_id: Optional session identifier for tracing
152
+
153
+ Returns:
154
+ The final formatted answer
155
+ """
156
+ print(f"Agent System: Processing query: {query[:100]}...")
157
+
158
+ # Open a **root** Langfuse span so that everything inside is neatly grouped
159
+ with trace_agent_execution(name="user-request", user_id=user_id, session_id=session_id):
160
+ try:
161
+ # Enrich the root span with metadata & tags
162
+ update_trace_metadata(
163
+ user_id=user_id,
164
+ session_id=session_id,
165
+ tags=["agent_system"],
166
+ )
167
+
168
+ # Create the graph
169
+ workflow = create_agent_graph()
170
+
171
+ # Compile with checkpointing
172
+ checkpointer = memory_manager.get_checkpointer()
173
+ if checkpointer:
174
+ app = workflow.compile(checkpointer=checkpointer)
175
+ else:
176
+ app = workflow.compile()
177
+
178
+ # Prepare initial state
179
+ initial_state = {
180
+ "messages": [HumanMessage(content=query)],
181
+ "plan_complete": False,
182
+ "next_agent": "",
183
+ "routing_decision": "",
184
+ "routing_reason": "",
185
+ "current_step": "planning",
186
+ "agent_response": None,
187
+ "execution_result": "",
188
+ "critic_assessment": "",
189
+ "quality_pass": True,
190
+ "quality_score": 7,
191
+ "verification_status": "",
192
+ "attempt_count": 1,
193
+ "final_answer": "",
194
+ }
195
+
196
+ # Configure execution – reuse *one* callback handler
197
+ callback_handler = get_langfuse_callback_handler()
198
+ config = {
199
+ "configurable": {"thread_id": session_id or "default"},
200
+ }
201
+ if callback_handler:
202
+ config["callbacks"] = [callback_handler]
203
+
204
+ # Run the graph
205
+ print("Agent System: Executing workflow...")
206
+ final_state = app.invoke(initial_state, config=config)
207
+
208
+ # Extract final answer
209
+ final_answer = final_state.get("final_answer", "No answer generated")
210
+
211
+ # Store in memory if appropriate
212
+ if memory_manager.should_ingest(query):
213
+ memory_manager.ingest_qa_pair(query, final_answer)
214
+
215
+ print(f"Agent System: Completed. Final answer: {final_answer[:100]}...")
216
+ return final_answer
217
+ except Exception as e:
218
+ print(f"Agent System Error: {e}")
219
+ return (
220
+ f"I apologize, but I encountered an error while processing your question: {e}"
221
+ )
222
+ finally:
223
+ # Ensure Langfuse spans are exported even in short-lived environments
224
+ try:
225
+ flush_langfuse()
226
+ except Exception:
227
+ pass
228
+
229
+
230
+ # Export the main function
231
+ __all__ = ["run_agent_system", "create_agent_graph", "AgentState"]
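End-to-end usage, assuming GROQ and Tavily (and optionally Supabase) credentials are set in the environment:

from src.langgraph_system import run_agent_system

answer = run_agent_system(
    "What is the capital of France?",
    user_id="demo_user",
    session_id="demo_session",
)
print(answer)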
src/memory.py ADDED
@@ -0,0 +1,162 @@
+ """Memory Layer Implementation for LangGraph Agent System"""
+ import os
+ import time
+ import hashlib
+ import sqlite3
+ from typing import Optional, List, Dict, Any, Tuple
+ from langchain_community.vectorstores import SupabaseVectorStore
+ from langchain_huggingface import HuggingFaceEmbeddings
+ from supabase.client import Client, create_client
+ from langgraph.checkpoint.sqlite import SqliteSaver
+ from langchain_core.messages import BaseMessage, HumanMessage
+
+
+ # Constants for memory management
+ TTL = 300  # seconds – how long we keep similarity-search results
+ SIMILARITY_THRESHOLD = 0.85  # cosine score above which we assume we already know the answer
+
+
+ class MemoryManager:
+     """Manages short-term, long-term memory and checkpointing for the agent system"""
+
+     def __init__(self):
+         self.embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
+         self.vector_store = None
+         self.checkpointer = None
+         self._sqlite_connection = None
+
+         # In-memory caches
+         self.query_cache: Dict[str, Tuple[float, List]] = {}
+         self.processed_tasks: set[str] = set()
+         self.seen_hashes: set[str] = set()
+
+         self._initialize_vector_store()
+         self._initialize_checkpointer()
+
+     def _initialize_vector_store(self) -> None:
+         """Initialize Supabase vector store for long-term memory"""
+         try:
+             supabase_url = os.environ.get("SUPABASE_URL")
+             supabase_key = os.environ.get("SUPABASE_SERVICE_KEY")
+
+             if not supabase_url or not supabase_key:
+                 print("Warning: Supabase credentials not found, vector store will be disabled")
+                 return
+
+             supabase: Client = create_client(supabase_url, supabase_key)
+             self.vector_store = SupabaseVectorStore(
+                 client=supabase,
+                 embedding=self.embeddings,
+                 table_name="documents",
+                 query_name="match_documents_langchain",
+             )
+             print("Vector store initialized successfully")
+         except Exception as e:
+             print(f"Warning: Could not initialize Supabase vector store: {e}")
+
+     def _initialize_checkpointer(self) -> None:
+         """Initialize SQLite checkpointer for short-term memory"""
+         try:
+             # Create a direct SQLite connection
+             self._sqlite_connection = sqlite3.connect(":memory:", check_same_thread=False)
+             self.checkpointer = SqliteSaver(self._sqlite_connection)
+             print("Checkpointer initialized successfully")
+         except Exception as e:
+             print(f"Warning: Could not initialize checkpointer: {e}")
+
+     def get_checkpointer(self) -> Optional[SqliteSaver]:
+         """Get the checkpointer instance"""
+         return self.checkpointer
+
+     def close_checkpointer(self) -> None:
+         """Close the checkpointer and its SQLite connection"""
+         if self._sqlite_connection:
+             try:
+                 self._sqlite_connection.close()
+                 print("SQLite connection closed")
+             except Exception as e:
+                 print(f"Warning: Error closing SQLite connection: {e}")
+
+     def similarity_search(self, query: str, k: int = 2) -> List[Any]:
+         """Search for similar questions with caching"""
+         if not self.vector_store:
+             return []
+
+         # Check cache first
+         q_hash = hashlib.sha256(query.encode()).hexdigest()
+         now = time.time()
+
+         if q_hash in self.query_cache and now - self.query_cache[q_hash][0] < TTL:
+             print("Memory: Cache hit for similarity search")
+             return self.query_cache[q_hash][1]
+
+         try:
+             print("Memory: Searching vector store for similar questions...")
+             similar_questions = self.vector_store.similarity_search_with_relevance_scores(query, k=k)
+             self.query_cache[q_hash] = (now, similar_questions)
+             return similar_questions
+         except Exception as e:
+             print(f"Memory: Vector store search error – {e}")
+             return []
+
+     def should_ingest(self, query: str) -> bool:
+         """Determine if this query/answer should be ingested to long-term memory"""
+         if not self.vector_store:
+             return False
+
+         similar_questions = self.similarity_search(query, k=1)
+         top_score = similar_questions[0][1] if similar_questions else 0.0
+         return top_score < SIMILARITY_THRESHOLD
+
+     def ingest_qa_pair(self, question: str, answer: str, attachments: str = "") -> None:
+         """Store Q/A pair in long-term memory"""
+         if not self.vector_store:
+             print("Memory: Vector store not available for ingestion")
+             return
+
+         try:
+             payload = f"Question:\n{question}\n\nAnswer:\n{answer}"
+             if attachments:
+                 payload += f"\n\n{attachments}"
+
+             hash_id = hashlib.sha256(payload.encode()).hexdigest()
+             if hash_id in self.seen_hashes:
+                 print("Memory: Duplicate payload within session – skip")
+                 return
+
+             self.seen_hashes.add(hash_id)
+             self.vector_store.add_texts(
+                 [payload],
+                 metadatas=[{"hash_id": hash_id, "timestamp": time.time()}]
+             )
+             print("Memory: Stored new Q/A pair in vector store")
+         except Exception as e:
+             print(f"Memory: Error while upserting – {e}")
+
+     def get_similar_qa(self, query: str) -> Optional[str]:
+         """Get similar Q/A for context"""
+         similar_questions = self.similarity_search(query, k=1)
+         if not similar_questions:
+             return None
+
+         example_doc = similar_questions[0][0] if isinstance(similar_questions[0], tuple) else similar_questions[0]
+         return example_doc.page_content
+
+     def add_processed_task(self, task_id: str) -> None:
+         """Mark a task as processed to avoid re-downloading attachments"""
+         self.processed_tasks.add(task_id)
+
+     def is_task_processed(self, task_id: str) -> bool:
+         """Check if a task has already been processed"""
+         return task_id in self.processed_tasks
+
+     def clear_session_cache(self) -> None:
+         """Clear session-specific caches"""
+         self.query_cache.clear()
+         self.processed_tasks.clear()
+         self.seen_hashes.clear()
+         print("Memory: Session cache cleared")
+
+
+ # Global memory manager instance
+ memory_manager = MemoryManager()
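
A short sketch of the intended call pattern for the memory layer above (names taken from the module; the gate means a Q/A pair is only written when the best stored match scores below SIMILARITY_THRESHOLD):

    from src.memory import memory_manager

    question = "What is the capital of France?"
    answer = "Paris"

    # similarity_search results are cached per query hash for TTL seconds
    context = memory_manager.get_similar_qa(question)
    print(context)

    # Ingest only when nothing sufficiently similar is stored yet
    if memory_manager.should_ingest(question):
        memory_manager.ingest_qa_pair(question, answer)
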
src/tracing.py ADDED
@@ -0,0 +1,125 @@
+ """Tracing and Observability Setup for Langfuse v3.0.0"""
+ import os
+ from typing import Optional
+ from langfuse import Langfuse, get_client
+ from langfuse.langchain import CallbackHandler
+
+
+ def initialize_langfuse() -> None:
+     """Initialize the Langfuse client with proper configuration"""
+     try:
+         # Initialize the Langfuse client
+         Langfuse(
+             public_key=os.environ.get("LANGFUSE_PUBLIC_KEY"),
+             secret_key=os.environ.get("LANGFUSE_SECRET_KEY"),
+             host=os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
+         )
+         print("Langfuse client initialized successfully")
+     except Exception as e:
+         print(f"Warning: Could not initialize Langfuse client: {e}")
+
+
+ # Singleton for the Langfuse CallbackHandler to ensure a single handler per request
+ _CALLBACK_HANDLER: Optional[CallbackHandler] = None
+
+
+ def get_langfuse_callback_handler() -> Optional[CallbackHandler]:
+     """Get (or create) a singleton Langfuse callback handler for LangChain integration.
+
+     Best practice (#2): pass exactly **one** CallbackHandler into graph.invoke/stream so that
+     every nested LLM/tool span is correlated underneath the same root span. Reusing the
+     same instance avoids fragmenting traces when individual nodes try to create their own
+     handler.
+     """
+     global _CALLBACK_HANDLER  # noqa: PLW0603 – module-level singleton is intentional
+
+     try:
+         initialize_langfuse()
+         if _CALLBACK_HANDLER is None:
+             _CALLBACK_HANDLER = CallbackHandler()
+         return _CALLBACK_HANDLER
+     except Exception as e:
+         print(f"Warning: Could not create Langfuse callback handler: {e}")
+         return None
+
+
+ def trace_agent_execution(name: str, user_id: str | None = None, session_id: str | None = None):
+     """Context manager that opens a **root** span for the current user request.
+
+     Follows Langfuse best practices (rules #2 & #3):
+     • exactly one root span per request
+     • attach `user_id` and `session_id` so that follow-up calls are stitched together
+     """
+     try:
+         langfuse = get_client()
+         span_kwargs = {"name": name}
+         # Open the span as a context manager so everything inside is automatically nested
+         span_cm = langfuse.start_as_current_span(**span_kwargs)
+
+         # Wrap the CM so that we can update the trace metadata *after* it has started
+         class _TraceWrapper:
+             def __enter__(self):
+                 # Enter the span
+                 self._span = span_cm.__enter__()
+                 # Immediately enrich it with session/user information
+                 try:
+                     langfuse.update_current_trace(
+                         **{k: v for k, v in {"user_id": user_id, "session_id": session_id}.items() if v}
+                     )
+                 except Exception:
+                     # Ignore update failures – tracing must never break business logic
+                     pass
+                 return self._span
+
+             def __exit__(self, exc_type, exc_val, exc_tb):
+                 return span_cm.__exit__(exc_type, exc_val, exc_tb)
+
+         return _TraceWrapper()
+     except Exception as e:
+         print(f"Warning: Could not create trace span: {e}")
+         # Gracefully degrade – return a dummy context manager
+         from contextlib import nullcontext
+
+         return nullcontext()  # type: ignore
+
+
+ def update_trace_metadata(user_id: str | None = None, session_id: str | None = None, tags: list | None = None, **kwargs):
+     """Update the current trace with metadata"""
+     try:
+         langfuse = get_client()
+         update_args = {}
+
+         if user_id:
+             update_args["user_id"] = user_id
+         if session_id:
+             update_args["session_id"] = session_id
+         if tags:
+             update_args["tags"] = tags
+         if kwargs:
+             update_args.update(kwargs)
+
+         langfuse.update_current_trace(**update_args)
+     except Exception as e:
+         print(f"Warning: Could not update trace metadata: {e}")
+
+
+ def flush_langfuse():
+     """Flush Langfuse events (for short-lived applications)"""
+     try:
+         langfuse = get_client()
+         langfuse.flush()
+     except Exception as e:
+         print(f"Warning: Could not flush Langfuse events: {e}")
+
+
+ def shutdown_langfuse():
+     """Shutdown the Langfuse client (for application cleanup)"""
+     try:
+         langfuse = get_client()
+         langfuse.shutdown()
+     except Exception as e:
+         print(f"Warning: Could not shutdown Langfuse client: {e}")
+
+
+ # Initialize Langfuse on module import
+ initialize_langfuse()
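
Combining the helpers above mirrors the best practices from .cursor/rules/langfuse_best_practices.mdc: one root span per request, one shared CallbackHandler. A sketch, where `app` stands for an already-compiled LangGraph graph (an assumption, not part of this module):

    from src.tracing import (
        trace_agent_execution,
        get_langfuse_callback_handler,
        flush_langfuse,
    )

    handler = get_langfuse_callback_handler()  # singleton – safe to fetch repeatedly

    with trace_agent_execution("user-request", user_id="u1", session_id="s1"):
        config = {"callbacks": [handler]} if handler else {}
        result = app.invoke({"messages": []}, config=config)

    flush_langfuse()  # export spans before a short-lived process exits
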
test_new_system.py ADDED
@@ -0,0 +1,205 @@
+ #!/usr/bin/env python3
+ """
+ Test Script for New LangGraph Agent System
+ Tests the multi-agent architecture with memory, routing, and verification.
+ """
+ import os
+ import sys
+ import time
+ from dotenv import load_dotenv
+
+ # Load environment variables
+ load_dotenv()
+
+ # Add the current directory to Python path
+ sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+
+
+ def test_imports():
+     """Test that all modules can be imported correctly"""
+     print("Testing imports...")
+     try:
+         # Test core imports
+         from src import run_agent_system, memory_manager
+         from src.tracing import get_langfuse_callback_handler
+
+         # Test agent imports
+         from src.agents import (
+             plan_node, router_node, retrieval_agent,
+             execution_agent, critic_agent, verification_node
+         )
+
+         print("✅ All imports successful")
+         return True
+     except ImportError as e:
+         print(f"❌ Import error: {e}")
+         return False
+
+
+ def test_memory_system():
+     """Test the memory management system"""
+     print("\nTesting memory system...")
+     try:
+         from src.memory import memory_manager
+
+         # Test basic functionality
+         test_query = "What is 2+2?"
+
+         # Test similarity search (should not crash even without vector store)
+         similar = memory_manager.similarity_search(test_query, k=1)
+         print(f"✅ Similarity search completed: {len(similar)} results")
+
+         # Test cache management
+         memory_manager.clear_session_cache()
+         print("✅ Memory cache cleared")
+
+         return True
+     except Exception as e:
+         print(f"❌ Memory system error: {e}")
+         return False
+
+
+ def test_tracing_system():
+     """Test the Langfuse tracing integration"""
+     print("\nTesting tracing system...")
+     try:
+         from src.tracing import get_langfuse_callback_handler, initialize_langfuse
+
+         # Test handler creation (should not crash even without credentials)
+         handler = get_langfuse_callback_handler()
+         print(f"✅ Langfuse handler: {type(handler)}")
+
+         return True
+     except Exception as e:
+         print(f"❌ Tracing system error: {e}")
+         return False
+
+
+ def test_individual_agents():
+     """Test each agent individually"""
+     print("\nTesting individual agents...")
+
+     # Test state structure
+     test_state = {
+         "messages": [],
+         "plan_complete": False,
+         "next_agent": "",
+         "routing_decision": "",
+         "routing_reason": "",
+         "current_step": "testing",
+         "agent_response": None,
+         "needs_tools": False,
+         "execution_result": "",
+         "critic_assessment": "",
+         "quality_pass": True,
+         "quality_score": 7,
+         "verification_status": "",
+         "attempt_count": 1,
+         "final_answer": ""
+     }
+
+     try:
+         from langchain_core.messages import HumanMessage
+         test_state["messages"] = [HumanMessage(content="Test query")]
+
+         # Test plan node
+         from src.agents.plan_node import plan_node
+         plan_result = plan_node(test_state)
+         print("✅ Plan node executed")
+
+         # Test router node
+         from src.agents.router_node import router_node
+         router_result = router_node(plan_result)
+         print("✅ Router node executed")
+
+         return True
+     except Exception as e:
+         print(f"❌ Agent testing error: {e}")
+         return False
+
+
+ def test_graph_creation():
+     """Test the main graph creation"""
+     print("\nTesting graph creation...")
+     try:
+         from src.langgraph_system import create_agent_graph
+
+         # Create the workflow
+         workflow = create_agent_graph()
+         print("✅ Graph created successfully")
+
+         # Try to compile (this might fail without proper setup, but shouldn't crash)
+         try:
+             app = workflow.compile()
+             print("✅ Graph compiled successfully")
+         except Exception as e:
+             print(f"⚠️ Graph compilation warning: {e}")
+
+         return True
+     except Exception as e:
+         print(f"❌ Graph creation error: {e}")
+         return False
+
+
+ def test_simple_query():
+     """Test a simple query through the system"""
+     print("\nTesting simple query...")
+     try:
+         from new_langraph_agent import run_agent
+
+         # Simple test query
+         test_query = "What is 1 + 1?"
+         print(f"Query: {test_query}")
+
+         start_time = time.time()
+         result = run_agent(test_query)
+         end_time = time.time()
+
+         print(f"Result: {result}")
+         print(f"Time taken: {end_time - start_time:.2f} seconds")
+         print("✅ Simple query completed")
+
+         return True
+     except Exception as e:
+         print(f"❌ Simple query error: {e}")
+         return False
+
+
+ def main():
+     """Run all tests"""
+     print("LangGraph Agent System - Test Suite")
+     print("=" * 50)
+
+     tests = [
+         test_imports,
+         test_memory_system,
+         test_tracing_system,
+         test_individual_agents,
+         test_graph_creation,
+         test_simple_query
+     ]
+
+     results = []
+     for test_func in tests:
+         try:
+             result = test_func()
+             results.append(result)
+         except Exception as e:
+             print(f"❌ Test {test_func.__name__} failed with exception: {e}")
+             results.append(False)
+
+     # Summary
+     print("\n" + "=" * 50)
+     print("Test Summary:")
+     print(f"Passed: {sum(results)}/{len(results)}")
+     print(f"Failed: {len(results) - sum(results)}/{len(results)}")
+
+     if all(results):
+         print("🎉 All tests passed!")
+         return 0
+     else:
+         print("⚠️ Some tests failed. Check the output above for details.")
+         return 1
+
+
+ if __name__ == "__main__":
+     sys.exit(main())
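
Note that `test_new_system.py` exits with a proper status code via `sys.exit(main())`, so it can run unattended (e.g. `python test_new_system.py` in CI), whereas the integration script below pauses for keyboard input between stages and is meant for interactive use only.
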
test_tools_integration.py ADDED
@@ -0,0 +1,81 @@
+ #!/usr/bin/env python3
+ """
+ Test script to verify tool integration in the LangGraph agent system
+ """
+
+ from src.langgraph_system import run_agent_system
+
+
+ def test_retrieval_tools():
+     """Test that retrieval tools (Wikipedia, web search, etc.) are working"""
+     print("=" * 60)
+     print("Testing Retrieval Tools Integration")
+     print("=" * 60)
+
+     # Test Wikipedia search
+     query = "When was Albert Einstein born?"
+     print(f"\nTesting query: {query}")
+     print("-" * 40)
+
+     result = run_agent_system(query, user_id="test_user", session_id="test_session")
+     print(f"Result: {result}")
+
+     return result
+
+
+ def test_execution_tools():
+     """Test that execution tools (Python code execution) are working"""
+     print("=" * 60)
+     print("Testing Execution Tools Integration")
+     print("=" * 60)
+
+     # Test code execution
+     query = "Calculate the first 10 numbers in the Fibonacci sequence"
+     print(f"\nTesting query: {query}")
+     print("-" * 40)
+
+     result = run_agent_system(query, user_id="test_user", session_id="test_session")
+     print(f"Result: {result}")
+
+     return result
+
+
+ def test_web_search_tools():
+     """Test web search functionality"""
+     print("=" * 60)
+     print("Testing Web Search Tools Integration")
+     print("=" * 60)
+
+     # Test web search
+     query = "What is the latest news about artificial intelligence?"
+     print(f"\nTesting query: {query}")
+     print("-" * 40)
+
+     result = run_agent_system(query, user_id="test_user", session_id="test_session")
+     print(f"Result: {result}")
+
+     return result
+
+
+ if __name__ == "__main__":
+     print("Starting Tool Integration Tests...")
+
+     try:
+         # Test retrieval tools
+         test_retrieval_tools()
+
+         print("\n" + "=" * 60)
+         input("Press Enter to continue to execution tools test...")
+
+         # Test execution tools
+         test_execution_tools()
+
+         print("\n" + "=" * 60)
+         input("Press Enter to continue to web search tools test...")
+
+         # Test web search tools
+         test_web_search_tools()
+
+         print("\n" + "=" * 60)
+         print("Tool integration tests completed!")
+
+     except Exception as e:
+         print(f"Test failed with error: {e}")
+         import traceback
+         traceback.print_exc()
uv.lock CHANGED
@@ -331,6 +331,30 @@ wheels = [
      { url = "https://files.pythonhosted.org/packages/c3/be/d0d44e092656fe7a06b55e6103cbce807cdbdee17884a5367c68c9860853/dataclasses_json-0.6.7-py3-none-any.whl", hash = "sha256:0dbf33f26c8d5305befd61b39d2b3414e8a407bedc2834dea9b8d642666fb40a", size = 28686 },
  ]
 
+ [[package]]
+ name = "datasets"
+ version = "3.6.0"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+     { name = "dill" },
+     { name = "filelock" },
+     { name = "fsspec", extra = ["http"] },
+     { name = "huggingface-hub" },
+     { name = "multiprocess" },
+     { name = "numpy" },
+     { name = "packaging" },
+     { name = "pandas" },
+     { name = "pyarrow" },
+     { name = "pyyaml" },
+     { name = "requests" },
+     { name = "tqdm" },
+     { name = "xxhash" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/1a/89/d3d6fef58a488f8569c82fd293ab7cbd4250244d67f425dcae64c63800ea/datasets-3.6.0.tar.gz", hash = "sha256:1b2bf43b19776e2787e181cfd329cb0ca1a358ea014780c3581e0f276375e041", size = 569336 }
+ wheels = [
+     { url = "https://files.pythonhosted.org/packages/20/34/a08b0ee99715eaba118cbe19a71f7b5e2425c2718ef96007c325944a1152/datasets-3.6.0-py3-none-any.whl", hash = "sha256:25000c4a2c0873a710df127d08a202a06eab7bf42441a6bc278b499c2f72cd1b", size = 491546 },
+ ]
+
  [[package]]
  name = "debugpy"
  version = "1.8.14"
@@ -386,6 +410,15 @@ wheels = [
      { url = "https://files.pythonhosted.org/packages/02/c3/253a89ee03fc9b9682f1541728eb66db7db22148cd94f89ab22528cd1e1b/deprecation-2.1.0-py2.py3-none-any.whl", hash = "sha256:a10811591210e1fb0e768a8c25517cabeabcba6f0bf96564f8ff45189f90b14a", size = 11178 },
  ]
 
+ [[package]]
+ name = "dill"
+ version = "0.3.8"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/17/4d/ac7ffa80c69ea1df30a8aa11b3578692a5118e7cd1aa157e3ef73b092d15/dill-0.3.8.tar.gz", hash = "sha256:3ebe3c479ad625c4553aca177444d89b486b1d84982eeacded644afc0cf797ca", size = 184847 }
+ wheels = [
+     { url = "https://files.pythonhosted.org/packages/c9/7a/cef76fd8438a42f96db64ddaa85280485a9c395e7df3db8158cfec1eee34/dill-0.3.8-py3-none-any.whl", hash = "sha256:c36ca9ffb54365bdd2f8eb3eff7d2a21237f8452b57ace88b1ac615b7e815bd7", size = 116252 },
+ ]
+
  [[package]]
  name = "dirtyjson"
  version = "1.0.8"
@@ -495,6 +528,7 @@ name = "final-assignment-template"
  version = "0.1.0"
  source = { virtual = "." }
  dependencies = [
+     { name = "datasets" },
      { name = "dotenv" },
      { name = "gradio" },
      { name = "hf-xet" },
@@ -509,6 +543,8 @@
      { name = "langchain-openai" },
      { name = "langfuse" },
      { name = "langgraph" },
+     { name = "langgraph-checkpoint" },
+     { name = "langgraph-checkpoint-sqlite" },
      { name = "llama-index" },
      { name = "llama-index-core" },
      { name = "llama-index-llms-huggingface-api" },
@@ -526,6 +562,7 @@
 
  [package.metadata]
  requires-dist = [
+     { name = "datasets", specifier = ">=2.19.1" },
      { name = "dotenv", specifier = ">=0.9.9" },
      { name = "gradio", specifier = ">=5.34.1" },
      { name = "hf-xet", specifier = ">=1.1.3" },
@@ -540,6 +577,8 @@ requires-dist = [
      { name = "langchain-openai", specifier = ">=0.3.24" },
      { name = "langfuse", specifier = ">=3.0.0" },
      { name = "langgraph", specifier = ">=0.4.8" },
+     { name = "langgraph-checkpoint", specifier = ">=2.1.0" },
+     { name = "langgraph-checkpoint-sqlite", specifier = ">=2.0.10" },
      { name = "llama-index", specifier = ">=0.12.40" },
      { name = "llama-index-core", specifier = ">=0.12.40" },
      { name = "llama-index-llms-huggingface-api", specifier = ">=0.5.0" },
@@ -600,11 +639,16 @@ wheels = [
 
  [[package]]
  name = "fsspec"
- version = "2025.5.1"
+ version = "2025.3.0"
  source = { registry = "https://pypi.org/simple" }
- sdist = { url = "https://files.pythonhosted.org/packages/00/f7/27f15d41f0ed38e8fcc488584b57e902b331da7f7c6dcda53721b15838fc/fsspec-2025.5.1.tar.gz", hash = "sha256:2e55e47a540b91843b755e83ded97c6e897fa0942b11490113f09e9c443c2475", size = 303033 }
+ sdist = { url = "https://files.pythonhosted.org/packages/34/f4/5721faf47b8c499e776bc34c6a8fc17efdf7fdef0b00f398128bc5dcb4ac/fsspec-2025.3.0.tar.gz", hash = "sha256:a935fd1ea872591f2b5148907d103488fc523295e6c64b835cfad8c3eca44972", size = 298491 }
  wheels = [
-     { url = "https://files.pythonhosted.org/packages/bb/61/78c7b3851add1481b048b5fdc29067397a1784e2910592bc81bb3f608635/fsspec-2025.5.1-py3-none-any.whl", hash = "sha256:24d3a2e663d5fc735ab256263c4075f374a174c3410c0b25e5bd1970bceaa462", size = 199052 },
+     { url = "https://files.pythonhosted.org/packages/56/53/eb690efa8513166adef3e0669afd31e95ffde69fb3c52ec2ac7223ed6018/fsspec-2025.3.0-py3-none-any.whl", hash = "sha256:efb87af3efa9103f94ca91a7f8cb7a4df91af9f74fc106c9c7ea0efd7277c1b3", size = 193615 },
+ ]
+
+ [package.optional-dependencies]
+ http = [
+     { name = "aiohttp" },
  ]
 
  [[package]]
@@ -1366,6 +1410,20 @@ wheels = [
      { url = "https://files.pythonhosted.org/packages/0f/41/390a97d9d0abe5b71eea2f6fb618d8adadefa674e97f837bae6cda670bc7/langgraph_checkpoint-2.1.0-py3-none-any.whl", hash = "sha256:4cea3e512081da1241396a519cbfe4c5d92836545e2c64e85b6f5c34a1b8bc61", size = 43844 },
  ]
 
+ [[package]]
+ name = "langgraph-checkpoint-sqlite"
+ version = "2.0.10"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+     { name = "aiosqlite" },
+     { name = "langgraph-checkpoint" },
+     { name = "sqlite-vec" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/7b/38/5d44b91fa21e06309be8f1658ae966f5c717443401df005b20d9af91b6b5/langgraph_checkpoint_sqlite-2.0.10.tar.gz", hash = "sha256:c8a55a268b857761dc77f123df48addaf8e9a40b72c4eaddb7c551ddced1c5b6", size = 103625 }
+ wheels = [
+     { url = "https://files.pythonhosted.org/packages/c1/ff/63b16d83a513f7d7a5001bb01a40024986d330718a5315bf1962d7cc50c8/langgraph_checkpoint_sqlite-2.0.10-py3-none-any.whl", hash = "sha256:89d1d2201fe26aa52f1a9c03e1015d226635649be596b26542a5de78f8cc6c9f", size = 30973 },
+ ]
+
  [[package]]
  name = "langgraph-prebuilt"
  version = "0.2.2"
@@ -1838,6 +1896,22 @@ wheels = [
      { url = "https://files.pythonhosted.org/packages/84/5d/e17845bb0fa76334477d5de38654d27946d5b5d3695443987a094a71b440/multidict-6.4.4-py3-none-any.whl", hash = "sha256:bd4557071b561a8b3b6075c3ce93cf9bfb6182cb241805c3d66ced3b75eff4ac", size = 10481 },
  ]
 
+ [[package]]
+ name = "multiprocess"
+ version = "0.70.16"
+ source = { registry = "https://pypi.org/simple" }
+ dependencies = [
+     { name = "dill" },
+ ]
+ sdist = { url = "https://files.pythonhosted.org/packages/b5/ae/04f39c5d0d0def03247c2893d6f2b83c136bf3320a2154d7b8858f2ba72d/multiprocess-0.70.16.tar.gz", hash = "sha256:161af703d4652a0e1410be6abccecde4a7ddffd19341be0a7011b94aeb171ac1", size = 1772603 }
+ wheels = [
+     { url = "https://files.pythonhosted.org/packages/bc/f7/7ec7fddc92e50714ea3745631f79bd9c96424cb2702632521028e57d3a36/multiprocess-0.70.16-py310-none-any.whl", hash = "sha256:c4a9944c67bd49f823687463660a2d6daae94c289adff97e0f9d696ba6371d02", size = 134824 },
+     { url = "https://files.pythonhosted.org/packages/50/15/b56e50e8debaf439f44befec5b2af11db85f6e0f344c3113ae0be0593a91/multiprocess-0.70.16-py311-none-any.whl", hash = "sha256:af4cabb0dac72abfb1e794fa7855c325fd2b55a10a44628a3c1ad3311c04127a", size = 143519 },
+     { url = "https://files.pythonhosted.org/packages/0a/7d/a988f258104dcd2ccf1ed40fdc97e26c4ac351eeaf81d76e266c52d84e2f/multiprocess-0.70.16-py312-none-any.whl", hash = "sha256:fc0544c531920dde3b00c29863377f87e1632601092ea2daca74e4beb40faa2e", size = 146741 },
+     { url = "https://files.pythonhosted.org/packages/ea/89/38df130f2c799090c978b366cfdf5b96d08de5b29a4a293df7f7429fa50b/multiprocess-0.70.16-py38-none-any.whl", hash = "sha256:a71d82033454891091a226dfc319d0cfa8019a4e888ef9ca910372a446de4435", size = 132628 },
+     { url = "https://files.pythonhosted.org/packages/da/d9/f7f9379981e39b8c2511c9e0326d212accacb82f12fbfdc1aa2ce2a7b2b6/multiprocess-0.70.16-py39-none-any.whl", hash = "sha256:a0bafd3ae1b732eac64be2e72038231c1ba97724b60b09400d68f229fcc2fbf3", size = 133351 },
+ ]
+
  [[package]]
  name = "mypy-extensions"
  version = "1.1.0"
@@ -2476,6 +2550,32 @@ wheels = [
      { url = "https://files.pythonhosted.org/packages/8e/37/efad0257dc6e593a18957422533ff0f87ede7c9c6ea010a2177d738fb82f/pure_eval-0.2.3-py3-none-any.whl", hash = "sha256:1db8e35b67b3d218d818ae653e27f06c3aa420901fa7b081ca98cbedc874e0d0", size = 11842 },
  ]
 
+ [[package]]
+ name = "pyarrow"
+ version = "20.0.0"
+ source = { registry = "https://pypi.org/simple" }
+ sdist = { url = "https://files.pythonhosted.org/packages/a2/ee/a7810cb9f3d6e9238e61d312076a9859bf3668fd21c69744de9532383912/pyarrow-20.0.0.tar.gz", hash = "sha256:febc4a913592573c8d5805091a6c2b5064c8bd6e002131f01061797d91c783c1", size = 1125187 }
+ wheels = [
+     { url = "https://files.pythonhosted.org/packages/9b/aa/daa413b81446d20d4dad2944110dcf4cf4f4179ef7f685dd5a6d7570dc8e/pyarrow-20.0.0-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:a15532e77b94c61efadde86d10957950392999503b3616b2ffcef7621a002893", size = 30798501 },
+     { url = "https://files.pythonhosted.org/packages/ff/75/2303d1caa410925de902d32ac215dc80a7ce7dd8dfe95358c165f2adf107/pyarrow-20.0.0-cp313-cp313-macosx_12_0_x86_64.whl", hash = "sha256:dd43f58037443af715f34f1322c782ec463a3c8a94a85fdb2d987ceb5658e061", size = 32277895 },
+     { url = "https://files.pythonhosted.org/packages/92/41/fe18c7c0b38b20811b73d1bdd54b1fccba0dab0e51d2048878042d84afa8/pyarrow-20.0.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:aa0d288143a8585806e3cc7c39566407aab646fb9ece164609dac1cfff45f6ae", size = 41327322 },
+     { url = "https://files.pythonhosted.org/packages/da/ab/7dbf3d11db67c72dbf36ae63dcbc9f30b866c153b3a22ef728523943eee6/pyarrow-20.0.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b6953f0114f8d6f3d905d98e987d0924dabce59c3cda380bdfaa25a6201563b4", size = 42411441 },
+     { url = "https://files.pythonhosted.org/packages/90/c3/0c7da7b6dac863af75b64e2f827e4742161128c350bfe7955b426484e226/pyarrow-20.0.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:991f85b48a8a5e839b2128590ce07611fae48a904cae6cab1f089c5955b57eb5", size = 40677027 },
+     { url = "https://files.pythonhosted.org/packages/be/27/43a47fa0ff9053ab5203bb3faeec435d43c0d8bfa40179bfd076cdbd4e1c/pyarrow-20.0.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:97c8dc984ed09cb07d618d57d8d4b67a5100a30c3818c2fb0b04599f0da2de7b", size = 42281473 },
+     { url = "https://files.pythonhosted.org/packages/bc/0b/d56c63b078876da81bbb9ba695a596eabee9b085555ed12bf6eb3b7cab0e/pyarrow-20.0.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:9b71daf534f4745818f96c214dbc1e6124d7daf059167330b610fc69b6f3d3e3", size = 42893897 },
+     { url = "https://files.pythonhosted.org/packages/92/ac/7d4bd020ba9145f354012838692d48300c1b8fe5634bfda886abcada67ed/pyarrow-20.0.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:e8b88758f9303fa5a83d6c90e176714b2fd3852e776fc2d7e42a22dd6c2fb368", size = 44543847 },
+     { url = "https://files.pythonhosted.org/packages/9d/07/290f4abf9ca702c5df7b47739c1b2c83588641ddfa2cc75e34a301d42e55/pyarrow-20.0.0-cp313-cp313-win_amd64.whl", hash = "sha256:30b3051b7975801c1e1d387e17c588d8ab05ced9b1e14eec57915f79869b5031", size = 25653219 },
+     { url = "https://files.pythonhosted.org/packages/95/df/720bb17704b10bd69dde086e1400b8eefb8f58df3f8ac9cff6c425bf57f1/pyarrow-20.0.0-cp313-cp313t-macosx_12_0_arm64.whl", hash = "sha256:ca151afa4f9b7bc45bcc791eb9a89e90a9eb2772767d0b1e5389609c7d03db63", size = 30853957 },
+     { url = "https://files.pythonhosted.org/packages/d9/72/0d5f875efc31baef742ba55a00a25213a19ea64d7176e0fe001c5d8b6e9a/pyarrow-20.0.0-cp313-cp313t-macosx_12_0_x86_64.whl", hash = "sha256:4680f01ecd86e0dd63e39eb5cd59ef9ff24a9d166db328679e36c108dc993d4c", size = 32247972 },
+     { url = "https://files.pythonhosted.org/packages/d5/bc/e48b4fa544d2eea72f7844180eb77f83f2030b84c8dad860f199f94307ed/pyarrow-20.0.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7f4c8534e2ff059765647aa69b75d6543f9fef59e2cd4c6d18015192565d2b70", size = 41256434 },
+     { url = "https://files.pythonhosted.org/packages/c3/01/974043a29874aa2cf4f87fb07fd108828fc7362300265a2a64a94965e35b/pyarrow-20.0.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3e1f8a47f4b4ae4c69c4d702cfbdfe4d41e18e5c7ef6f1bb1c50918c1e81c57b", size = 42353648 },
+     { url = "https://files.pythonhosted.org/packages/68/95/cc0d3634cde9ca69b0e51cbe830d8915ea32dda2157560dda27ff3b3337b/pyarrow-20.0.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:a1f60dc14658efaa927f8214734f6a01a806d7690be4b3232ba526836d216122", size = 40619853 },
+     { url = "https://files.pythonhosted.org/packages/29/c2/3ad40e07e96a3e74e7ed7cc8285aadfa84eb848a798c98ec0ad009eb6bcc/pyarrow-20.0.0-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:204a846dca751428991346976b914d6d2a82ae5b8316a6ed99789ebf976551e6", size = 42241743 },
+     { url = "https://files.pythonhosted.org/packages/eb/cb/65fa110b483339add6a9bc7b6373614166b14e20375d4daa73483755f830/pyarrow-20.0.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:f3b117b922af5e4c6b9a9115825726cac7d8b1421c37c2b5e24fbacc8930612c", size = 42839441 },
+     { url = "https://files.pythonhosted.org/packages/98/7b/f30b1954589243207d7a0fbc9997401044bf9a033eec78f6cb50da3f304a/pyarrow-20.0.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:e724a3fd23ae5b9c010e7be857f4405ed5e679db5c93e66204db1a69f733936a", size = 44503279 },
+     { url = "https://files.pythonhosted.org/packages/37/40/ad395740cd641869a13bcf60851296c89624662575621968dcfafabaa7f6/pyarrow-20.0.0-cp313-cp313t-win_amd64.whl", hash = "sha256:82f1ee5133bd8f49d31be1299dc07f585136679666b502540db854968576faf9", size = 25944982 },
+ ]
+
  [[package]]
  name = "pyasn1"
  version = "0.6.1"
@@ -3029,6 +3129,18 @@ asyncio = [
      { name = "greenlet" },
  ]
 
+ [[package]]
+ name = "sqlite-vec"
+ version = "0.1.6"
+ source = { registry = "https://pypi.org/simple" }
+ wheels = [
+     { url = "https://files.pythonhosted.org/packages/88/ed/aabc328f29ee6814033d008ec43e44f2c595447d9cccd5f2aabe60df2933/sqlite_vec-0.1.6-py3-none-macosx_10_6_x86_64.whl", hash = "sha256:77491bcaa6d496f2acb5cc0d0ff0b8964434f141523c121e313f9a7d8088dee3", size = 164075 },
+     { url = "https://files.pythonhosted.org/packages/a7/57/05604e509a129b22e303758bfa062c19afb020557d5e19b008c64016704e/sqlite_vec-0.1.6-py3-none-macosx_11_0_arm64.whl", hash = "sha256:fdca35f7ee3243668a055255d4dee4dea7eed5a06da8cad409f89facf4595361", size = 165242 },
+     { url = "https://files.pythonhosted.org/packages/f2/48/dbb2cc4e5bad88c89c7bb296e2d0a8df58aab9edc75853728c361eefc24f/sqlite_vec-0.1.6-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7b0519d9cd96164cd2e08e8eed225197f9cd2f0be82cb04567692a0a4be02da3", size = 103704 },
+     { url = "https://files.pythonhosted.org/packages/80/76/97f33b1a2446f6ae55e59b33869bed4eafaf59b7f4c662c8d9491b6a714a/sqlite_vec-0.1.6-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux1_x86_64.whl", hash = "sha256:823b0493add80d7fe82ab0fe25df7c0703f4752941aee1c7b2b02cec9656cb24", size = 151556 },
+     { url = "https://files.pythonhosted.org/packages/6a/98/e8bc58b178266eae2fcf4c9c7a8303a8d41164d781b32d71097924a6bebe/sqlite_vec-0.1.6-py3-none-win_amd64.whl", hash = "sha256:c65bcfd90fa2f41f9000052bcb8bb75d38240b2dae49225389eca6c3136d3f0c", size = 281540 },
+ ]
+
  [[package]]
  name = "stack-data"
  version = "0.6.3"