Current GAIA Multi-Agent Framework Architecture
This document summarizes the architecture of the GAIA multi-agent framework based on the provided Python source files.
Core Framework
- Technology: The system is built on the `llama_index.core.agent.workflow.AgentWorkflow` class from the LlamaIndex library.
- Orchestration: `app.py` serves as the main entry point. It initializes a Gradio web interface, fetches benchmark questions from a specified API endpoint, manages the files (text, image, audio) associated with questions, runs the agent workflow for each question, and submits the answers back to the API.
- Root Agent: The workflow designates `planner_agent` as the `root_agent`, meaning it receives the initial user request (question) and orchestrates the subsequent steps (see the wiring sketch below).
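A minimal sketch of how this wiring might look with the LlamaIndex agent-workflow API. The roster is abbreviated to two agents, and the tool lists, prompts, and model choice are placeholders rather than the project's exact configuration:

```python
# Minimal sketch, assuming the FunctionAgent / AgentWorkflow API from recent
# LlamaIndex releases; tools, prompts, and model choice are placeholders.
from llama_index.core.agent.workflow import AgentWorkflow, FunctionAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="o4-mini")  # one of the models mentioned in this document

planner_agent = FunctionAgent(
    name="planner_agent",
    description="Plans sub-steps, delegates them, and synthesizes the answer.",
    system_prompt="Break the objective into sub-steps, delegate, then synthesize.",
    tools=[],  # generate_substeps and synthesize_and_respond would go here
    llm=llm,
    can_handoff_to=["research_agent"],  # full roster omitted for brevity
)

research_agent = FunctionAgent(
    name="research_agent",
    description="Gathers information from the web.",
    system_prompt="Search the web and report findings back to the planner.",
    tools=[],  # search and browsing tools would go here
    llm=llm,
    can_handoff_to=["planner_agent"],
)

workflow = AgentWorkflow(
    agents=[planner_agent, research_agent],
    root_agent="planner_agent",  # the planner receives the initial question
)
```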
Agent Roster and Capabilities
The framework comprises several specialized agents, each designed for specific tasks:
`planner_agent` (Root):
- Purpose: Strategic planning, task decomposition, and final synthesis.
- Tools: `generate_substeps` (breaks down objectives using an LLM; a sketch follows the list), `synthesize_and_respond` (aggregates results into a final report using an LLM).
- Workflow: Receives the initial objective, breaks it into sub-steps, delegates those steps to the appropriate specialist agents, and finally synthesizes the collected results into a coherent answer.
- Handoffs: Can delegate to `code_agent`, `research_agent`, `math_agent`, `role_agent`, `image_analyzer_agent`, `text_analyzer_agent`, `verifier_agent`, `reasoning_agent`.
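For illustration, a tool like `generate_substeps` may be little more than a prompted LLM call wrapped as a `FunctionTool`. The prompt wording and plain-string return format below are assumptions, not the project's actual code:

```python
# Hypothetical shape of a generate_substeps-style planning tool; the prompt
# text and return format are assumptions.
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

_planner_llm = OpenAI(model="o4-mini")

def generate_substeps(objective: str) -> str:
    """Break a high-level objective into a numbered list of concrete sub-steps."""
    prompt = (
        "Decompose the following objective into a short numbered list of "
        "concrete, delegable sub-steps, one per line:\n\n"
        f"{objective}"
    )
    return _planner_llm.complete(prompt).text

generate_substeps_tool = FunctionTool.from_defaults(fn=generate_substeps)
```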
`role_agent`:
- Purpose: Determines and sets the appropriate persona or context for the task.
- Tools: `role_prompt_retriever` (uses a combination of vector search and BM25 retrieval over the `fka/awesome-chatgpt-prompts` dataset, followed by reranking, to find the best role/prompt; see the sketch below).
- Workflow: Interprets user intent, retrieves relevant role descriptions, selects the best fit, and provides the role/prompt.
- Handoffs: Hands off to `planner_agent` after setting the role.
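The hybrid retrieval behind `role_prompt_retriever` could be assembled roughly as follows; the specific retriever and reranker classes, top-k values, and default models here are assumptions:

```python
# Illustrative hybrid retriever (vector + BM25, then rerank) over the
# fka/awesome-chatgpt-prompts dataset; component choices are assumptions.
from datasets import load_dataset
from llama_index.core import Document, QueryBundle, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

rows = load_dataset("fka/awesome-chatgpt-prompts", split="train")
docs = [Document(text=r["prompt"], metadata={"act": r["act"]}) for r in rows]
nodes = SentenceSplitter().get_nodes_from_documents(docs)

vector_retriever = VectorStoreIndex(nodes).as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)
hybrid = QueryFusionRetriever(
    [vector_retriever, bm25_retriever], similarity_top_k=10, num_queries=1
)
reranker = SentenceTransformerRerank(top_n=1)

def role_prompt_retriever(query: str) -> str:
    """Return the single best-matching role prompt for the user's intent."""
    reranked = reranker.postprocess_nodes(hybrid.retrieve(query), QueryBundle(query))
    return reranked[0].node.text if reranked else ""
```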
`code_agent`:
- Purpose: Generates and executes Python code.
- Tools: `python_code_generator` (uses the OpenAI `o4-mini` model to generate code from a prompt), `code_interpreter` (uses LlamaIndex's tool spec, likely for sandboxed execution), and a custom `SimpleCodeExecutor` (executes Python code via `subprocess`; not safe for production; a rough sketch follows the list).
- Workflow: Takes a description, generates code, executes/tests it, and returns the result or the final code.
- Handoffs: Hands off to `planner_agent` or `reasoning_agent`.
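A rough sketch of what a `subprocess`-based executor like `SimpleCodeExecutor` could look like, and why it is unsafe: it runs arbitrary generated code with no sandboxing.

```python
# Rough sketch of a subprocess-based executor in the spirit of
# SimpleCodeExecutor; running arbitrary generated code like this is unsafe.
import subprocess
import sys
import tempfile

def execute_python(code: str, timeout: int = 30) -> str:
    """Write the code to a temp file, run it with the current interpreter,
    and return its combined stdout and stderr."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr
```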
`math_agent`:
- Purpose: Performs mathematical computations.
- Tools: A large suite of functions covering symbolic math (SymPy), matrix operations (NumPy), statistics (NumPy), numerical methods (NumPy, SciPy), vector math (NumPy), probability (SciPy), and potentially more (the file was truncated). Also includes WolframAlpha integration. Two representative tools are sketched below.
- Workflow: Executes specific mathematical operations based on requests.
- Handoffs: (Inferred) Likely hands off to `planner_agent` or `reasoning_agent`.
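Two representative tools of this kind; the function names and signatures are illustrative, not the project's exact ones:

```python
# Illustrative math tools; only the SymPy/NumPy division of labour described
# above is taken from the document, the names and signatures are made up.
import numpy as np
import sympy as sp

def solve_equation(equation: str, variable: str = "x") -> str:
    """Solve a symbolic equation written as 'lhs = rhs' for the given variable."""
    lhs, rhs = equation.split("=")
    solutions = sp.solve(sp.Eq(sp.sympify(lhs), sp.sympify(rhs)), sp.Symbol(variable))
    return str(solutions)

def matrix_determinant(rows: list[list[float]]) -> float:
    """Compute the determinant of a square matrix given as nested lists."""
    return float(np.linalg.det(np.array(rows)))
```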
`research_agent`:
- Purpose: Gathers information from the web and specialized sources.
- Tools: Web search (Google, DuckDuckGo, Tavily), web browsing/interaction via Helium/Selenium (`visit`, `get_text_by_css`, `get_page_html`, `click_element`, `search_item_ctrl_f`, `go_back`, `close_popups`; see the browsing sketch below), Wikipedia search/loading, Yahoo Finance data retrieval, and ArXiv paper search.
- Workflow: Executes a plan-act-observe loop to find and extract information from various online sources.
- Handoffs: Can delegate to `code_agent`, `math_agent`, `analyzer_agent` (likely meaning `text_analyzer_agent` or `image_analyzer_agent`), `planner_agent`, `reasoning_agent`.
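Two of the browsing tools might be built on Helium roughly as follows; the exact signatures in the project may differ:

```python
# Sketch of Helium-based browsing helpers in the spirit of visit and
# get_text_by_css; the project's actual signatures may differ.
from helium import S, find_all, go_to, start_chrome

driver = start_chrome(headless=True)  # shared Selenium driver behind Helium

def visit(url: str) -> str:
    """Navigate the shared browser to a URL and return the page title."""
    go_to(url)
    return driver.title

def get_text_by_css(selector: str) -> list[str]:
    """Return the visible text of every element matching a CSS selector."""
    return [el.web_element.text for el in find_all(S(selector))]
```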
`text_analyzer_agent`:
- Purpose: Extracts text from PDFs and analyzes text content.
- Tools: `extract_text_from_pdf` (uses PyPDF2; handles URLs and local files; sketched below), `analyze_text` (uses an LLM to generate a summary and key facts).
- Workflow: If the input is a PDF, extracts the text, then analyzes it to produce a summary and a list of facts.
- Handoffs: Hands off to `verifier_agent`.
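A plausible shape for `extract_text_from_pdf`, mirroring the URL-or-local-path handling described above; the details are assumed:

```python
# Plausible shape of extract_text_from_pdf using PyPDF2 and Requests; the
# URL-versus-local-path branching mirrors the description above.
import io
import PyPDF2
import requests

def extract_text_from_pdf(source: str) -> str:
    """Extract text from a PDF given either an http(s) URL or a local path."""
    if source.startswith(("http://", "https://")):
        stream = io.BytesIO(requests.get(source, timeout=30).content)
    else:
        stream = open(source, "rb")
    reader = PyPDF2.PdfReader(stream)
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```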
`image_analyzer_agent`:
- Purpose: Analyzes image content factually.
- Tools: Relies directly on the multimodal capabilities of its underlying LLM (Gemini 1.5 Pro) to process image inputs provided via `ChatMessage` blocks (see the example below). No dedicated image analysis tool is defined, but the system prompt dictates a detailed, structured analysis format.
- Workflow: Receives an image and performs analysis according to a strict factual template.
- Handoffs: Hands off to `planner_agent`, `research_agent`, or `reasoning_agent`.
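How an image question might be packaged for the multimodal LLM, assuming the content-block API from recent LlamaIndex releases; the file name and prompt text are placeholders:

```python
# Packaging an image for a multimodal LLM via content blocks; the file name
# and prompt text are placeholders.
from llama_index.core.llms import ChatMessage, ImageBlock, TextBlock

message = ChatMessage(
    role="user",
    blocks=[
        TextBlock(text="Describe this image factually, following the template."),
        ImageBlock(path="question_attachment.png"),
    ],
)
# The message can then be passed to the workflow (or directly to a multimodal
# LLM's chat method) for analysis.
```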
`verifier_agent`:
- Purpose: Assesses the confidence of factual statements and detects contradictions.
- Tools: `verify_facts` (uses an LLM, Gemini 2.0 Flash, to assign confidence scores), `find_contradictions` (uses simple string matching for negation pairs; illustrated below).
- Workflow: Takes a list of facts, scores them, checks for contradictions, and reports the results.
- Handoffs: Hands off to `reasoning_agent` or `planner_agent`.
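A naive illustration of negation-pair string matching in the spirit of `find_contradictions`; the real heuristics may be more elaborate:

```python
# Naive negation-pair matching in the spirit of find_contradictions; the
# negation list and matching rule here are illustrative.
from itertools import combinations

NEGATION_PAIRS = [(" is ", " is not "), (" can ", " cannot "), (" has ", " has no ")]

def find_contradictions(facts: list[str]) -> list[tuple[str, str]]:
    """Flag pairs of facts where one is the negated form of the other."""
    found = []
    for a, b in combinations(facts, 2):
        for pos, neg in NEGATION_PAIRS:
            if (pos in a and a.replace(pos, neg) == b) or (
                pos in b and b.replace(pos, neg) == a
            ):
                found.append((a, b))
                break
    return found

# Example: flags ("The tower is open.", "The tower is not open.")
```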
`reasoning_agent`:
- Purpose: Performs explicit chain-of-thought reasoning.
- Tools: `reasoning_tool` (uses the OpenAI `o4-mini` model with a detailed prompt to perform chain-of-thought reasoning over the provided context; a sketch follows).
- Workflow: Takes context, applies reasoning via the tool, and returns the structured reasoning output.
- Handoffs: Hands off to `planner_agent`.
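A hedged sketch of a `reasoning_tool`-style call; the structured prompt (premises, numbered steps, conclusion) is an assumption about what the detailed prompt contains:

```python
# Hedged sketch of a chain-of-thought reasoning tool; the structured prompt
# below is an assumption, not the project's actual prompt.
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

_reasoning_llm = OpenAI(model="o4-mini")

REASONING_SYSTEM_PROMPT = (
    "Reason over the provided context. Output three sections: "
    "'Premises' (facts you rely on), 'Steps' (numbered inferences), "
    "and 'Conclusion' (one sentence)."
)

def reasoning_tool(context: str) -> str:
    """Apply structured chain-of-thought reasoning to the given context."""
    response = _reasoning_llm.chat(
        [
            ChatMessage(role="system", content=REASONING_SYSTEM_PROMPT),
            ChatMessage(role="user", content=context),
        ]
    )
    return response.message.content
```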
Workflow and Data Flow
1. A question (potentially with associated files) arrives at `app.py`.
2. `app.py` formats the input (e.g., a `ChatMessage` with `TextBlock`, `ImageBlock`, `AudioBlock`) and passes it to the `AgentWorkflow`, starting with `planner_agent`.
3. `planner_agent` breaks down the task.
   - It may call `role_agent` to set context.
   - It delegates sub-tasks to specialized agents (`research`, `code`, `math`, `text_analyzer`, `image_analyzer`).
4. Agents execute their tasks, potentially calling tools or other agents (e.g., `text_analyzer` calls `verifier_agent`).
5. `reasoning_agent` might be called for complex logical steps or verification.
6. Results flow back up, eventually reaching `planner_agent`.
7. `planner_agent` synthesizes the final answer using `synthesize_and_respond`.
8. `app.py` receives the final answer and submits it (see the condensed loop below).
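Condensed, the per-question loop in `app.py` amounts to something like the sketch below. The endpoint URL and JSON field names are hypothetical placeholders, and `workflow` refers to the wiring sketch earlier in this document:

```python
# Condensed, hypothetical version of the app.py loop: fetch questions, run the
# workflow per question, submit answers. Endpoint and field names are made up.
import asyncio
import requests

API_URL = "https://example.com/api"  # placeholder endpoint

async def answer_one(question_text: str) -> str:
    # `workflow` is the AgentWorkflow instance from the earlier sketch
    result = await workflow.run(user_msg=question_text)
    return str(result)

def main() -> None:
    questions = requests.get(f"{API_URL}/questions", timeout=30).json()
    answers = [
        {"task_id": q["task_id"], "answer": asyncio.run(answer_one(q["question"]))}
        for q in questions
    ]
    requests.post(f"{API_URL}/submit", json={"answers": answers}, timeout=60)

if __name__ == "__main__":
    main()
```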
Technology Stack Summary
- Core: Python, LlamaIndex
- LLMs: Google Gemini (1.5 Pro, 2.0 Flash), OpenAI (o4-mini)
- UI: Gradio
- Web Interaction: Selenium, Helium
- Data Handling: Pandas, PyPDF2, Requests
- Search/Retrieval: HuggingFace Embeddings/Rerankers, Datasets, LlamaIndex Tool Specs (Google, Tavily, Wikipedia, DuckDuckGo, Yahoo Finance, ArXiv)
- Math: SymPy, NumPy, SciPy, WolframAlpha
- Code Execution: Subprocess (basic executor), LlamaIndex Code Interpreter