metadata
title: AI Deep Researcher
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: docker
sdk_version: 4.39.0
app_file: ui/app.py
pinned: false

AI Deep Researcher

AI Deep Researcher is a generative AI learning project built with the OpenAI Agentic Framework. Given a user query, the app runs in-depth web research and produces a well-structured, consolidated report.

The technologies the project relies on are listed under Key Technologies.

How it works

The system is a multi-agent solution, where each agent has a specific responsibility:

  1. Planner Agent

    • Receives the user query and builds a structured query plan.
  2. Guardrail Agent

    • Validates user input and ensures compliance.
    • Stops the workflow if the input contains inappropriate or offensive language.
  3. Search Agent

    • Executes the query plan.
    • Runs multiple web searches in parallel to gather data.
  4. Writer Agent

    • Reads results from all search agents.
    • Generates a well-formatted, consolidated report.
  5. Email Agent (not functional at present)

    • Responsible for sending the report via email using SendGrid.
  6. Orchestrator

    • The entry point of the system.
    • Facilitates communication and workflow between all agents.
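
The flow described above can be sketched with plain asyncio. This is a minimal illustration, not the project's actual code: every function name, the `SearchItem` type, and the `BLOCKED_WORDS` list are hypothetical stand-ins, the agents are replaced by stubs that make no LLM or network calls, and running the guardrail before the planner is an assumption about the ordering.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical placeholder list; a real guardrail would likely use an LLM check.
BLOCKED_WORDS = {"badword"}


@dataclass
class SearchItem:
    query: str   # the web search to run
    reason: str  # why this search helps answer the user query


def guardrail(query: str) -> bool:
    """Return True when the query passes the toy word filter."""
    words = {w.strip(".,!?").lower() for w in query.split()}
    return not (words & BLOCKED_WORDS)


async def plan(query: str) -> list[SearchItem]:
    """Stand-in for the Planner Agent: derive a few search angles."""
    angles = ["overview", "recent developments", "criticisms"]
    return [SearchItem(f"{query} {a}", f"covers {a}") for a in angles]


async def search(item: SearchItem) -> str:
    """Stand-in for one Search Agent call."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"summary for '{item.query}'"


async def write_report(query: str, summaries: list[str]) -> str:
    """Stand-in for the Writer Agent: merge summaries into markdown."""
    body = "\n".join(f"- {s}" for s in summaries)
    return f"# Report: {query}\n\n{body}"


async def run_research(query: str) -> str:
    """Orchestrator: guardrail, then plan, then parallel search, then write."""
    if not guardrail(query):
        raise ValueError("query rejected by guardrail")
    items = await plan(query)
    # gather() runs all searches concurrently and preserves input order.
    summaries = await asyncio.gather(*(search(i) for i in items))
    return await write_report(query, list(summaries))
```

Calling `asyncio.run(run_research("solid state batteries"))` returns a markdown string with one bullet per planned search.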

Project Folder Structure

deep-research/
├── ui/
│   ├── app.py                    # Main Streamlit application entry point
│   └── __pycache__/              # Python bytecode cache
├── appagents/
│   ├── __init__.py               # Package initialization
│   ├── orchestrator.py           # Orchestrator agent - coordinates all agents
│   ├── planner_agent.py          # Planner agent - builds structured query plans
│   ├── guardrail_agent.py        # Guardrail agent - validates user input
│   ├── search_agent.py           # Search agent - performs web searches
│   ├── writer_agent.py           # Writer agent - generates consolidated reports
│   ├── email_agent.py            # Email agent - sends reports via email (not functional)
│   └── __pycache__/              # Python bytecode cache
├── core/
│   ├── __init__.py               # Package initialization
│   ├── logger.py                 # Centralized logging configuration
│   └── __pycache__/              # Python bytecode cache
├── tools/
│   ├── __init__.py               # Package initialization
│   ├── google_tools.py           # Google search utilities
│   ├── time_tools.py             # Time-related utility functions
│   └── __pycache__/              # Python bytecode cache
├── prompts/
│   ├── __init__.py               # Package initialization (if present)
│   ├── planner_prompt.txt        # Prompt for planner agent (if present)
│   ├── guardrail_prompt.txt      # Prompt for guardrail agent (if present)
│   ├── search_prompt.txt         # Prompt for search agent (if present)
│   └── writer_prompt.txt         # Prompt for writer agent (if present)
├── Dockerfile                     # Docker configuration for container deployment
├── pyproject.toml                 # Project metadata and dependencies (copied from root)
├── uv.lock                        # Locked dependency versions (copied from root)
├── README.md                      # Project documentation
└── run.py                         # Script to run the application locally (if present)

File Descriptions

UI Layer (ui/)

  • app.py - Main Streamlit web application that provides the user interface. Handles:
    • Text input for research queries
    • Run/Download buttons (PDF, Markdown)
    • Real-time streaming of results
    • Display of final research reports
    • Session state management
    • Button enable/disable during streaming

Agents (appagents/)

  • orchestrator.py - Central coordinator that:

    • Manages the multi-agent workflow
    • Handles communication between all agents
    • Streams results back to the UI
    • Implements the research pipeline
  • planner_agent.py - Creates a structured plan for the query:

    • Breaks down user query into actionable research steps
    • Defines search queries and research angles
  • guardrail_agent.py - Validates user input:

    • Checks for inappropriate content
    • Ensures compliance with policies
    • Stops the workflow if violations are detected
  • search_agent.py - Executes web searches:

    • Performs parallel web searches
    • Integrates with Google Search / Serper API
    • Gathers raw research data
  • writer_agent.py - Generates final report:

    • Consolidates search results
    • Formats findings into structured markdown
    • Creates well-organized research summaries
  • email_agent.py - Email delivery (not functional):

    • Intended to send reports via SendGrid
    • Currently not integrated in the workflow
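
One way the orchestrator could stream results back to the UI is as an async generator that yields a status line per stage and the report last. The sketch below only illustrates that pattern: `research_stream`, `fake_search`, and `collect` are hypothetical names, and the search step is a stub rather than the real search agent.

```python
import asyncio
from typing import AsyncIterator


async def fake_search(query: str) -> str:
    """Stub search; a real agent would call a search API here."""
    await asyncio.sleep(0)
    return f"results for {query}"


async def research_stream(query: str) -> AsyncIterator[str]:
    """Yield progress messages as each stage starts, then the final report."""
    yield "Planning searches..."
    searches = [f"{query} overview", f"{query} latest news"]
    yield f"Running {len(searches)} searches..."
    results = await asyncio.gather(*(fake_search(s) for s in searches))
    yield "Writing report..."
    yield "# Report\n\n" + "\n".join(f"- {r}" for r in results)


async def collect(query: str) -> list[str]:
    """Drain the stream into a list; a UI would render each chunk instead."""
    return [chunk async for chunk in research_stream(query)]
```

A consumer iterates with `async for chunk in research_stream(query)` and updates the display on every chunk, so the user sees progress before the report is complete.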

Core Utilities (core/)

  • logger.py - Centralized logging configuration:
    • Provides consistent logging across agents
    • Handles log levels and formatting
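
A centralized logger of this kind is often a small helper around the standard `logging` module. The following is a sketch under that assumption; the function name `get_logger` and the format string are illustrative and may differ from what `logger.py` actually contains.

```python
import logging
import sys

# Illustrative format; the project's real format is not documented here.
_FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with a single stdout handler and shared format."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(_FORMAT))
        logger.addHandler(handler)
    logger.propagate = False  # keep messages from also reaching the root logger
    return logger
```

Each agent module would then call something like `log = get_logger(__name__)` once at import time and use `log.info(...)` throughout.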

Tools (tools/)

  • google_tools.py - Google/Serper API wrapper:

    • Executes web searches
    • Handles API authentication and response parsing
  • time_tools.py - Utility functions:

    • Time-related operations
    • Timestamp management
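
For the search wrapper in `google_tools.py`, authentication and response parsing might look roughly like the sketch below. It uses only the standard library. The endpoint URL, the `X-API-KEY` header, and the `organic` / `title` / `link` / `snippet` response fields reflect my understanding of the public Serper API and should be checked against its documentation; the function names are hypothetical.

```python
import json
import os
import urllib.request

# Assumed Serper search endpoint; verify against the Serper documentation.
SERPER_URL = "https://google.serper.dev/search"


def build_request(query: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated POST request carrying the query as JSON."""
    payload = json.dumps({"q": query}).encode("utf-8")
    headers = {"X-API-KEY": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(
        SERPER_URL, data=payload, headers=headers, method="POST"
    )


def parse_results(payload: dict, limit: int = 5) -> list[dict]:
    """Keep title, link and snippet for the first `limit` organic hits."""
    hits = payload.get("organic", [])[:limit]
    return [
        {
            "title": h.get("title", ""),
            "link": h.get("link", ""),
            "snippet": h.get("snippet", ""),
        }
        for h in hits
    ]


def search(query: str, limit: int = 5) -> list[dict]:
    """Run one search; reads the key from the SERPER_API_KEY variable."""
    api_key = os.environ["SERPER_API_KEY"]
    request = build_request(query, api_key)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return parse_results(json.load(resp), limit)
```

Splitting request building and parsing from the network call keeps both halves testable without an API key.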

Configuration Files

  • Dockerfile - Container deployment:

    • Builds Docker image with Python 3.12
    • Installs dependencies using uv
    • Sets up Streamlit server on port 7860
    • Configures PYTHONPATH for module imports
  • pyproject.toml - Project metadata:

    • Package name: "agents"
    • Python version requirement: 3.12
    • Lists all dependencies (OpenAI, LangChain, Streamlit, etc.)
  • uv.lock - Dependency lock file:

    • Ensures reproducible builds
    • Pins exact versions of all dependencies

Key Technologies

| Component | Technology | Purpose |
|---|---|---|
| LLM Framework | OpenAI Agents | Multi-agent orchestration |
| Web Search | Serper API / Google Search | Research data gathering |
| Web UI | Streamlit | User interface and interaction |
| Document Export | ReportLab | PDF generation from markdown |
| Async Operations | AsyncIO | Parallel agent execution |
| Dependencies | UV | Fast Python package management |
| Containerization | Docker | Cloud deployment |

Running Locally

# Install dependencies
uv sync

# Set the environment variables listed in the .env.name file
export OPENAI_API_KEY="your-key"
export SERPER_API_KEY="your-key"

# Run the Streamlit app
python run.py
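
Since the app needs both API keys, a startup check can fail fast with a clear message instead of erroring mid-research. The helper below is a hypothetical sketch and is not taken from `run.py`; only the two variable names come from the commands above.

```python
import os
from collections.abc import Mapping

# The two variables exported in the steps above.
REQUIRED_VARS = ("OPENAI_API_KEY", "SERPER_API_KEY")


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

A launcher script could call `missing_vars()` before starting the app and exit with the list of missing names if it is non-empty.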

Deployment

The project is deployed on Hugging Face Spaces as a Docker container: