---
title: AI Deep Researcher
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: docker
sdk_version: 4.39.0
app_file: ui/app.py
pinned: false
---
# AI Deep Researcher
AI Deep Researcher is a generative AI learning project built with the OpenAI Agents framework. The app performs in-depth web research based on user queries and generates a well-structured, consolidated report.
To achieve this, the project integrates the following technologies and AI features:
- OpenAI SDK
- OpenAI Agents
- OpenAI WebSearch Tool
- Serper API - a free alternative to the OpenAI WebSearch tool (https://serper.dev/api-keys)
- News API (https://newsapi.org/v2/everything)
- SendGrid (for emailing report)
- LLMs (OpenAI, Gemini, Groq)
## How it works
The system is a multi-agent solution, where each agent has a specific responsibility:
**Planner Agent**
- Receives the user query and builds a structured query plan.

**Guardrail Agent**
- Validates user input and ensures compliance.
- Stops the workflow if the input contains inappropriate or offensive words.

**Search Agent**
- Executes the query plan.
- Runs multiple web searches in parallel to gather data.

**Writer Agent**
- Reads the results from all search agents.
- Generates a well-formatted, consolidated report.

**Email Agent** (not functional at present)
- Responsible for sending the report via email using SendGrid.

**Orchestrator**
- The entry point of the system.
- Facilitates communication and workflow between all agents (a sketch of this pipeline follows below).
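
To make the flow concrete, here is a minimal, self-contained sketch of the pipeline. The stub functions stand in for the real agents in `appagents/`; every name here is illustrative, not the project's actual API.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class SearchItem:
    query: str
    reason: str

async def check_guardrail(query: str) -> bool:
    # Real version: LLM-based input validation (Guardrail Agent)
    banned = {"placeholder-term"}
    return not any(term in query.lower() for term in banned)

async def make_plan(query: str) -> list[SearchItem]:
    # Real version: Planner Agent returns a structured query plan
    return [SearchItem(f"{query} overview", "background"),
            SearchItem(f"{query} latest developments", "recency")]

async def run_search(item: SearchItem) -> str:
    # Real version: Search Agent calls the Serper API / WebSearch tool
    return f"summary for {item.query!r}"

async def write_report(query: str, results: list[str]) -> str:
    # Real version: Writer Agent consolidates everything with an LLM
    return f"# Report: {query}\n\n" + "\n".join(f"- {r}" for r in results)

async def research(query: str) -> str:
    if not await check_guardrail(query):          # Guardrail: stop early
        raise ValueError("Query rejected by guardrail")
    plan = await make_plan(query)                 # Planner
    results = await asyncio.gather(               # Search: runs in parallel
        *(run_search(item) for item in plan))
    return await write_report(query, results)     # Writer

print(asyncio.run(research("quantum computing")))
```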
## Project Folder Structure

```
deep-research/
├── ui/
│   ├── app.py                # Main Streamlit application entry point
│   └── __pycache__/          # Python bytecode cache
├── appagents/
│   ├── __init__.py           # Package initialization
│   ├── orchestrator.py       # Orchestrator agent - coordinates all agents
│   ├── planner_agent.py      # Planner agent - builds structured query plans
│   ├── guardrail_agent.py    # Guardrail agent - validates user input
│   ├── search_agent.py       # Search agent - performs web searches
│   ├── writer_agent.py       # Writer agent - generates consolidated reports
│   ├── email_agent.py        # Email agent - sends reports via email (not functional)
│   └── __pycache__/          # Python bytecode cache
├── core/
│   ├── __init__.py           # Package initialization
│   ├── logger.py             # Centralized logging configuration
│   └── __pycache__/          # Python bytecode cache
├── tools/
│   ├── __init__.py           # Package initialization
│   ├── google_tools.py       # Google search utilities
│   ├── time_tools.py         # Time-related utility functions
│   └── __pycache__/          # Python bytecode cache
├── prompts/
│   ├── __init__.py           # Package initialization (if present)
│   ├── planner_prompt.txt    # Prompt for planner agent (if present)
│   ├── guardrail_prompt.txt  # Prompt for guardrail agent (if present)
│   ├── search_prompt.txt     # Prompt for search agent (if present)
│   └── writer_prompt.txt     # Prompt for writer agent (if present)
├── Dockerfile                # Docker configuration for container deployment
├── pyproject.toml            # Project metadata and dependencies (copied from root)
├── uv.lock                   # Locked dependency versions (copied from root)
├── README.md                 # Project documentation
└── run.py                    # Script to run the application locally (if present)
```
## File Descriptions
### UI Layer (`ui/`)

- `app.py` - Main Streamlit web application that provides the user interface (a minimal sketch follows this list). Handles:
  - Text input for research queries
  - Run/Download buttons (PDF, Markdown)
  - Real-time streaming of results
  - Display of final research reports
  - Session state management
  - Button enable/disable during streaming
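
A rough sketch of the Streamlit pattern described above; the widget labels and the `stream_research` stub are assumptions, not the project's exact code.

```python
import streamlit as st

def stream_research(q: str):
    # Stand-in for the orchestrator's streamed output
    yield from (f"Searching for '{q}'...\n\n", "Writing report...\n\n", f"# Report on {q}\n")

st.title("AI Deep Researcher")
query = st.text_input("Research query")

if "report" not in st.session_state:
    st.session_state.report = ""

# The button is disabled while a run is marked in flight (takes effect on rerun)
if st.button("Run", disabled=st.session_state.get("running", False)) and query:
    st.session_state.running = True
    placeholder, chunks = st.empty(), []
    for chunk in stream_research(query):
        chunks.append(chunk)
        placeholder.markdown("".join(chunks))  # re-render as results stream in
    st.session_state.report = "".join(chunks)
    st.session_state.running = False

if st.session_state.report:
    st.download_button("Download Markdown", st.session_state.report, file_name="report.md")
```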
### Agents (`appagents/`)

**`orchestrator.py`** - Central coordinator that:
- Manages the multi-agent workflow
- Handles communication between all agents
- Streams results back to the UI (see the streaming sketch after this list)
- Implements the research pipeline
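
One common way to implement the streaming behavior is an async generator that yields status lines as each stage completes. This sketch reuses the stub agents from the pipeline example in the How it works section; the project's actual interface may differ.

```python
import asyncio
from typing import AsyncIterator

async def run_pipeline(query: str) -> AsyncIterator[str]:
    # Yield status lines as each stage finishes so the UI can render progress
    yield "Planning searches...\n"
    plan = await make_plan(query)                  # stub from the pipeline sketch
    yield f"Running {len(plan)} searches in parallel...\n"
    results = await asyncio.gather(*(run_search(i) for i in plan))
    yield "Writing report...\n"
    yield await write_report(query, results)
```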
**`planner_agent.py`** - Creates a structured plan for the query (see the schema sketch below):
- Breaks the user query down into actionable research steps
- Defines search queries and research angles
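
Structured plans are typically modeled with Pydantic so the LLM's output can be validated. The field names below are illustrative assumptions, not the project's schema.

```python
from pydantic import BaseModel

class WebSearch(BaseModel):
    reason: str  # why this search helps answer the user's query
    query: str   # the search term to execute

class SearchPlan(BaseModel):
    searches: list[WebSearch]

# With the OpenAI Agents SDK, the planner can be asked to emit this schema:
#   planner = Agent(name="Planner", instructions=PLANNER_PROMPT, output_type=SearchPlan)
```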
**`guardrail_agent.py`** - Validates user input (sketched below):
- Checks for inappropriate content
- Ensures compliance with policies
- Stops the workflow if violations are detected
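
A minimal wordlist check conveys the idea; the real agent is presumably LLM- and policy-driven, and the terms here are placeholders.

```python
BLOCKED_TERMS = {"placeholder-slur", "placeholder-profanity"}

def validate_query(query: str) -> None:
    lowered = query.lower()
    hits = sorted(t for t in BLOCKED_TERMS if t in lowered)
    if hits:
        # Raising stops the orchestrator before any planning or searching runs
        raise ValueError(f"Input rejected by guardrail: {hits}")
```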
**`search_agent.py`** - Executes web searches (see the sketch below):
- Performs parallel web searches
- Integrates with Google Search / the Serper API
- Gathers raw research data
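
Parallel searches are usually bounded so a large plan doesn't hit the search API all at once. A sketch with `asyncio` (the concurrency limit of 5 and the stub search are assumptions):

```python
import asyncio

async def search_stub(q: str) -> str:
    await asyncio.sleep(0.1)  # stands in for a real Serper / web-search call
    return f"results for {q!r}"

async def run_all(queries: list[str], limit: int = 5) -> list[str]:
    sem = asyncio.Semaphore(limit)

    async def bounded(q: str) -> str:
        async with sem:  # at most `limit` searches in flight at once
            return await search_stub(q)

    return await asyncio.gather(*(bounded(q) for q in queries))
```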
**`writer_agent.py`** - Generates the final report (see the sketch below):
- Consolidates search results
- Formats findings into structured markdown
- Creates well-organized research summaries
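
The shape of the consolidation step, minus the LLM call (the headings and layout are illustrative):

```python
def consolidate(query: str, summaries: list[str]) -> str:
    # The real agent prompts an LLM; this only shows the target markdown shape
    lines = [f"# Research Report: {query}", ""]
    for i, summary in enumerate(summaries, start=1):
        lines += [f"## Finding {i}", summary, ""]
    return "\n".join(lines)
```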
**`email_agent.py`** - Email delivery (not functional):
- Intended to send reports via SendGrid (a sketch of the likely call follows)
- Currently not integrated into the workflow
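
Since the agent is not wired in yet, this is only a sketch of the standard SendGrid call it would likely make; the addresses are placeholders.

```python
import os
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

def send_report(html_body: str) -> None:
    message = Mail(
        from_email="researcher@example.com",   # placeholder sender
        to_emails="reader@example.com",        # placeholder recipient
        subject="Your research report",
        html_content=html_body,
    )
    SendGridAPIClient(os.environ["SENDGRID_API_KEY"]).send(message)
```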
### Core Utilities (`core/`)

**`logger.py`** - Centralized logging configuration (sketched below):
- Provides consistent logging across agents
- Handles log levels and formatting
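
A typical shape for such a module; the format string and default level are assumptions.

```python
import logging

def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```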
### Tools (`tools/`)

**`google_tools.py`** - Google/Serper API wrapper (see the sketch below):
- Executes web searches
- Handles API authentication and response parsing
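
A sketch of a Serper request following the public API documentation; the project's wrapper may differ in its details.

```python
import os
import requests

def serper_search(query: str, num: int = 10) -> dict:
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": os.environ["SERPER_API_KEY"],
                 "Content-Type": "application/json"},
        json={"q": query, "num": num},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # organic results are under the "organic" key
```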
**`time_tools.py`** - Utility functions:
- Time-related operations
- Timestamp management
### Configuration Files

**`Dockerfile`** - Container deployment (a sketch follows):
- Builds the Docker image with Python 3.12
- Installs dependencies using `uv`
- Sets up the Streamlit server on port 7860
- Configures `PYTHONPATH` for module imports
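
A plausible shape for this Dockerfile; the paths, flags, and `uv` invocation are assumptions, not the project's exact file.

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Install dependencies from the lock file first to maximize layer caching
COPY pyproject.toml uv.lock ./
RUN pip install uv && uv sync --frozen

COPY . .
ENV PYTHONPATH=/app
EXPOSE 7860
CMD ["uv", "run", "streamlit", "run", "ui/app.py", "--server.port", "7860", "--server.address", "0.0.0.0"]
```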
**`pyproject.toml`** - Project metadata:
- Package name: "agents"
- Python version requirement: 3.12
- Lists all dependencies (OpenAI, LangChain, Streamlit, etc.)
**`uv.lock`** - Dependency lock file:
- Ensures reproducible builds
- Pins exact versions of all dependencies
## Key Technologies
| Component | Technology | Purpose |
|---|---|---|
| LLM Framework | OpenAI Agents | Multi-agent orchestration |
| Web Search | Serper API / Google Search | Research data gathering |
| Web UI | Streamlit | User interface and interaction |
| Document Export | ReportLab | PDF generation from markdown |
| Async Operations | AsyncIO | Parallel agent execution |
| Dependencies | UV | Fast Python package management |
| Containerization | Docker | Cloud deployment |
## Running Locally

```bash
# Install dependencies
uv sync

# Set the environment variables defined in the .env.name file
export OPENAI_API_KEY="your-key"
export SERPER_API_KEY="your-key"

# Run the Streamlit app
python run.py
```
## Deployment

The project is deployed to Hugging Face Spaces as a Docker container:

- Space: https://huggingface.co/spaces/mishrabp/deep-research
- Trigger: automatic deployment on push to the `main` branch
- Configuration: `.github/workflows/deep-research-app-hf.yml` (a sketch of its likely shape follows)
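
A sketch of the workflow's likely shape; the secret name and checkout details are assumptions.

```yaml
name: deep-research-app-hf
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # Hugging Face needs the full history for the push
      - name: Push to Hugging Face Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: git push https://mishrabp:$HF_TOKEN@huggingface.co/spaces/mishrabp/deep-research HEAD:main
```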