deep-research / README.md
mishrabp's picture
Upload folder using huggingface_hub
e9c7617 verified
---
title: AI Deep Researcher # Give your app a title
emoji: πŸ€– # Pick an emoji
colorFrom: indigo # Theme start color
colorTo: blue # Theme end color
sdk: docker # SDK type
sdk_version: "4.39.0" # Example Gradio version
app_file: ui/app.py # <-- points to your app.py inside ui/
pinned: false
---
# AI Deep Researcher
**AI Deep Researcher** is a generative AI learning project built using the OpenAI Agentic Framework. This app performs deep-level web research based on user queries and generates a well-structured, consolidated report.
To achieve this, the project integrates the following technologies and AI features:
- **OpenAI SDK**
- **OpenAI Agents**
- **OpenAI WebSearch Tool**
- **Serper API** - a free alternative to OpenAI WebSearch Tool (https://serper.dev/api-keys)
- **News API** (https://newsapi.org/v2/everything)
- **SendGrid** (for emailing report)
- **LLMs** - (OpenAI, Geminia, Groq)
## How it works?
The system is a multi-agent solution, where each agent has a specific responsibility:
1. **Planner Agent**
- Receives the user query and builds a structured query plan.
2. **Guardrail Agent**
- Validates user input and ensures compliance.
- Stops the workflow if the input contains inappropriate or unparliamentary words.
3. **Search Agent**
- Executes the query plan.
- Runs multiple web searches in parallel to gather data.
4. **Writer Agent**
- Reads results from all search agents.
- Generates a well-formatted, consolidated report.
5. **Email Agent (not functional at present)**
- Responsible for sending the report via email using SendGrid.
6. **Orchestrator**
- The entry point of the system.
- Facilitates communication and workflow between all agents.
## Project Folder Structure
```
deep-research/
β”œβ”€β”€ ui/
β”‚ β”œβ”€β”€ app.py # Main Streamlit application entry point
β”‚ └── __pycache__/ # Python bytecode cache
β”œβ”€β”€ appagents/
β”‚ β”œβ”€β”€ __init__.py # Package initialization
β”‚ β”œβ”€β”€ orchestrator.py # Orchestrator agent - coordinates all agents
β”‚ β”œβ”€β”€ planner_agent.py # Planner agent - builds structured query plans
β”‚ β”œβ”€β”€ guardrail_agent.py # Guardrail agent - validates user input
β”‚ β”œβ”€β”€ search_agent.py # Search agent - performs web searches
β”‚ β”œβ”€β”€ writer_agent.py # Writer agent - generates consolidated reports
β”‚ β”œβ”€β”€ email_agent.py # Email agent - sends reports via email (not functional)
β”‚ └── __pycache__/ # Python bytecode cache
β”œβ”€β”€ core/
β”‚ β”œβ”€β”€ __init__.py # Package initialization
β”‚ β”œβ”€β”€ logger.py # Centralized logging configuration
β”‚ └── __pycache__/ # Python bytecode cache
β”œβ”€β”€ tools/
β”‚ β”œβ”€β”€ __init__.py # Package initialization
β”‚ β”œβ”€β”€ google_tools.py # Google search utilities
β”‚ β”œβ”€β”€ time_tools.py # Time-related utility functions
β”‚ └── __pycache__/ # Python bytecode cache
β”œβ”€β”€ prompts/
β”‚ β”œβ”€β”€ __init__.py # Package initialization (if present)
β”‚ β”œβ”€β”€ planner_prompt.txt # Prompt for planner agent (if present)
β”‚ β”œβ”€β”€ guardrail_prompt.txt # Prompt for guardrail agent (if present)
β”‚ β”œβ”€β”€ search_prompt.txt # Prompt for search agent (if present)
β”‚ └── writer_prompt.txt # Prompt for writer agent (if present)
β”œβ”€β”€ Dockerfile # Docker configuration for container deployment
β”œβ”€β”€ pyproject.toml # Project metadata and dependencies (copied from root)
β”œβ”€β”€ uv.lock # Locked dependency versions (copied from root)
β”œβ”€β”€ README.md # Project documentation
└── run.py # Script to run the application locally (if present)
```
## File Descriptions
### UI Layer (`ui/`)
- **app.py** - Main Streamlit web application that provides the user interface. Handles:
- Text input for research queries
- Run/Download buttons (PDF, Markdown)
- Real-time streaming of results
- Display of final research reports
- Session state management
- Button enable/disable during streaming
### Agents (`appagents/`)
- **orchestrator.py** - Central coordinator that:
- Manages the multi-agent workflow
- Handles communication between all agents
- Streams results back to the UI
- Implements the research pipeline
- **planner_agent.py** - Creates a structured plan for the query:
- Breaks down user query into actionable research steps
- Defines search queries and research angles
- **guardrail_agent.py** - Validates user input:
- Checks for inappropriate content
- Ensures compliance with policies
- Stops workflow if violations detected
- **search_agent.py** - Executes web searches:
- Performs parallel web searches
- Integrates with Google Search / Serper API
- Gathers raw research data
- **writer_agent.py** - Generates final report:
- Consolidates search results
- Formats findings into structured markdown
- Creates well-organized research summaries
- **email_agent.py** - Email delivery (not functional):
- Intended to send reports via SendGrid
- Currently not integrated in the workflow
### Core Utilities (`core/`)
- **logger.py** - Centralized logging configuration:
- Provides consistent logging across agents
- Handles log levels and formatting
### Tools (`tools/`)
- **google_tools.py** - Google/Serper API wrapper:
- Executes web searches
- Handles API authentication and response parsing
- **time_tools.py** - Utility functions:
- Time-related operations
- Timestamp management
### Configuration Files
- **Dockerfile** - Container deployment:
- Builds Docker image with Python 3.12
- Installs dependencies using `uv`
- Sets up Streamlit server on port 7860
- Configures PYTHONPATH for module imports
- **pyproject.toml** - Project metadata:
- Package name: "agents"
- Python version requirement: 3.12
- Lists all dependencies (OpenAI, LangChain, Streamlit, etc.)
- **uv.lock** - Dependency lock file:
- Ensures reproducible builds
- Pins exact versions of all dependencies
## Key Technologies
| Component | Technology | Purpose |
|-----------|-----------|---------|
| LLM Framework | OpenAI Agents | Multi-agent orchestration |
| Web Search | Serper API / Google Search | Research data gathering |
| Web UI | Streamlit | User interface and interaction |
| Document Export | ReportLab | PDF generation from markdown |
| Async Operations | AsyncIO | Parallel agent execution |
| Dependencies | UV | Fast Python package management |
| Containerization | Docker | Cloud deployment |
## Running Locally
```bash
# Install dependencies
uv sync
# Set environment variables defined in .env.name file
export OPENAI_API_KEY="your-key"
export SERPER_API_KEY="your-key"
# Run the Streamlit app
python run.py
```
## Deployment
The project is deployed on Hugging Face Spaces as a Docker container:
- **Space**: https://huggingface.co/spaces/mishrabp/deep-research
- **URL**: https://huggingface.co/spaces/mishrabp/deep-research
- **Trigger**: Automatic deployment on push to `main` branch
- **Configuration**: `.github/workflows/deep-research-app-hf.yml`