---
title: AI Deep Researcher # Give your app a title
emoji: 🤖 # Pick an emoji
colorFrom: indigo # Theme start color
colorTo: blue # Theme end color
sdk: docker # SDK type
app_port: 7860 # Port the Streamlit server listens on (see Dockerfile)
app_file: ui/app.py # <-- points to your app.py inside ui/
pinned: false
---
# AI Deep Researcher
**AI Deep Researcher** is a generative AI learning project built with the OpenAI Agents framework. The app performs in-depth web research based on a user's query and generates a well-structured, consolidated report.
To achieve this, the project integrates the following technologies and AI features:
- **OpenAI SDK**
- **OpenAI Agents**
- **OpenAI WebSearch Tool**
- **Serper API** - a free alternative to OpenAI WebSearch Tool (https://serper.dev/api-keys)
- **News API** (https://newsapi.org/v2/everything)
- **SendGrid** (for emailing report)
- **LLMs** - (OpenAI, Gemini, Groq)
## How it works
The system is a multi-agent solution, where each agent has a specific responsibility:
1. **Planner Agent**
- Receives the user query and builds a structured query plan.
2. **Guardrail Agent**
- Validates user input and ensures compliance.
   - Stops the workflow if the input contains inappropriate or offensive language.
3. **Search Agent**
- Executes the query plan.
- Runs multiple web searches in parallel to gather data.
4. **Writer Agent**
- Reads results from all search agents.
- Generates a well-formatted, consolidated report.
5. **Email Agent (not functional at present)**
- Responsible for sending the report via email using SendGrid.
6. **Orchestrator**
- The entry point of the system.
- Facilitates communication and workflow between all agents.
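The workflow above can be sketched roughly as follows. Note that the agent functions here are simplified stand-ins, not the project's actual OpenAI Agents code; only the overall flow (guardrail → plan → parallel search → write) mirrors the description above:

```python
import asyncio

# Hypothetical stand-ins for the real agents.
async def guardrail_agent(query: str) -> bool:
    # Reject queries containing blocked terms (placeholder check).
    return "badword" not in query.lower()

async def planner_agent(query: str) -> list[str]:
    # Break the user query into a structured list of search queries.
    return [f"{query} overview", f"{query} recent news"]

async def search_agent(search_query: str) -> str:
    # Stand-in for a real web search call.
    return f"results for: {search_query}"

async def writer_agent(results: list[str]) -> str:
    # Consolidate all search results into one markdown report.
    return "# Report\n" + "\n".join(f"- {r}" for r in results)

async def orchestrate(query: str) -> str:
    if not await guardrail_agent(query):           # 1. validate input
        return "Query rejected by guardrail."
    plan = await planner_agent(query)              # 2. build query plan
    results = await asyncio.gather(                # 3. searches in parallel
        *(search_agent(q) for q in plan)
    )
    return await writer_agent(list(results))       # 4. consolidated report

report = asyncio.run(orchestrate("solid-state batteries"))
```

`asyncio.gather` is what lets the search step fan out: all planned queries run concurrently rather than one after another.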
## Project Folder Structure
```
deep-research/
├── ui/
│   ├── app.py                 # Main Streamlit application entry point
│   └── __pycache__/           # Python bytecode cache
├── appagents/
│   ├── __init__.py            # Package initialization
│   ├── orchestrator.py        # Orchestrator agent - coordinates all agents
│   ├── planner_agent.py       # Planner agent - builds structured query plans
│   ├── guardrail_agent.py     # Guardrail agent - validates user input
│   ├── search_agent.py        # Search agent - performs web searches
│   ├── writer_agent.py        # Writer agent - generates consolidated reports
│   ├── email_agent.py         # Email agent - sends reports via email (not functional)
│   └── __pycache__/           # Python bytecode cache
├── core/
│   ├── __init__.py            # Package initialization
│   ├── logger.py              # Centralized logging configuration
│   └── __pycache__/           # Python bytecode cache
├── tools/
│   ├── __init__.py            # Package initialization
│   ├── google_tools.py        # Google search utilities
│   ├── time_tools.py          # Time-related utility functions
│   └── __pycache__/           # Python bytecode cache
├── prompts/
│   ├── __init__.py            # Package initialization (if present)
│   ├── planner_prompt.txt     # Prompt for planner agent (if present)
│   ├── guardrail_prompt.txt   # Prompt for guardrail agent (if present)
│   ├── search_prompt.txt      # Prompt for search agent (if present)
│   └── writer_prompt.txt      # Prompt for writer agent (if present)
├── Dockerfile                 # Docker configuration for container deployment
├── pyproject.toml             # Project metadata and dependencies (copied from root)
├── uv.lock                    # Locked dependency versions (copied from root)
├── README.md                  # Project documentation
└── run.py                     # Script to run the application locally (if present)
```
## File Descriptions
### UI Layer (`ui/`)
- **app.py** - Main Streamlit web application that provides the user interface. Handles:
- Text input for research queries
- Run/Download buttons (PDF, Markdown)
- Real-time streaming of results
- Display of final research reports
- Session state management
- Button enable/disable during streaming
### Agents (`appagents/`)
- **orchestrator.py** - Central coordinator that:
- Manages the multi-agent workflow
- Handles communication between all agents
- Streams results back to the UI
- Implements the research pipeline
- **planner_agent.py** - Creates a structured plan for the query:
- Breaks down user query into actionable research steps
- Defines search queries and research angles
- **guardrail_agent.py** - Validates user input:
- Checks for inappropriate content
- Ensures compliance with policies
- Stops workflow if violations detected
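A minimal sketch of such a word-list check. The blocked terms and the matching rule are placeholders for illustration, not the project's actual guardrail policy (which runs as an agent):

```python
# Placeholder blocked-term list; the real agent uses an LLM-based check.
BLOCKED_WORDS = {"slur1", "slur2"}

def violates_policy(query: str) -> bool:
    """Return True if any word in the query matches a blocked term."""
    words = {w.strip(".,!?").lower() for w in query.split()}
    return bool(words & BLOCKED_WORDS)
```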
- **search_agent.py** - Executes web searches:
- Performs parallel web searches
- Integrates with Google Search / Serper API
- Gathers raw research data
- **writer_agent.py** - Generates final report:
- Consolidates search results
- Formats findings into structured markdown
- Creates well-organized research summaries
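A simplified illustration of the consolidation step. The function name and the input shape (one list of snippets per research angle) are assumptions for this sketch, not the writer agent's real interface:

```python
def write_report(topic: str, findings: dict[str, list[str]]) -> str:
    """Consolidate per-query findings into one markdown report."""
    lines = [f"# Research Report: {topic}", ""]
    for section, snippets in findings.items():
        lines.append(f"## {section}")           # one section per angle
        lines.extend(f"- {s}" for s in snippets)
        lines.append("")
    return "\n".join(lines)
```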
- **email_agent.py** - Email delivery (not functional):
- Intended to send reports via SendGrid
- Currently not integrated in the workflow
### Core Utilities (`core/`)
- **logger.py** - Centralized logging configuration:
- Provides consistent logging across agents
- Handles log levels and formatting
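A typical pattern for such a module, shown here as a generic stdlib-`logging` sketch (not necessarily the exact contents of the project's `logger.py`):

```python
import logging

def get_logger(name: str) -> logging.Logger:
    """Return a named logger with one shared console handler."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

Because `logging.getLogger` caches by name, every agent that calls `get_logger("deep_research")` shares the same configured logger.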
### Tools (`tools/`)
- **google_tools.py** - Google/Serper API wrapper:
- Executes web searches
- Handles API authentication and response parsing
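Serper is queried by POSTing JSON to its search endpoint with the API key in an `X-API-KEY` header. A sketch of building such a request (the helper name is illustrative, and no network call is made here):

```python
import json

SERPER_URL = "https://google.serper.dev/search"

def build_serper_request(query: str, api_key: str, num: int = 10):
    """Build the headers and JSON body for a Serper search POST."""
    headers = {"X-API-KEY": api_key, "Content-Type": "application/json"}
    payload = json.dumps({"q": query, "num": num})
    return headers, payload
```

The returned pair can then be sent with any HTTP client, e.g. `requests.post(SERPER_URL, headers=headers, data=payload)`.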
- **time_tools.py** - Utility functions:
- Time-related operations
- Timestamp management
### Configuration Files
- **Dockerfile** - Container deployment:
- Builds Docker image with Python 3.12
- Installs dependencies using `uv`
- Sets up Streamlit server on port 7860
- Configures PYTHONPATH for module imports
- **pyproject.toml** - Project metadata:
- Package name: "agents"
- Python version requirement: 3.12
- Lists all dependencies (OpenAI, LangChain, Streamlit, etc.)
- **uv.lock** - Dependency lock file:
- Ensures reproducible builds
- Pins exact versions of all dependencies
## Key Technologies
| Component | Technology | Purpose |
|-----------|-----------|---------|
| LLM Framework | OpenAI Agents | Multi-agent orchestration |
| Web Search | Serper API / Google Search | Research data gathering |
| Web UI | Streamlit | User interface and interaction |
| Document Export | ReportLab | PDF generation from markdown |
| Async Operations | AsyncIO | Parallel agent execution |
| Dependencies | UV | Fast Python package management |
| Containerization | Docker | Cloud deployment |
## Running Locally
```bash
# Install dependencies
uv sync
# Set the environment variables listed in the .env.name file
export OPENAI_API_KEY="your-key"
export SERPER_API_KEY="your-key"
# Run the Streamlit app
python run.py
```
## Deployment
The project is deployed on Hugging Face Spaces as a Docker container:
- **Space**: https://huggingface.co/spaces/mishrabp/deep-research
- **Trigger**: Automatic deployment on push to `main` branch
- **Configuration**: `.github/workflows/deep-research-app-hf.yml`