---
title: AI Deep Researcher
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: docker
sdk_version: "4.39.0"
app_file: ui/app.py
pinned: false
---

# AI Deep Researcher

**AI Deep Researcher** is a generative AI learning project built with the OpenAI Agents framework. The app performs in-depth web research on a user query and generates a well-structured, consolidated report.

To achieve this, the project integrates the following technologies and AI features:
- **OpenAI SDK**
- **OpenAI Agents**
- **OpenAI WebSearch Tool**
- **Serper API** - a free alternative to OpenAI WebSearch Tool (https://serper.dev/api-keys)
- **News API** (https://newsapi.org/v2/everything)
- **SendGrid** (for emailing report)
- **LLMs** (OpenAI, Gemini, Groq)

## How It Works
The system is a multi-agent solution, where each agent has a specific responsibility:

1. **Planner Agent**
    - Receives the user query and builds a structured query plan.

2. **Guardrail Agent**
    - Validates user input and ensures compliance.
    - Stops the workflow if the input contains inappropriate or offensive language.

3. **Search Agent**
    - Executes the query plan.
    - Runs multiple web searches in parallel to gather data.

4. **Writer Agent**
    - Reads results from all search agents.
    - Generates a well-formatted, consolidated report.

5. **Email Agent (not functional at present)**
    - Responsible for sending the report via email using SendGrid.

6. **Orchestrator**
    - The entry point of the system.
    - Facilitates communication and workflow between all agents.
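
The six-step workflow above can be sketched with plain `asyncio`. Every function below is an illustrative stand-in for the corresponding agent, not the project's actual API:

```python
import asyncio

# Hypothetical stand-ins for the real agents; each step mirrors one
# numbered responsibility from the list above.
async def guardrail_check(query: str) -> bool:
    banned = {"malware"}  # illustrative word list; the real agent is smarter
    return not any(word in query.lower() for word in banned)

async def plan(query: str) -> list[str]:
    return [f"{query} overview", f"{query} recent developments"]

async def search(search_query: str) -> str:
    return f"results for {search_query}"  # real agent calls a search tool

async def write_report(results: list[str]) -> str:
    return "\n".join(f"- {item}" for item in results)

async def run_pipeline(query: str) -> str:
    if not await guardrail_check(query):          # 2. Guardrail Agent
        raise ValueError("query rejected by guardrail")
    search_queries = await plan(query)            # 1. Planner Agent
    results = await asyncio.gather(               # 3. Search Agent (parallel)
        *(search(q) for q in search_queries)
    )
    return await write_report(list(results))      # 4. Writer Agent

print(asyncio.run(run_pipeline("quantum computing")))
```

The key design point is step 3: `asyncio.gather` lets all planned searches run concurrently instead of one after another, which is what makes the research phase fast.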

## Project Folder Structure

```
deep-research/
β”œβ”€β”€ ui/
β”‚   β”œβ”€β”€ app.py                    # Main Streamlit application entry point
β”‚   └── __pycache__/              # Python bytecode cache
β”œβ”€β”€ appagents/
β”‚   β”œβ”€β”€ __init__.py               # Package initialization
β”‚   β”œβ”€β”€ orchestrator.py           # Orchestrator agent - coordinates all agents
β”‚   β”œβ”€β”€ planner_agent.py          # Planner agent - builds structured query plans
β”‚   β”œβ”€β”€ guardrail_agent.py        # Guardrail agent - validates user input
β”‚   β”œβ”€β”€ search_agent.py           # Search agent - performs web searches
β”‚   β”œβ”€β”€ writer_agent.py           # Writer agent - generates consolidated reports
β”‚   β”œβ”€β”€ email_agent.py            # Email agent - sends reports via email (not functional)
β”‚   └── __pycache__/              # Python bytecode cache
β”œβ”€β”€ core/
β”‚   β”œβ”€β”€ __init__.py               # Package initialization
β”‚   β”œβ”€β”€ logger.py                 # Centralized logging configuration
β”‚   └── __pycache__/              # Python bytecode cache
β”œβ”€β”€ tools/
β”‚   β”œβ”€β”€ __init__.py               # Package initialization
β”‚   β”œβ”€β”€ google_tools.py           # Google search utilities
β”‚   β”œβ”€β”€ time_tools.py             # Time-related utility functions
β”‚   └── __pycache__/              # Python bytecode cache
β”œβ”€β”€ prompts/
β”‚   β”œβ”€β”€ __init__.py               # Package initialization (if present)
β”‚   β”œβ”€β”€ planner_prompt.txt        # Prompt for planner agent (if present)
β”‚   β”œβ”€β”€ guardrail_prompt.txt      # Prompt for guardrail agent (if present)
β”‚   β”œβ”€β”€ search_prompt.txt         # Prompt for search agent (if present)
β”‚   └── writer_prompt.txt         # Prompt for writer agent (if present)
β”œβ”€β”€ Dockerfile                     # Docker configuration for container deployment
β”œβ”€β”€ pyproject.toml                 # Project metadata and dependencies (copied from root)
β”œβ”€β”€ uv.lock                        # Locked dependency versions (copied from root)
β”œβ”€β”€ README.md                      # Project documentation
└── run.py                         # Script to run the application locally (if present)
```

## File Descriptions

### UI Layer (`ui/`)
- **app.py** - Main Streamlit web application that provides the user interface. Handles:
  - Text input for research queries
  - Run/Download buttons (PDF, Markdown)
  - Real-time streaming of results
  - Display of final research reports
  - Session state management
  - Button enable/disable during streaming

### Agents (`appagents/`)
- **orchestrator.py** - Central coordinator that:
  - Manages the multi-agent workflow
  - Handles communication between all agents
  - Streams results back to the UI
  - Implements the research pipeline

- **planner_agent.py** - Creates a structured plan for the query:
  - Breaks down user query into actionable research steps
  - Defines search queries and research angles

- **guardrail_agent.py** - Validates user input:
  - Checks for inappropriate content
  - Ensures compliance with policies
  - Stops workflow if violations detected
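
As a purely local illustration of this kind of input check (the real agent presumably delegates the judgement to an LLM; the pattern list and function name here are assumptions):

```python
import re

# Hypothetical block list -- not the project's actual policy.
BLOCKED_PATTERNS = [r"\bexploit\b", r"\bdrop\s+table\b"]

def passes_guardrail(query: str) -> bool:
    """Return False when the query matches any blocked pattern."""
    return not any(
        re.search(pattern, query, re.IGNORECASE)
        for pattern in BLOCKED_PATTERNS
    )

print(passes_guardrail("history of jazz"))  # → True (benign query passes)
```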

- **search_agent.py** - Executes web searches:
  - Performs parallel web searches
  - Integrates with Google Search / Serper API
  - Gathers raw research data

- **writer_agent.py** - Generates final report:
  - Consolidates search results
  - Formats findings into structured markdown
  - Creates well-organized research summaries

- **email_agent.py** - Email delivery (not functional):
  - Intended to send reports via SendGrid
  - Currently not integrated in the workflow

### Core Utilities (`core/`)
- **logger.py** - Centralized logging configuration:
  - Provides consistent logging across agents
  - Handles log levels and formatting
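
A centralized logger along these lines could look like the following sketch; the `get_logger` name and format string are assumptions, not the project's actual code:

```python
import logging

def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with one consistently formatted handler."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(level)
    return logger
```

The `if not logger.handlers` guard matters in a multi-agent app: each agent module can call `get_logger` at import time without every log line being emitted twice.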

### Tools (`tools/`)
- **google_tools.py** - Google/Serper API wrapper:
  - Executes web searches
  - Handles API authentication and response parsing
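
As a stdlib-only sketch of what such a wrapper might contain: the endpoint and `X-API-KEY` header follow Serper's public documentation, while the function names and structure are assumptions:

```python
import json
import os
import urllib.request

SERPER_URL = "https://google.serper.dev/search"

def build_serper_request(query: str, num_results: int = 5) -> urllib.request.Request:
    """Build the POST request Serper expects (X-API-KEY header, JSON body)."""
    return urllib.request.Request(
        SERPER_URL,
        data=json.dumps({"q": query, "num": num_results}).encode("utf-8"),
        headers={
            "X-API-KEY": os.environ.get("SERPER_API_KEY", ""),
            "Content-Type": "application/json",
        },
    )

def parse_organic_links(payload: dict) -> list[str]:
    """Pull the result URLs out of Serper's JSON response."""
    return [hit["link"] for hit in payload.get("organic", [])]

# Sending the request requires a real SERPER_API_KEY:
# with urllib.request.urlopen(build_serper_request("openai agents")) as resp:
#     links = parse_organic_links(json.load(resp))
```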

- **time_tools.py** - Utility functions:
  - Time-related operations
  - Timestamp management

### Configuration Files
- **Dockerfile** - Container deployment:
  - Builds Docker image with Python 3.12
  - Installs dependencies using `uv`
  - Sets up Streamlit server on port 7860
  - Configures PYTHONPATH for module imports

- **pyproject.toml** - Project metadata:
  - Package name: "agents"
  - Python version requirement: 3.12
  - Lists all dependencies (OpenAI, LangChain, Streamlit, etc.)

- **uv.lock** - Dependency lock file:
  - Ensures reproducible builds
  - Pins exact versions of all dependencies

## Key Technologies

| Component | Technology | Purpose |
|-----------|-----------|---------|
| LLM Framework | OpenAI Agents | Multi-agent orchestration |
| Web Search | Serper API / Google Search | Research data gathering |
| Web UI | Streamlit | User interface and interaction |
| Document Export | ReportLab | PDF generation from markdown |
| Async Operations | AsyncIO | Parallel agent execution |
| Dependencies | UV | Fast Python package management |
| Containerization | Docker | Cloud deployment |

## Running Locally

```bash
# Install dependencies
uv sync

# Set environment variables (or define them in a .env file)
export OPENAI_API_KEY="your-key"
export SERPER_API_KEY="your-key"

# Run the Streamlit app
python run.py
```

## Deployment

The project is deployed on Hugging Face Spaces as a Docker container:
- **Space**: https://huggingface.co/spaces/mishrabp/deep-research
- **Trigger**: Automatic deployment on push to `main` branch
- **Configuration**: `.github/workflows/deep-research-app-hf.yml`