title: AI Deep Researcher
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: docker
sdk_version: 4.39.0
app_file: ui/app.py
pinned: false

AI Deep Researcher

AI Deep Researcher is a generative AI learning project built using the OpenAI Agentic Framework. Given a user query, the app runs multi-step web research and produces a structured, consolidated report.

To achieve this, the project integrates the following technologies and AI features:

OpenAI SDK
OpenAI Agents
OpenAI WebSearch Tool
Serper API - a free alternative to OpenAI WebSearch Tool (https://serper.dev/api-keys)
News API (https://newsapi.org/v2/everything)
SendGrid (for emailing report)
LLMs (OpenAI, Gemini, Groq)

How It Works

The system is a multi-agent solution, where each agent has a specific responsibility:

Planner Agent
- Receives the user query and builds a structured query plan.
Guardrail Agent
- Validates user input and ensures compliance.
- Stops the workflow if the input contains inappropriate or offensive language.
Search Agent
- Executes the query plan.
- Runs multiple web searches in parallel to gather data.
Writer Agent
- Reads results from all search agents.
- Generates a well-formatted, consolidated report.
Email Agent (not functional at present)
- Responsible for sending the report via email using SendGrid.
Orchestrator
- The entry point of the system.
- Facilitates communication and workflow between all agents.
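
The flow described above can be sketched with plain `asyncio`. This is a minimal illustration, not the project's code: every function here (`guardrail`, `plan`, `search`, `write_report`, `run_research`) and the `BLOCKED_WORDS` set are hypothetical stand-ins, the real agents call LLMs and search tools instead of returning canned strings, and running the guardrail before the planner is an assumption about ordering.

```python
import asyncio

# Hypothetical placeholder list; the real guardrail is an LLM-backed agent.
BLOCKED_WORDS = {"badword"}


async def guardrail(query: str) -> bool:
    """Return True when the query passes the (toy) input check."""
    return not any(word in BLOCKED_WORDS for word in query.lower().split())


async def plan(query: str) -> list[str]:
    """Stand-in planner: derive a few search queries from the user query."""
    return [f"{query} overview", f"{query} recent news", f"{query} criticism"]


async def search(term: str) -> str:
    """Stand-in search: a real agent would call a web search tool here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"results for '{term}'"


async def write_report(query: str, findings: list[str]) -> str:
    """Stand-in writer: merge all findings into one markdown report."""
    lines = [f"# Report: {query}", ""] + [f"- {item}" for item in findings]
    return "\n".join(lines)


async def run_research(query: str) -> str:
    """Orchestrate the pipeline: validate, plan, search in parallel, write."""
    if not await guardrail(query):
        return "Request rejected by guardrail."
    searches = await plan(query)
    # asyncio.gather runs all searches concurrently and preserves their order.
    findings = await asyncio.gather(*(search(term) for term in searches))
    return await write_report(query, list(findings))
```

Calling `asyncio.run(run_research("solid-state batteries"))` returns the report as a single markdown string; a rejected query returns the guardrail message without running any search.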

Project Folder Structure

deep-research/
├── ui/
│   ├── app.py                    # Main Streamlit application entry point
│   └── __pycache__/              # Python bytecode cache
├── appagents/
│   ├── __init__.py               # Package initialization
│   ├── orchestrator.py           # Orchestrator agent - coordinates all agents
│   ├── planner_agent.py          # Planner agent - builds structured query plans
│   ├── guardrail_agent.py        # Guardrail agent - validates user input
│   ├── search_agent.py           # Search agent - performs web searches
│   ├── writer_agent.py           # Writer agent - generates consolidated reports
│   ├── email_agent.py            # Email agent - sends reports via email (not functional)
│   └── __pycache__/              # Python bytecode cache
├── core/
│   ├── __init__.py               # Package initialization
│   ├── logger.py                 # Centralized logging configuration
│   └── __pycache__/              # Python bytecode cache
├── tools/
│   ├── __init__.py               # Package initialization
│   ├── google_tools.py           # Google search utilities
│   ├── time_tools.py             # Time-related utility functions
│   └── __pycache__/              # Python bytecode cache
├── prompts/
│   ├── __init__.py               # Package initialization (if present)
│   ├── planner_prompt.txt        # Prompt for planner agent (if present)
│   ├── guardrail_prompt.txt      # Prompt for guardrail agent (if present)
│   ├── search_prompt.txt         # Prompt for search agent (if present)
│   └── writer_prompt.txt         # Prompt for writer agent (if present)
├── Dockerfile                     # Docker configuration for container deployment
├── pyproject.toml                 # Project metadata and dependencies (copied from root)
├── uv.lock                        # Locked dependency versions (copied from root)
├── README.md                      # Project documentation
└── run.py                         # Script to run the application locally (if present)

File Descriptions

UI Layer (`ui/`)

app.py - Main Streamlit web application that provides the user interface. Handles:
- Text input for research queries
- Run/Download buttons (PDF, Markdown)
- Real-time streaming of results
- Display of final research reports
- Session state management
- Button enable/disable during streaming

Agents (`appagents/`)

orchestrator.py - Central coordinator that:
- Manages the multi-agent workflow
- Handles communication between all agents
- Streams results back to the UI
- Implements the research pipeline
planner_agent.py - Creates a structured plan for the query:
- Breaks down user query into actionable research steps
- Defines search queries and research angles
guardrail_agent.py - Validates user input:
- Checks for inappropriate content
- Ensures compliance with policies
- Stops workflow if violations detected
search_agent.py - Executes web searches:
- Performs parallel web searches
- Integrates with Google Search / Serper API
- Gathers raw research data
writer_agent.py - Generates final report:
- Consolidates search results
- Formats findings into structured markdown
- Creates well-organized research summaries
email_agent.py - Email delivery (not functional):
- Intended to send reports via SendGrid
- Currently not integrated in the workflow
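
One way the orchestrator could stream results back to the UI is an async generator that yields status messages while work progresses and the report at the end. The sketch below assumes that shape; `stream_research` and `collect` are hypothetical names, and the messages are placeholders rather than the project's actual output.

```python
import asyncio
from collections.abc import AsyncIterator


async def stream_research(query: str) -> AsyncIterator[str]:
    """Yield progress messages, then the final report, as each stage finishes."""
    yield f"Planning searches for: {query}"
    await asyncio.sleep(0)  # placeholder for the planner call
    yield "Searching the web..."
    await asyncio.sleep(0)  # placeholder for the parallel searches
    yield "Writing report..."
    await asyncio.sleep(0)  # placeholder for the writer call
    yield f"# Report\n\nFindings for {query}."


async def collect(query: str) -> list[str]:
    """Consume the stream the way a UI loop would, one chunk at a time."""
    return [chunk async for chunk in stream_research(query)]
```

A UI can render each chunk as it arrives instead of collecting them, which is what keeps the page responsive during a long research run.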

Core Utilities (`core/`)

logger.py - Centralized logging configuration:
- Provides consistent logging across agents
- Handles log levels and formatting
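
A centralized logger of this kind is often a small helper around the standard `logging` module. The following is a sketch under that assumption; `get_logger` and the format string are illustrative and may not match what `logger.py` actually contains.

```python
import logging
import sys

# Illustrative format; the project's real format is not documented here.
_FORMAT = "%(asctime)s | %(levelname)-8s | %(name)s | %(message)s"


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger that writes to stdout with a shared format."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # attach the handler only once per logger name
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Each agent module would then call something like `log = get_logger(__name__)` and share the same format and level handling.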

Tools (`tools/`)

google_tools.py - Google/Serper API wrapper:
- Executes web searches
- Handles API authentication and response parsing
time_tools.py - Utility functions:
- Time-related operations
- Timestamp management
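
A Serper wrapper typically builds an authenticated POST request and reduces the JSON response to the fields the agents need. The sketch below uses only the standard library. The endpoint URL, the `X-API-KEY` header, and the `organic` response key reflect Serper's public API as commonly documented and should be treated as assumptions to verify; `build_request`, `parse_results`, and `search` are hypothetical names, not functions taken from `google_tools.py`.

```python
import json
import os
import urllib.request

# Assumed Serper search endpoint; confirm against the Serper documentation.
SERPER_URL = "https://google.serper.dev/search"


def build_request(query: str, api_key: str) -> urllib.request.Request:
    """Build the authenticated POST request for one search query."""
    body = json.dumps({"q": query}).encode("utf-8")
    headers = {"X-API-KEY": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(SERPER_URL, data=body, headers=headers, method="POST")


def parse_results(payload: dict) -> list[dict]:
    """Keep only title, link, and snippet from the assumed 'organic' list."""
    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in payload.get("organic", [])
    ]


def search(query: str) -> list[dict]:
    """Run one search; requires SERPER_API_KEY and network access."""
    api_key = os.environ["SERPER_API_KEY"]
    with urllib.request.urlopen(build_request(query, api_key), timeout=10) as resp:
        return parse_results(json.load(resp))
```

Splitting request building and response parsing from the network call keeps both halves testable without an API key.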

Configuration Files

Dockerfile - Container deployment:
- Builds Docker image with Python 3.12
- Installs dependencies using uv
- Sets up Streamlit server on port 7860
- Configures PYTHONPATH for module imports
pyproject.toml - Project metadata:
- Package name: "agents"
- Python version requirement: 3.12
- Lists all dependencies (OpenAI, LangChain, Streamlit, etc.)
uv.lock - Dependency lock file:
- Ensures reproducible builds
- Pins exact versions of all dependencies

Key Technologies

| Component | Technology | Purpose |
| --- | --- | --- |
| LLM Framework | OpenAI Agents | Multi-agent orchestration |
| Web Search | Serper API / Google Search | Research data gathering |
| Web UI | Streamlit | User interface and interaction |
| Document Export | ReportLab | PDF generation from markdown |
| Async Operations | AsyncIO | Parallel agent execution |
| Dependencies | UV | Fast Python package management |
| Containerization | Docker | Cloud deployment |

Running Locally

# Install dependencies
uv sync

# Set the environment variables listed in the .env.name file
export OPENAI_API_KEY="your-key"
export SERPER_API_KEY="your-key"

# Run the Streamlit app
python run.py
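
Before launching, it can help to confirm that the two keys exported above are actually set. The check below is a small optional sketch; `missing_env_vars` is a hypothetical helper and not part of the project.

```python
import os

# The two variables exported in the steps above.
REQUIRED_VARS = ("OPENAI_API_KEY", "SERPER_API_KEY")


def missing_env_vars(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current environment; an empty list means both keys are present.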

Deployment

The project is deployed on Hugging Face Spaces as a Docker container:

Space: https://huggingface.co/spaces/mishrabp/deep-research
Trigger: Automatic deployment on push to main branch
Configuration: .github/workflows/deep-research-app-hf.yml