---
title: AI Deep Researcher # Give your app a title
emoji: 🤖 # Pick an emoji
colorFrom: indigo # Theme start color
colorTo: blue # Theme end color
sdk: docker # SDK type
sdk_version: "4.39.0" # Example Gradio version
app_file: ui/app.py # <-- points to your app.py inside ui/
pinned: false
---

# AI Deep Researcher

**AI Deep Researcher** is a generative AI learning project built with the OpenAI Agents framework. The app performs deep web research based on user queries and generates a well-structured, consolidated report.

To achieve this, the project integrates the following technologies and AI features:

- **OpenAI SDK**
- **OpenAI Agents**
- **OpenAI WebSearch Tool**
- **Serper API** - a free alternative to the OpenAI WebSearch Tool (https://serper.dev/api-keys)
- **News API** (https://newsapi.org/v2/everything)
- **SendGrid** (for emailing the report)
- **LLMs** - OpenAI, Gemini, Groq
## How it works

The system is a multi-agent solution in which each agent has a specific responsibility:

1. **Planner Agent**
   - Receives the user query and builds a structured query plan.
2. **Guardrail Agent**
   - Validates user input and ensures compliance.
   - Stops the workflow if the input contains inappropriate or offensive language.
3. **Search Agent**
   - Executes the query plan.
   - Runs multiple web searches in parallel to gather data.
4. **Writer Agent**
   - Reads the results from all search agents.
   - Generates a well-formatted, consolidated report.
5. **Email Agent (not functional at present)**
   - Responsible for sending the report via email using SendGrid.
6. **Orchestrator**
   - The entry point of the system.
   - Facilitates communication and workflow between all agents.
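The sketch below shows how such a pipeline can be wired together with `asyncio`. It is a minimal, self-contained illustration: the function bodies are stand-ins for the real `appagents` modules, and the names and signatures are assumptions, not the project's actual API.

```python
import asyncio

# Stand-ins for the real agents in appagents/ (illustrative only).
async def guardrail(query: str) -> bool:
    banned = {"offensive", "hateful"}          # real agent uses an LLM/policy check
    return not any(word in query.lower() for word in banned)

async def plan(query: str) -> list[str]:
    return [f"{query} overview", f"{query} recent news", f"{query} criticisms"]

async def search(term: str) -> str:
    await asyncio.sleep(0.1)                   # real agent calls Serper / WebSearch
    return f"results for: {term}"

async def write(query: str, findings: list[str]) -> str:
    return f"# Report: {query}\n\n" + "\n".join(f"- {f}" for f in findings)

async def orchestrate(query: str) -> str:
    if not await guardrail(query):             # 1. validate input
        raise ValueError("Query rejected by guardrail")
    searches = await plan(query)               # 2. build the query plan
    findings = await asyncio.gather(           # 3. run searches in parallel
        *(search(s) for s in searches)
    )
    return await write(query, list(findings))  # 4. consolidate into a report

if __name__ == "__main__":
    print(asyncio.run(orchestrate("impact of AI on education")))
```
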
## Project Folder Structure

```
deep-research/
├── ui/
│   ├── app.py               # Main Streamlit application entry point
│   └── __pycache__/         # Python bytecode cache
├── appagents/
│   ├── __init__.py          # Package initialization
│   ├── orchestrator.py      # Orchestrator agent - coordinates all agents
│   ├── planner_agent.py     # Planner agent - builds structured query plans
│   ├── guardrail_agent.py   # Guardrail agent - validates user input
│   ├── search_agent.py      # Search agent - performs web searches
│   ├── writer_agent.py      # Writer agent - generates consolidated reports
│   ├── email_agent.py       # Email agent - sends reports via email (not functional)
│   └── __pycache__/         # Python bytecode cache
├── core/
│   ├── __init__.py          # Package initialization
│   ├── logger.py            # Centralized logging configuration
│   └── __pycache__/         # Python bytecode cache
├── tools/
│   ├── __init__.py          # Package initialization
│   ├── google_tools.py      # Google search utilities
│   ├── time_tools.py        # Time-related utility functions
│   └── __pycache__/         # Python bytecode cache
├── prompts/
│   ├── __init__.py          # Package initialization (if present)
│   ├── planner_prompt.txt   # Prompt for planner agent (if present)
│   ├── guardrail_prompt.txt # Prompt for guardrail agent (if present)
│   ├── search_prompt.txt    # Prompt for search agent (if present)
│   └── writer_prompt.txt    # Prompt for writer agent (if present)
├── Dockerfile               # Docker configuration for container deployment
├── pyproject.toml           # Project metadata and dependencies (copied from root)
├── uv.lock                  # Locked dependency versions (copied from root)
├── README.md                # Project documentation
└── run.py                   # Script to run the application locally (if present)
```
## File Descriptions

### UI Layer (`ui/`)

- **app.py** - Main Streamlit web application that provides the user interface. Handles:
  - Text input for research queries
  - Run/Download buttons (PDF, Markdown)
  - Real-time streaming of results
  - Display of final research reports
  - Session state management
  - Button enable/disable during streaming
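A simplified sketch of what such a Streamlit front end looks like is shown below; the widget labels and the placeholder report string are illustrative, not the actual `ui/app.py` code.

```python
import streamlit as st

st.title("AI Deep Researcher")

if "report" not in st.session_state:
    st.session_state.report = None

query = st.text_input("Research topic")

# Disable the Run button while there is no query to submit.
if st.button("Run research", disabled=not query):
    with st.spinner("Researching..."):
        # The real app streams results from the orchestrator here.
        st.session_state.report = f"# Report\n\nFindings for: {query}"

if st.session_state.report:
    st.markdown(st.session_state.report)
    st.download_button(
        "Download Markdown",
        data=st.session_state.report,
        file_name="report.md",
        mime="text/markdown",
    )
```
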
### Agents (`appagents/`)

- **orchestrator.py** - Central coordinator that:
  - Manages the multi-agent workflow
  - Handles communication between all agents
  - Streams results back to the UI
  - Implements the research pipeline
- **planner_agent.py** - Creates a structured plan for the query:
  - Breaks down the user query into actionable research steps
  - Defines search queries and research angles
- **guardrail_agent.py** - Validates user input:
  - Checks for inappropriate content
  - Ensures compliance with policies
  - Stops the workflow if violations are detected
- **search_agent.py** - Executes web searches:
  - Performs parallel web searches
  - Integrates with Google Search / Serper API
  - Gathers raw research data
- **writer_agent.py** - Generates the final report:
  - Consolidates search results
  - Formats findings into structured markdown
  - Creates well-organized research summaries
- **email_agent.py** - Email delivery (not functional):
  - Intended to send reports via SendGrid
  - Currently not integrated in the workflow
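For reference, an agent in the OpenAI Agents SDK (the `openai-agents` package, imported as `agents`) is typically defined as below. The instructions, model name, and `SearchPlan` schema here are illustrative guesses, not the project's actual prompts or types.

```python
from pydantic import BaseModel
from agents import Agent, Runner  # pip install openai-agents

class SearchPlan(BaseModel):
    searches: list[str]

planner_agent = Agent(
    name="Planner Agent",
    instructions="Break the research topic into 3-5 focused web search queries.",
    model="gpt-4o-mini",
    output_type=SearchPlan,
)

async def make_plan(topic: str) -> SearchPlan:
    result = await Runner.run(planner_agent, topic)  # requires OPENAI_API_KEY
    return result.final_output                       # parsed into SearchPlan
```
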
### Core Utilities (`core/`)

- **logger.py** - Centralized logging configuration:
  - Provides consistent logging across agents
  - Handles log levels and formatting
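A centralized logger of this kind usually boils down to a small helper like the following (a sketch, not necessarily the exact contents of `core/logger.py`):

```python
import logging

def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger with a single, consistently formatted stream handler."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s [%(name)s] %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(level)
    return logger

# Usage in an agent module:
# log = get_logger(__name__)
# log.info("planner produced %d search queries", 4)
```
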
### Tools (`tools/`)

- **google_tools.py** - Google/Serper API wrapper:
  - Executes web searches
  - Handles API authentication and response parsing
- **time_tools.py** - Utility functions:
  - Time-related operations
  - Timestamp management
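A Serper-backed search helper generally looks like the sketch below. The endpoint and `X-API-KEY` header follow Serper's public API; the function name and return shape are assumptions, not the actual `google_tools.py` interface.

```python
import os
import requests

SERPER_URL = "https://google.serper.dev/search"

def serper_search(query: str, num_results: int = 5) -> list[dict]:
    """POST a query to Serper and return title/link/snippet for each organic hit."""
    response = requests.post(
        SERPER_URL,
        headers={
            "X-API-KEY": os.environ["SERPER_API_KEY"],
            "Content-Type": "application/json",
        },
        json={"q": query, "num": num_results},
        timeout=30,
    )
    response.raise_for_status()
    return [
        {"title": r.get("title"), "link": r.get("link"), "snippet": r.get("snippet")}
        for r in response.json().get("organic", [])
    ]
```
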
### Configuration Files

- **Dockerfile** - Container deployment:
  - Builds the Docker image with Python 3.12
  - Installs dependencies using `uv`
  - Sets up the Streamlit server on port 7860
  - Configures PYTHONPATH for module imports
- **pyproject.toml** - Project metadata:
  - Package name: "agents"
  - Python version requirement: 3.12
  - Lists all dependencies (OpenAI, LangChain, Streamlit, etc.)
- **uv.lock** - Dependency lock file:
  - Ensures reproducible builds
  - Pins exact versions of all dependencies
## Key Technologies

| Component | Technology | Purpose |
|-----------|------------|---------|
| LLM Framework | OpenAI Agents | Multi-agent orchestration |
| Web Search | Serper API / Google Search | Research data gathering |
| Web UI | Streamlit | User interface and interaction |
| Document Export | ReportLab | PDF generation from markdown |
| Async Operations | AsyncIO | Parallel agent execution |
| Dependencies | UV | Fast Python package management |
| Containerization | Docker | Cloud deployment |
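For the ReportLab export step, a minimal conversion from report text to PDF can look like this (a sketch; the real exporter presumably handles markdown more thoroughly):

```python
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer

def report_to_pdf(report_text: str, path: str = "report.pdf") -> None:
    """Render each line of the report as a PDF paragraph (headings get the Title style)."""
    styles = getSampleStyleSheet()
    story = []
    for line in report_text.splitlines():
        if not line.strip():
            story.append(Spacer(1, 12))
        elif line.startswith("# "):
            story.append(Paragraph(line[2:], styles["Title"]))
        else:
            story.append(Paragraph(line, styles["BodyText"]))
    SimpleDocTemplate(path, pagesize=A4).build(story)
```
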
## Running Locally

```bash
# Install dependencies
uv sync

# Set the environment variables defined in the .env.name file
export OPENAI_API_KEY="your-key"
export SERPER_API_KEY="your-key"

# Run the Streamlit app
python run.py
```
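If `run.py` is present, it most likely just launches the Streamlit server; a hypothetical version matching the Docker setup (port 7860) would be:

```python
import subprocess
import sys

if __name__ == "__main__":
    # Hypothetical launcher: starts the Streamlit UI on the port the Dockerfile exposes.
    sys.exit(subprocess.call([
        "streamlit", "run", "ui/app.py",
        "--server.port", "7860",
        "--server.address", "0.0.0.0",
    ]))
```
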
## Deployment

The project is deployed on Hugging Face Spaces as a Docker container:

- **Space**: https://huggingface.co/spaces/mishrabp/deep-research
- **Trigger**: Automatic deployment on push to the `main` branch
- **Configuration**: `.github/workflows/deep-research-app-hf.yml`