Spaces:

mishrabp
/

deep-research

Running

App Files Files Community

deep-research / README.md

mishrabp

Upload folder using huggingface_hub

e9c7617 verified 3 days ago

preview code

raw

history blame contribute delete

7.52 kB

	---
	title: AI Deep Researcher # Give your app a title
	emoji: 🤖 # Pick an emoji
	colorFrom: indigo # Theme start color
	colorTo: blue # Theme end color
	sdk: docker # SDK type
	sdk_version: "4.39.0" # Example Gradio version
	app_file: ui/app.py # <-- points to your app.py inside ui/
	pinned: false
	---

	# AI Deep Researcher

	AI Deep Researcher is a generative AI learning project built using the OpenAI Agentic Framework. This app performs deep-level web research based on user queries and generates a well-structured, consolidated report.

	To achieve this, the project integrates the following technologies and AI features:
	- OpenAI SDK
	- OpenAI Agents
	- OpenAI WebSearch Tool
	- Serper API - a free alternative to OpenAI WebSearch Tool (https://serper.dev/api-keys)
	- News API (https://newsapi.org/v2/everything)
	- SendGrid (for emailing report)
	- LLMs - (OpenAI, Geminia, Groq)

	## How it works?
	The system is a multi-agent solution, where each agent has a specific responsibility:

	1. Planner Agent
	- Receives the user query and builds a structured query plan.

	2. Guardrail Agent
	- Validates user input and ensures compliance.
	- Stops the workflow if the input contains inappropriate or unparliamentary words.

	3. Search Agent
	- Executes the query plan.
	- Runs multiple web searches in parallel to gather data.

	4. Writer Agent
	- Reads results from all search agents.
	- Generates a well-formatted, consolidated report.

	5. Email Agent (not functional at present)
	- Responsible for sending the report via email using SendGrid.

	6. Orchestrator
	- The entry point of the system.
	- Facilitates communication and workflow between all agents.

	## Project Folder Structure

	```
	deep-research/
	├── ui/
	│ ├── app.py # Main Streamlit application entry point
	│ └── __pycache__/ # Python bytecode cache
	├── appagents/
	│ ├── __init__.py # Package initialization
	│ ├── orchestrator.py # Orchestrator agent - coordinates all agents
	│ ├── planner_agent.py # Planner agent - builds structured query plans
	│ ├── guardrail_agent.py # Guardrail agent - validates user input
	│ ├── search_agent.py # Search agent - performs web searches
	│ ├── writer_agent.py # Writer agent - generates consolidated reports
	│ ├── email_agent.py # Email agent - sends reports via email (not functional)
	│ └── __pycache__/ # Python bytecode cache
	├── core/
	│ ├── __init__.py # Package initialization
	│ ├── logger.py # Centralized logging configuration
	│ └── __pycache__/ # Python bytecode cache
	├── tools/
	│ ├── __init__.py # Package initialization
	│ ├── google_tools.py # Google search utilities
	│ ├── time_tools.py # Time-related utility functions
	│ └── __pycache__/ # Python bytecode cache
	├── prompts/
	│ ├── __init__.py # Package initialization (if present)
	│ ├── planner_prompt.txt # Prompt for planner agent (if present)
	│ ├── guardrail_prompt.txt # Prompt for guardrail agent (if present)
	│ ├── search_prompt.txt # Prompt for search agent (if present)
	│ └── writer_prompt.txt # Prompt for writer agent (if present)
	├── Dockerfile # Docker configuration for container deployment
	├── pyproject.toml # Project metadata and dependencies (copied from root)
	├── uv.lock # Locked dependency versions (copied from root)
	├── README.md # Project documentation
	└── run.py # Script to run the application locally (if present)
	```

	## File Descriptions

	### UI Layer (`ui/`)
	- app.py - Main Streamlit web application that provides the user interface. Handles:
	- Text input for research queries
	- Run/Download buttons (PDF, Markdown)
	- Real-time streaming of results
	- Display of final research reports
	- Session state management
	- Button enable/disable during streaming

	### Agents (`appagents/`)
	- orchestrator.py - Central coordinator that:
	- Manages the multi-agent workflow
	- Handles communication between all agents
	- Streams results back to the UI
	- Implements the research pipeline

	- planner_agent.py - Creates a structured plan for the query:
	- Breaks down user query into actionable research steps
	- Defines search queries and research angles

	- guardrail_agent.py - Validates user input:
	- Checks for inappropriate content
	- Ensures compliance with policies
	- Stops workflow if violations detected

	- search_agent.py - Executes web searches:
	- Performs parallel web searches
	- Integrates with Google Search / Serper API
	- Gathers raw research data

	- writer_agent.py - Generates final report:
	- Consolidates search results
	- Formats findings into structured markdown
	- Creates well-organized research summaries

	- email_agent.py - Email delivery (not functional):
	- Intended to send reports via SendGrid
	- Currently not integrated in the workflow

	### Core Utilities (`core/`)
	- logger.py - Centralized logging configuration:
	- Provides consistent logging across agents
	- Handles log levels and formatting

	### Tools (`tools/`)
	- google_tools.py - Google/Serper API wrapper:
	- Executes web searches
	- Handles API authentication and response parsing

	- time_tools.py - Utility functions:
	- Time-related operations
	- Timestamp management

	### Configuration Files
	- Dockerfile - Container deployment:
	- Builds Docker image with Python 3.12
	- Installs dependencies using `uv`
	- Sets up Streamlit server on port 7860
	- Configures PYTHONPATH for module imports

	- pyproject.toml - Project metadata:
	- Package name: "agents"
	- Python version requirement: 3.12
	- Lists all dependencies (OpenAI, LangChain, Streamlit, etc.)

	- uv.lock - Dependency lock file:
	- Ensures reproducible builds
	- Pins exact versions of all dependencies

	## Key Technologies

	\| Component \| Technology \| Purpose \|
	\|-----------\|-----------\|---------\|
	\| LLM Framework \| OpenAI Agents \| Multi-agent orchestration \|
	\| Web Search \| Serper API / Google Search \| Research data gathering \|
	\| Web UI \| Streamlit \| User interface and interaction \|
	\| Document Export \| ReportLab \| PDF generation from markdown \|
	\| Async Operations \| AsyncIO \| Parallel agent execution \|
	\| Dependencies \| UV \| Fast Python package management \|
	\| Containerization \| Docker \| Cloud deployment \|

	## Running Locally

	```bash
	# Install dependencies
	uv sync

	# Set environment variables defined in .env.name file
	export OPENAI_API_KEY="your-key"
	export SERPER_API_KEY="your-key"

	# Run the Streamlit app
	python run.py
	```

	## Deployment

	The project is deployed on Hugging Face Spaces as a Docker container:
	- Space: https://huggingface.co/spaces/mishrabp/deep-research
	- URL: https://huggingface.co/spaces/mishrabp/deep-research
	- Trigger: Automatic deployment on push to `main` branch
	- Configuration: `.github/workflows/deep-research-app-hf.yml`