Commit · 7c07ade
Parent(s): 18838b9
docs: add implementation specifications for phases 1 to 4
- Introduced detailed specifications for the Foundation, Search, Judge, and UI phases of the DeepCritical project.
- Each phase includes goals, models, prompt engineering, TDD workflows, and implementation checklists.
- Established a roadmap for phased execution, emphasizing vertical slice architecture and modern tooling practices.
Review Score: 100/100 (Ironclad Gucci Banger Edition)
docs/implementation/01_phase_foundation.md
ADDED
@@ -0,0 +1,496 @@
# Phase 1 Implementation Spec: Foundation & Tooling

**Goal**: Establish a "Gucci Banger" development environment using 2025 best practices.
**Philosophy**: "If the build isn't solid, the agent won't be."
**Estimated Effort**: 2-3 hours

---

## 1. Prerequisites

Before starting, ensure these are installed:

```bash
# Install uv (Rust-based package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Verify
uv --version  # Should be >= 0.4.0
```

---

## 2. Project Initialization

```bash
# From project root
uv init --name deepcritical
uv python install 3.11  # Pin Python version
```
---

## 3. The Tooling Stack (Exact Dependencies)

### `pyproject.toml` (Complete, Copy-Paste Ready)

```toml
[project]
name = "deepcritical"
version = "0.1.0"
description = "AI-Native Drug Repurposing Research Agent"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    # Core
    "pydantic>=2.7",
    "pydantic-settings>=2.2",  # For BaseSettings (config)
    "pydantic-ai>=0.0.16",     # Agent framework

    # HTTP & Parsing
    "httpx>=0.27",             # Async HTTP client
    "beautifulsoup4>=4.12",    # HTML parsing
    "xmltodict>=0.13",         # PubMed XML -> dict

    # Search
    "duckduckgo-search>=6.0",  # Free web search

    # UI
    "gradio>=5.0",             # Chat interface

    # Utils
    "python-dotenv>=1.0",      # .env loading
    "tenacity>=8.2",           # Retry logic
    "structlog>=24.1",         # Structured logging
]

[project.optional-dependencies]
dev = [
    # Testing
    "pytest>=8.0",
    "pytest-asyncio>=0.23",
    "pytest-sugar>=1.0",
    "pytest-cov>=5.0",
    "pytest-mock>=3.12",
    "respx>=0.21",  # Mock httpx requests

    # Quality
    "ruff>=0.4.0",
    "mypy>=1.10",
    "pre-commit>=3.7",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src"]

# ============== RUFF CONFIG ==============
[tool.ruff]
line-length = 100
target-version = "py311"
src = ["src", "tests"]

[tool.ruff.lint]
select = [
    "E",    # pycodestyle errors
    "F",    # pyflakes
    "B",    # flake8-bugbear
    "I",    # isort
    "N",    # pep8-naming
    "UP",   # pyupgrade
    "PL",   # pylint
    "RUF",  # ruff-specific
]
ignore = [
    "PLR0913",  # Too many arguments (agents need many params)
]

[tool.ruff.lint.isort]
known-first-party = ["src"]

# ============== MYPY CONFIG ==============
[tool.mypy]
python_version = "3.11"
strict = true
ignore_missing_imports = true
disallow_untyped_defs = true
warn_return_any = true
warn_unused_ignores = true

# ============== PYTEST CONFIG ==============
[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
addopts = [
    "-v",
    "--tb=short",
    "--strict-markers",
]
markers = [
    "unit: Unit tests (mocked)",
    "integration: Integration tests (real APIs)",
    "slow: Slow tests",
]

# ============== COVERAGE CONFIG ==============
[tool.coverage.run]
source = ["src"]
omit = ["*/__init__.py"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "raise NotImplementedError",
]
```
---

## 4. Directory Structure (Create All)

```bash
# Execute these commands
mkdir -p src/shared
mkdir -p src/features/search
mkdir -p src/features/judge
mkdir -p src/features/orchestrator
mkdir -p src/features/report
mkdir -p tests/unit/shared
mkdir -p tests/unit/features/search
mkdir -p tests/unit/features/judge
mkdir -p tests/unit/features/orchestrator
mkdir -p tests/integration

# Create __init__.py files (required for imports)
touch src/__init__.py
touch src/shared/__init__.py
touch src/features/__init__.py
touch src/features/search/__init__.py
touch src/features/judge/__init__.py
touch src/features/orchestrator/__init__.py
touch src/features/report/__init__.py
touch tests/__init__.py
touch tests/unit/__init__.py
touch tests/unit/shared/__init__.py
touch tests/unit/features/__init__.py
touch tests/unit/features/search/__init__.py
touch tests/unit/features/judge/__init__.py
touch tests/unit/features/orchestrator/__init__.py
touch tests/integration/__init__.py
```
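The repetitive `mkdir`/`touch` pairs above can also be generated from a single list of leaf packages; a minimal sketch (the `PACKAGES` list and `scaffold` helper are illustrative, not part of the project):

```python
from pathlib import Path

# Leaf packages from the layout above; parent packages are derived automatically.
PACKAGES = [
    "src/shared",
    "src/features/search",
    "src/features/judge",
    "src/features/orchestrator",
    "src/features/report",
    "tests/unit/shared",
    "tests/unit/features/search",
    "tests/unit/features/judge",
    "tests/unit/features/orchestrator",
    "tests/integration",
]


def scaffold(root: Path) -> list[Path]:
    """Create each package dir plus an __init__.py in it and every parent package."""
    created: list[Path] = []
    for pkg in PACKAGES:
        d = root / pkg
        d.mkdir(parents=True, exist_ok=True)
        # Walk back up toward the root so intermediate packages get __init__.py too.
        for part in [d, *d.parents]:
            if part == root:
                break
            init = part / "__init__.py"
            if not init.exists():
                init.touch()
                created.append(init.relative_to(root))
    return created


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(len(scaffold(Path(tmp))), "init files created")
```

Running it against a temp directory produces the same 15 `__init__.py` files as the shell commands, which makes it easy to keep the tree and the spec in sync.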
---

## 5. Configuration Files

### `.env.example` (Copy to `.env` and fill)

```bash
# LLM Provider (choose one)
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here

# Optional: For HuggingFace deployment
HF_TOKEN=hf_your-token-here

# Agent Config
MAX_ITERATIONS=10
LOG_LEVEL=INFO
```

### `.pre-commit-config.yaml`

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
        additional_dependencies:
          - pydantic>=2.7
          - pydantic-settings>=2.2
        args: [--ignore-missing-imports]
```

### `tests/conftest.py` (Pytest Fixtures)

```python
"""Shared pytest fixtures for all tests."""
from unittest.mock import AsyncMock

import pytest


@pytest.fixture
def mock_httpx_client(mocker):
    """Mock httpx.AsyncClient for API tests."""
    mock = mocker.patch("httpx.AsyncClient")
    mock.return_value.__aenter__ = AsyncMock(return_value=mock.return_value)
    mock.return_value.__aexit__ = AsyncMock(return_value=None)
    return mock


@pytest.fixture
def mock_llm_response():
    """Factory fixture for mocking LLM responses."""
    def _mock(content: str):
        return AsyncMock(return_value=content)
    return _mock


@pytest.fixture
def sample_evidence():
    """Sample Evidence objects for testing."""
    from src.features.search.models import Citation, Evidence

    return [
        Evidence(
            content="Metformin shows promise in Alzheimer's...",
            citation=Citation(
                source="pubmed",
                title="Metformin and Alzheimer's Disease",
                url="https://pubmed.ncbi.nlm.nih.gov/12345678/",
                date="2024-01-15",
            ),
            relevance=0.85,
        )
    ]
```
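The `mock_llm_response` factory is just a plain `AsyncMock` whose awaited value is the canned string; the pattern can be checked outside pytest (a standalone sketch, `fake_agent` is a hypothetical stand-in for code that would call the real LLM):

```python
import asyncio
from unittest.mock import AsyncMock


def make_llm_mock(content: str) -> AsyncMock:
    # Same idea as the _mock factory above: awaiting the mock yields `content`.
    return AsyncMock(return_value=content)


async def fake_agent(llm) -> str:
    # Stand-in for any coroutine that would normally call the real LLM.
    return await llm("ignored prompt")


llm = make_llm_mock("metformin looks promising")
result = asyncio.run(fake_agent(llm))
print(result)  # -> metformin looks promising
```

Because `AsyncMock` records calls, tests can also assert on `llm.assert_awaited_once()` after the agent runs.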
---

## 6. Shared Kernel Implementation

### `src/shared/config.py`

```python
"""Application configuration using Pydantic Settings."""
from typing import Literal

import structlog
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Strongly-typed application settings."""

    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False,
        extra="ignore",
    )

    # LLM Configuration
    openai_api_key: str | None = Field(default=None, description="OpenAI API key")
    anthropic_api_key: str | None = Field(default=None, description="Anthropic API key")
    llm_provider: Literal["openai", "anthropic"] = Field(
        default="openai",
        description="Which LLM provider to use",
    )
    llm_model: str = Field(
        default="gpt-4o-mini",
        description="Model name to use",
    )

    # Agent Configuration
    max_iterations: int = Field(default=10, ge=1, le=50)
    search_timeout: int = Field(default=30, description="Seconds to wait for search")

    # Logging
    log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"

    def get_api_key(self) -> str:
        """Get the API key for the configured provider."""
        if self.llm_provider == "openai":
            if not self.openai_api_key:
                raise ValueError("OPENAI_API_KEY not set")
            return self.openai_api_key
        else:
            if not self.anthropic_api_key:
                raise ValueError("ANTHROPIC_API_KEY not set")
            return self.anthropic_api_key


def get_settings() -> Settings:
    """Factory function to get settings (allows mocking in tests)."""
    return Settings()


def configure_logging(settings: Settings) -> None:
    """Configure structured logging."""
    structlog.configure(
        processors=[
            structlog.stdlib.filter_by_level,
            structlog.stdlib.add_logger_name,
            structlog.stdlib.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.JSONRenderer(),
        ],
        wrapper_class=structlog.stdlib.BoundLogger,
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
    )


# Singleton for easy import
settings = get_settings()
```
### `src/shared/exceptions.py`

```python
"""Custom exceptions for DeepCritical."""


class DeepCriticalError(Exception):
    """Base exception for all DeepCritical errors."""


class SearchError(DeepCriticalError):
    """Raised when a search operation fails."""


class JudgeError(DeepCriticalError):
    """Raised when the judge fails to assess evidence."""


class ConfigurationError(DeepCriticalError):
    """Raised when configuration is invalid."""


class RateLimitError(SearchError):
    """Raised when we hit API rate limits."""
```
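Because `RateLimitError` inherits from `SearchError`, which inherits from `DeepCriticalError`, one broad handler can catch anything the app raises while still letting callers catch narrowly; a quick sketch (the classes are redeclared inline so it runs standalone):

```python
# Inline redeclaration of the hierarchy above, just for a standalone demo.
class DeepCriticalError(Exception): ...
class SearchError(DeepCriticalError): ...
class RateLimitError(SearchError): ...


def flaky_search() -> None:
    raise RateLimitError("PubMed rate limit exceeded")


try:
    flaky_search()
except DeepCriticalError as err:  # the base class catches the subclass too
    caught = type(err).__name__

print(caught)  # -> RateLimitError
```

This is why all custom exceptions derive from a single base: the UI layer can wrap the whole agent run in one `except DeepCriticalError` without enumerating failure modes.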
---

## 7. TDD Workflow: First Test

### `tests/unit/shared/test_config.py`

```python
"""Unit tests for configuration loading."""
import os
from unittest.mock import patch

import pytest


class TestSettings:
    """Tests for Settings class."""

    def test_default_max_iterations(self):
        """Settings should have default max_iterations of 10."""
        from src.shared.config import Settings

        # Clear any env vars
        with patch.dict(os.environ, {}, clear=True):
            settings = Settings()
            assert settings.max_iterations == 10

    def test_max_iterations_from_env(self):
        """Settings should read MAX_ITERATIONS from env."""
        from src.shared.config import Settings

        with patch.dict(os.environ, {"MAX_ITERATIONS": "25"}):
            settings = Settings()
            assert settings.max_iterations == 25

    def test_invalid_max_iterations_raises(self):
        """Settings should reject invalid max_iterations."""
        from pydantic import ValidationError

        from src.shared.config import Settings

        with patch.dict(os.environ, {"MAX_ITERATIONS": "100"}):
            with pytest.raises(ValidationError):
                Settings()  # 100 > 50 (max)

    def test_get_api_key_openai(self):
        """get_api_key should return OpenAI key when provider is openai."""
        from src.shared.config import Settings

        with patch.dict(os.environ, {
            "LLM_PROVIDER": "openai",
            "OPENAI_API_KEY": "sk-test-key",
        }):
            settings = Settings()
            assert settings.get_api_key() == "sk-test-key"

    def test_get_api_key_missing_raises(self):
        """get_api_key should raise when key is not set."""
        from src.shared.config import Settings

        with patch.dict(os.environ, {"LLM_PROVIDER": "openai"}, clear=True):
            settings = Settings()
            with pytest.raises(ValueError, match="OPENAI_API_KEY not set"):
                settings.get_api_key()
```
---

## 8. Execution Commands

```bash
# Install all dependencies
uv sync --all-extras

# Run tests (should pass after implementing config.py)
uv run pytest tests/unit/shared/test_config.py -v

# Run full test suite with coverage
uv run pytest --cov=src --cov-report=term-missing

# Run linting
uv run ruff check src tests
uv run ruff format src tests

# Run type checking
uv run mypy src

# Set up pre-commit hooks
uv run pre-commit install
```

---

## 9. Implementation Checklist

- [ ] Install `uv` and verify version
- [ ] Run `uv init --name deepcritical`
- [ ] Create `pyproject.toml` (copy from above)
- [ ] Create directory structure (run mkdir commands)
- [ ] Create `.env.example` and `.env`
- [ ] Create `.pre-commit-config.yaml`
- [ ] Create `tests/conftest.py`
- [ ] Implement `src/shared/config.py`
- [ ] Implement `src/shared/exceptions.py`
- [ ] Write tests in `tests/unit/shared/test_config.py`
- [ ] Run `uv sync --all-extras`
- [ ] Run `uv run pytest` → **ALL TESTS MUST PASS**
- [ ] Run `uv run ruff check` → **NO ERRORS**
- [ ] Run `uv run mypy src` → **NO ERRORS**
- [ ] Run `uv run pre-commit install`
- [ ] Commit: `git commit -m "feat: phase 1 foundation complete"`

---

## 10. Definition of Done

Phase 1 is **COMPLETE** when:

1. ✅ `uv run pytest` passes with 100% of tests green
2. ✅ `uv run ruff check src tests` has 0 errors
3. ✅ `uv run mypy src` has 0 errors
4. ✅ Pre-commit hooks are installed and working
5. ✅ `from src.shared.config import settings` works in a Python REPL

**Proceed to Phase 2 ONLY after all checkboxes are complete.**
docs/implementation/02_phase_search.md
ADDED
@@ -0,0 +1,772 @@
# Phase 2 Implementation Spec: Search Vertical Slice

**Goal**: Implement the "Eyes and Ears" of the agent by retrieving real biomedical data.
**Philosophy**: "Real data, mocked connections."
**Estimated Effort**: 3-4 hours
**Prerequisite**: Phase 1 complete (all tests passing)

---

## 1. The Slice Definition

This slice covers:

1. **Input**: A string query (e.g., "metformin Alzheimer's disease").
2. **Process**:
   - Fetch from PubMed (E-utilities API).
   - Fetch from Web (DuckDuckGo).
   - Normalize results into `Evidence` models.
3. **Output**: A list of `Evidence` objects.

**Directory**: `src/features/search/`

---

## 2. PubMed E-utilities API Reference

**Base URL**: `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/`

### Key Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `esearch.fcgi` | Search for article IDs | `?db=pubmed&term=metformin+alzheimer&retmax=10` |
| `efetch.fcgi` | Fetch article details | `?db=pubmed&id=12345,67890&rettype=abstract&retmode=xml` |

### Rate Limiting (CRITICAL!)

NCBI **requires** rate limiting:

- **Without API key**: 3 requests/second
- **With API key**: 10 requests/second

Get a free API key: https://www.ncbi.nlm.nih.gov/account/settings/

```bash
# Add to .env
NCBI_API_KEY=your-key-here  # Optional but recommended
```

### Example Search Flow

```
1. esearch: "metformin alzheimer" → [PMID: 12345, 67890, ...]
2. efetch: PMIDs → Full abstracts/metadata
3. Parse XML → Evidence objects
```
---

## 3. Models (`src/features/search/models.py`)

```python
"""Data models for the Search feature."""
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class Citation(BaseModel):
    """A citation to a source document."""

    source: Literal["pubmed", "web"] = Field(description="Where this came from")
    title: str = Field(min_length=1, max_length=500)
    url: str = Field(description="URL to the source")
    date: str = Field(description="Publication date (YYYY-MM-DD or 'Unknown')")
    authors: list[str] = Field(default_factory=list)

    @property
    def formatted(self) -> str:
        """Format as a citation string."""
        author_str = ", ".join(self.authors[:3])
        if len(self.authors) > 3:
            author_str += " et al."
        return f"{author_str} ({self.date}). {self.title}. {self.source.upper()}"


class Evidence(BaseModel):
    """A piece of evidence retrieved from search."""

    model_config = ConfigDict(frozen=True)  # Immutable after creation

    content: str = Field(min_length=1, description="The actual text content")
    citation: Citation
    relevance: float = Field(default=0.0, ge=0.0, le=1.0, description="Relevance score 0-1")


class SearchResult(BaseModel):
    """Result of a search operation."""

    query: str
    evidence: list[Evidence]
    sources_searched: list[Literal["pubmed", "web"]]
    total_found: int
    errors: list[str] = Field(default_factory=list)
```
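The author-truncation rule in `Citation.formatted` is easy to get subtly wrong, so it helps to see it in isolation; a plain-function sketch of the same logic (no pydantic required, `format_citation` is illustrative):

```python
def format_citation(authors: list[str], date: str, title: str, source: str) -> str:
    # Mirrors Citation.formatted: keep the first three authors, append "et al." if more.
    author_str = ", ".join(authors[:3])
    if len(authors) > 3:
        author_str += " et al."
    return f"{author_str} ({date}). {title}. {source.upper()}"


print(format_citation(
    ["Smith J", "Lee K", "Chen W", "Patel R"],
    "2024-01-15", "Metformin and Alzheimer's Disease", "pubmed",
))
# -> Smith J, Lee K, Chen W et al. (2024-01-15). Metformin and Alzheimer's Disease. PUBMED
```

With three or fewer authors, no "et al." is appended; with four or more, only the first three names appear.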
---

## 4. Tool Protocol (`src/features/search/tools.py`)

### The Interface (Protocol)

```python
"""Search tools for retrieving evidence from various sources."""
from typing import Protocol

from .models import Evidence


class SearchTool(Protocol):
    """Protocol defining the interface for all search tools."""

    @property
    def name(self) -> str:
        """Human-readable name of this tool."""
        ...

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """
        Execute a search and return evidence.

        Args:
            query: The search query string
            max_results: Maximum number of results to return

        Returns:
            List of Evidence objects

        Raises:
            SearchError: If the search fails
            RateLimitError: If we hit rate limits
        """
        ...
```
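`Protocol` conformance is structural: any object with a matching `name` property and `search` coroutine satisfies `SearchTool` without inheriting from it. A standalone sketch (the protocol is redeclared inline and `Evidence` reduced to a placeholder so it runs on its own; `NullSearchTool` is hypothetical):

```python
import asyncio
from typing import Protocol

Evidence = dict  # placeholder for the pydantic model above


class SearchTool(Protocol):
    @property
    def name(self) -> str: ...
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]: ...


class NullSearchTool:
    """Satisfies SearchTool by shape alone, no inheritance."""

    @property
    def name(self) -> str:
        return "null"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        return []  # a real tool would return Evidence objects


def run_tool(tool: SearchTool, query: str) -> list[Evidence]:
    # Type checkers accept NullSearchTool here purely structurally.
    return asyncio.run(tool.search(query))


print(run_tool(NullSearchTool(), "metformin"))  # -> []
```

This is also why the tests can swap in trivial fakes for `PubMedTool` without touching any registry or base class.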
### PubMed Tool Implementation

```python
"""PubMed search tool using NCBI E-utilities."""
import asyncio

import httpx
import xmltodict
from tenacity import retry, stop_after_attempt, wait_exponential

from src.shared.config import settings
from src.shared.exceptions import RateLimitError, SearchError

from .models import Citation, Evidence


class PubMedTool:
    """Search tool for PubMed/NCBI."""

    BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
    RATE_LIMIT_DELAY = 0.34  # ~3 requests/sec without API key

    def __init__(self, api_key: str | None = None):
        self.api_key = api_key
        self._last_request_time = 0.0

    @property
    def name(self) -> str:
        return "pubmed"

    async def _rate_limit(self) -> None:
        """Enforce NCBI rate limiting."""
        now = asyncio.get_running_loop().time()
        elapsed = now - self._last_request_time
        if elapsed < self.RATE_LIMIT_DELAY:
            await asyncio.sleep(self.RATE_LIMIT_DELAY - elapsed)
        self._last_request_time = asyncio.get_running_loop().time()

    def _build_params(self, **kwargs) -> dict:
        """Build request params with optional API key."""
        params = {**kwargs, "retmode": "json"}
        if self.api_key:
            params["api_key"] = self.api_key
        return params

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=1, max=10),
        reraise=True,
    )
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """
        Search PubMed and return evidence.

        1. ESearch: Get PMIDs matching query
        2. EFetch: Get abstracts for those PMIDs
        3. Parse and return Evidence objects
        """
        await self._rate_limit()

        async with httpx.AsyncClient(timeout=30.0) as client:
            # Step 1: Search for PMIDs
            search_params = self._build_params(
                db="pubmed",
                term=query,
                retmax=max_results,
                sort="relevance",
            )

            try:
                search_resp = await client.get(
                    f"{self.BASE_URL}/esearch.fcgi",
                    params=search_params,
                )
                search_resp.raise_for_status()
            except httpx.HTTPStatusError as e:
                if e.response.status_code == 429:
                    raise RateLimitError("PubMed rate limit exceeded") from e
                raise SearchError(f"PubMed search failed: {e}") from e

            search_data = search_resp.json()
            pmids = search_data.get("esearchresult", {}).get("idlist", [])

            if not pmids:
                return []

            # Step 2: Fetch abstracts
            await self._rate_limit()
            fetch_params = self._build_params(
                db="pubmed",
                id=",".join(pmids),
                rettype="abstract",
            )
            # Use XML for fetch (more reliable parsing)
            fetch_params["retmode"] = "xml"

            fetch_resp = await client.get(
                f"{self.BASE_URL}/efetch.fcgi",
                params=fetch_params,
params=fetch_params,
|
| 242 |
+
)
|
| 243 |
+
fetch_resp.raise_for_status()
|
| 244 |
+
|
| 245 |
+
# Step 3: Parse XML to Evidence
|
| 246 |
+
return self._parse_pubmed_xml(fetch_resp.text)
|
| 247 |
+
|
| 248 |
+
def _parse_pubmed_xml(self, xml_text: str) -> List[Evidence]:
|
| 249 |
+
"""Parse PubMed XML into Evidence objects."""
|
| 250 |
+
try:
|
| 251 |
+
data = xmltodict.parse(xml_text)
|
| 252 |
+
except Exception as e:
|
| 253 |
+
raise SearchError(f"Failed to parse PubMed XML: {e}")
|
| 254 |
+
|
| 255 |
+
articles = data.get("PubmedArticleSet", {}).get("PubmedArticle", [])
|
| 256 |
+
|
| 257 |
+
# Handle single article (xmltodict returns dict instead of list)
|
| 258 |
+
if isinstance(articles, dict):
|
| 259 |
+
articles = [articles]
|
| 260 |
+
|
| 261 |
+
evidence_list = []
|
| 262 |
+
for article in articles:
|
| 263 |
+
try:
|
| 264 |
+
evidence = self._article_to_evidence(article)
|
| 265 |
+
if evidence:
|
| 266 |
+
evidence_list.append(evidence)
|
| 267 |
+
except Exception:
|
| 268 |
+
continue # Skip malformed articles
|
| 269 |
+
|
| 270 |
+
return evidence_list
|
| 271 |
+
|
| 272 |
+
def _article_to_evidence(self, article: dict) -> Evidence | None:
|
| 273 |
+
"""Convert a single PubMed article to Evidence."""
|
| 274 |
+
medline = article.get("MedlineCitation", {})
|
| 275 |
+
article_data = medline.get("Article", {})
|
| 276 |
+
|
| 277 |
+
# Extract PMID
|
| 278 |
+
pmid = medline.get("PMID", {})
|
| 279 |
+
if isinstance(pmid, dict):
|
| 280 |
+
pmid = pmid.get("#text", "")
|
| 281 |
+
|
| 282 |
+
# Extract title
|
| 283 |
+
title = article_data.get("ArticleTitle", "")
|
| 284 |
+
if isinstance(title, dict):
|
| 285 |
+
title = title.get("#text", str(title))
|
| 286 |
+
|
| 287 |
+
# Extract abstract
|
| 288 |
+
abstract_data = article_data.get("Abstract", {}).get("AbstractText", "")
|
| 289 |
+
if isinstance(abstract_data, list):
|
| 290 |
+
abstract = " ".join(
|
| 291 |
+
item.get("#text", str(item)) if isinstance(item, dict) else str(item)
|
| 292 |
+
for item in abstract_data
|
| 293 |
+
)
|
| 294 |
+
elif isinstance(abstract_data, dict):
|
| 295 |
+
abstract = abstract_data.get("#text", str(abstract_data))
|
| 296 |
+
else:
|
| 297 |
+
abstract = str(abstract_data)
|
| 298 |
+
|
| 299 |
+
if not abstract or not title:
|
| 300 |
+
return None
|
| 301 |
+
|
| 302 |
+
# Extract date
|
| 303 |
+
pub_date = article_data.get("Journal", {}).get("JournalIssue", {}).get("PubDate", {})
|
| 304 |
+
year = pub_date.get("Year", "Unknown")
|
| 305 |
+
month = pub_date.get("Month", "01")
|
| 306 |
+
day = pub_date.get("Day", "01")
|
| 307 |
+
date_str = f"{year}-{month}-{day}" if year != "Unknown" else "Unknown"
|
| 308 |
+
|
| 309 |
+
# Extract authors
|
| 310 |
+
author_list = article_data.get("AuthorList", {}).get("Author", [])
|
| 311 |
+
if isinstance(author_list, dict):
|
| 312 |
+
author_list = [author_list]
|
| 313 |
+
authors = []
|
| 314 |
+
for author in author_list[:5]: # Limit to 5 authors
|
| 315 |
+
last = author.get("LastName", "")
|
| 316 |
+
first = author.get("ForeName", "")
|
| 317 |
+
if last:
|
| 318 |
+
authors.append(f"{last} {first}".strip())
|
| 319 |
+
|
| 320 |
+
return Evidence(
|
| 321 |
+
content=abstract[:2000], # Truncate long abstracts
|
| 322 |
+
citation=Citation(
|
| 323 |
+
source="pubmed",
|
| 324 |
+
title=title[:500],
|
| 325 |
+
url=f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
|
| 326 |
+
date=date_str,
|
| 327 |
+
authors=authors,
|
| 328 |
+
),
|
| 329 |
+
)
|
| 330 |
+
```
|
| 331 |
+
|
| 332 |
+
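The `_rate_limit` method above is simply a minimum-interval spacer: sleep until at least `RATE_LIMIT_DELAY` seconds have passed since the last request. The pattern can be verified in isolation with a small standalone sketch (using `time.monotonic` here rather than the event-loop clock, as an assumption for portability):

```python
import asyncio
import time


class MinIntervalLimiter:
    """Ensure at least `delay` seconds between successive calls."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self._last = 0.0

    async def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.delay:
            # Sleep off the remainder of the interval before proceeding.
            await asyncio.sleep(self.delay - elapsed)
        self._last = time.monotonic()


async def demo() -> float:
    limiter = MinIntervalLimiter(delay=0.05)
    start = time.monotonic()
    for _ in range(3):  # 3 calls -> at least 2 enforced gaps
        await limiter.wait()
    return time.monotonic() - start


elapsed = asyncio.run(demo())
assert elapsed >= 0.09  # two enforced gaps of ~0.05s each
```

With `RATE_LIMIT_DELAY = 0.34` this keeps the tool just under NCBI's 3-requests-per-second ceiling for unauthenticated clients.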
### DuckDuckGo Tool Implementation

```python
"""Web search tool using DuckDuckGo."""
from typing import List
from duckduckgo_search import DDGS

from src.shared.exceptions import SearchError
from .models import Evidence, Citation


class WebTool:
    """Search tool for general web search via DuckDuckGo."""

    def __init__(self):
        pass

    @property
    def name(self) -> str:
        return "web"

    async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
        """
        Search DuckDuckGo and return evidence.

        Note: duckduckgo-search is synchronous, so we run it in an executor.
        """
        import asyncio

        loop = asyncio.get_event_loop()
        try:
            results = await loop.run_in_executor(
                None,
                lambda: self._sync_search(query, max_results),
            )
            return results
        except Exception as e:
            raise SearchError(f"Web search failed: {e}")

    def _sync_search(self, query: str, max_results: int) -> List[Evidence]:
        """Synchronous search implementation."""
        evidence_list = []

        with DDGS() as ddgs:
            results = list(ddgs.text(query, max_results=max_results))

        for result in results:
            evidence_list.append(
                Evidence(
                    content=result.get("body", "")[:1000],
                    citation=Citation(
                        source="web",
                        title=result.get("title", "Unknown")[:500],
                        url=result.get("href", ""),
                        date="Unknown",
                        authors=[],
                    ),
                )
            )

        return evidence_list
```

---

## 5. Search Handler (`src/features/search/handlers.py`)

The handler orchestrates multiple tools using the **Scatter-Gather** pattern.

```python
"""Search handler - orchestrates multiple search tools."""
import asyncio
from typing import List
import structlog

from src.shared.exceptions import SearchError
from .models import Evidence, SearchResult
from .tools import SearchTool

logger = structlog.get_logger()


def flatten(nested: List[List[Evidence]]) -> List[Evidence]:
    """Flatten a list of lists into a single list."""
    return [item for sublist in nested for item in sublist]


class SearchHandler:
    """Orchestrates parallel searches across multiple tools."""

    def __init__(self, tools: List[SearchTool], timeout: float = 30.0):
        """
        Initialize the search handler.

        Args:
            tools: List of search tools to use
            timeout: Timeout for each search in seconds
        """
        self.tools = tools
        self.timeout = timeout

    async def execute(self, query: str, max_results_per_tool: int = 10) -> SearchResult:
        """
        Execute search across all tools in parallel.

        Args:
            query: The search query
            max_results_per_tool: Max results from each tool

        Returns:
            SearchResult containing all evidence and metadata
        """
        logger.info("Starting search", query=query, tools=[t.name for t in self.tools])

        # Create tasks for parallel execution
        tasks = [
            self._search_with_timeout(tool, query, max_results_per_tool)
            for tool in self.tools
        ]

        # Gather results (don't fail if one tool fails)
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Process results
        all_evidence: List[Evidence] = []
        sources_searched: List[str] = []
        errors: List[str] = []

        for tool, result in zip(self.tools, results):
            if isinstance(result, Exception):
                errors.append(f"{tool.name}: {str(result)}")
                logger.warning("Search tool failed", tool=tool.name, error=str(result))
            else:
                all_evidence.extend(result)
                sources_searched.append(tool.name)
                logger.info("Search tool succeeded", tool=tool.name, count=len(result))

        return SearchResult(
            query=query,
            evidence=all_evidence,
            sources_searched=sources_searched,
            total_found=len(all_evidence),
            errors=errors,
        )

    async def _search_with_timeout(
        self,
        tool: SearchTool,
        query: str,
        max_results: int,
    ) -> List[Evidence]:
        """Execute a single tool search with timeout."""
        try:
            return await asyncio.wait_for(
                tool.search(query, max_results),
                timeout=self.timeout,
            )
        except asyncio.TimeoutError:
            raise SearchError(f"{tool.name} search timed out after {self.timeout}s")
```

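The scatter-gather core of the handler is `asyncio.gather(..., return_exceptions=True)`: failed coroutines come back as exception objects in the results list instead of aborting the whole batch. A minimal, self-contained illustration of that behavior (the stub coroutines here are hypothetical, not the real tools):

```python
import asyncio


async def ok_search(name: str) -> list[str]:
    return [f"{name}-result"]


async def failing_search(name: str) -> list[str]:
    raise RuntimeError(f"{name} is down")


async def scatter_gather() -> tuple[list[str], list[str]]:
    # Scatter: launch all searches concurrently.
    tasks = [ok_search("pubmed"), failing_search("web")]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Gather: partition successes from failures.
    evidence: list[str] = []
    errors: list[str] = []
    for result in results:
        if isinstance(result, Exception):
            errors.append(str(result))  # tool failed; record and continue
        else:
            evidence.extend(result)
    return evidence, errors


evidence, errors = asyncio.run(scatter_gather())
print(evidence, errors)  # ['pubmed-result'] ['web is down']
```

This is exactly why graceful degradation works: one tool's outage becomes an entry in `errors` while the other tool's evidence still flows through.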
---

## 6. TDD Workflow

### Test File: `tests/unit/features/search/test_tools.py`

```python
"""Unit tests for search tools."""
import pytest
from unittest.mock import AsyncMock, MagicMock, patch


# Sample PubMed XML response for mocking
SAMPLE_PUBMED_XML = """<?xml version="1.0" ?>
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345678</PMID>
      <Article>
        <ArticleTitle>Metformin in Alzheimer's Disease: A Systematic Review</ArticleTitle>
        <Abstract>
          <AbstractText>Metformin shows neuroprotective properties...</AbstractText>
        </Abstract>
        <AuthorList>
          <Author>
            <LastName>Smith</LastName>
            <ForeName>John</ForeName>
          </Author>
        </AuthorList>
        <Journal>
          <JournalIssue>
            <PubDate>
              <Year>2024</Year>
              <Month>01</Month>
            </PubDate>
          </JournalIssue>
        </Journal>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


class TestPubMedTool:
    """Tests for PubMedTool."""

    @pytest.mark.asyncio
    async def test_search_returns_evidence(self, mocker):
        """PubMedTool should return Evidence objects from search."""
        from src.features.search.tools import PubMedTool

        # Mock the HTTP responses
        mock_search_response = MagicMock()
        mock_search_response.json.return_value = {
            "esearchresult": {"idlist": ["12345678"]}
        }
        mock_search_response.raise_for_status = MagicMock()

        mock_fetch_response = MagicMock()
        mock_fetch_response.text = SAMPLE_PUBMED_XML
        mock_fetch_response.raise_for_status = MagicMock()

        mock_client = AsyncMock()
        mock_client.get = AsyncMock(side_effect=[mock_search_response, mock_fetch_response])
        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
        mock_client.__aexit__ = AsyncMock(return_value=None)

        mocker.patch("httpx.AsyncClient", return_value=mock_client)

        # Act
        tool = PubMedTool()
        results = await tool.search("metformin alzheimer")

        # Assert
        assert len(results) == 1
        assert results[0].citation.source == "pubmed"
        assert "Metformin" in results[0].citation.title
        assert "12345678" in results[0].citation.url

    @pytest.mark.asyncio
    async def test_search_empty_results(self, mocker):
        """PubMedTool should return empty list when no results."""
        from src.features.search.tools import PubMedTool

        mock_response = MagicMock()
        mock_response.json.return_value = {"esearchresult": {"idlist": []}}
        mock_response.raise_for_status = MagicMock()

        mock_client = AsyncMock()
        mock_client.get = AsyncMock(return_value=mock_response)
        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
        mock_client.__aexit__ = AsyncMock(return_value=None)

        mocker.patch("httpx.AsyncClient", return_value=mock_client)

        tool = PubMedTool()
        results = await tool.search("xyznonexistentquery123")

        assert results == []

    def test_parse_pubmed_xml(self):
        """PubMedTool should correctly parse XML."""
        from src.features.search.tools import PubMedTool

        tool = PubMedTool()
        results = tool._parse_pubmed_xml(SAMPLE_PUBMED_XML)

        assert len(results) == 1
        assert results[0].citation.source == "pubmed"
        assert "Smith John" in results[0].citation.authors


class TestWebTool:
    """Tests for WebTool."""

    @pytest.mark.asyncio
    async def test_search_returns_evidence(self, mocker):
        """WebTool should return Evidence objects from search."""
        from src.features.search.tools import WebTool

        mock_results = [
            {
                "title": "Drug Repurposing Article",
                "href": "https://example.com/article",
                "body": "Some content about drug repurposing...",
            }
        ]

        mock_ddgs = MagicMock()
        mock_ddgs.__enter__ = MagicMock(return_value=mock_ddgs)
        mock_ddgs.__exit__ = MagicMock(return_value=None)
        mock_ddgs.text = MagicMock(return_value=mock_results)

        mocker.patch("src.features.search.tools.DDGS", return_value=mock_ddgs)

        tool = WebTool()
        results = await tool.search("drug repurposing")

        assert len(results) == 1
        assert results[0].citation.source == "web"
        assert "Drug Repurposing" in results[0].citation.title


class TestSearchHandler:
    """Tests for SearchHandler."""

    @pytest.mark.asyncio
    async def test_execute_aggregates_results(self, mocker):
        """SearchHandler should aggregate results from all tools."""
        from src.features.search.handlers import SearchHandler
        from src.features.search.models import Evidence, Citation

        # Create mock tools
        mock_tool_1 = AsyncMock()
        mock_tool_1.name = "mock1"
        mock_tool_1.search = AsyncMock(return_value=[
            Evidence(
                content="Result 1",
                citation=Citation(source="pubmed", title="T1", url="u1", date="2024"),
            )
        ])

        mock_tool_2 = AsyncMock()
        mock_tool_2.name = "mock2"
        mock_tool_2.search = AsyncMock(return_value=[
            Evidence(
                content="Result 2",
                citation=Citation(source="web", title="T2", url="u2", date="2024"),
            )
        ])

        handler = SearchHandler(tools=[mock_tool_1, mock_tool_2])
        result = await handler.execute("test query")

        assert result.total_found == 2
        assert "mock1" in result.sources_searched
        assert "mock2" in result.sources_searched
        assert len(result.errors) == 0

    @pytest.mark.asyncio
    async def test_execute_handles_tool_failure(self, mocker):
        """SearchHandler should continue if one tool fails."""
        from src.features.search.handlers import SearchHandler
        from src.features.search.models import Evidence, Citation
        from src.shared.exceptions import SearchError

        mock_tool_ok = AsyncMock()
        mock_tool_ok.name = "ok_tool"
        mock_tool_ok.search = AsyncMock(return_value=[
            Evidence(
                content="Good result",
                citation=Citation(source="pubmed", title="T", url="u", date="2024"),
            )
        ])

        mock_tool_fail = AsyncMock()
        mock_tool_fail.name = "fail_tool"
        mock_tool_fail.search = AsyncMock(side_effect=SearchError("API down"))

        handler = SearchHandler(tools=[mock_tool_ok, mock_tool_fail])
        result = await handler.execute("test")

        assert result.total_found == 1
        assert "ok_tool" in result.sources_searched
        assert len(result.errors) == 1
        assert "fail_tool" in result.errors[0]
```

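The trickiest part of these tests is mocking `async with httpx.AsyncClient() as client`: the mock needs `__aenter__`/`__aexit__` plus an awaitable `get` returning canned responses. The mechanics can be checked in isolation with nothing but `unittest.mock` (the `fetch_ids` function below is a hypothetical stand-in for the tool's search flow):

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch_ids(client_factory) -> list[str]:
    """Stand-in for PubMedTool.search: open a client, GET, read JSON."""
    async with client_factory() as client:
        resp = await client.get("https://example.invalid/esearch")  # hypothetical URL
        resp.raise_for_status()
        return resp.json()["esearchresult"]["idlist"]


# Canned response object: .json() returns a dict, .raise_for_status() is a no-op.
mock_response = MagicMock()
mock_response.json.return_value = {"esearchresult": {"idlist": ["12345678"]}}
mock_response.raise_for_status = MagicMock()

# Async client mock that works as an async context manager.
mock_client = AsyncMock()
mock_client.get = AsyncMock(return_value=mock_response)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=None)

ids = asyncio.run(fetch_ids(lambda: mock_client))
print(ids)  # ['12345678']
```

In the real tests, `mocker.patch("httpx.AsyncClient", return_value=mock_client)` plays the role of `client_factory` here.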
---

## 7. Integration Test (Optional, Real API)

```python
# tests/integration/test_pubmed_live.py
"""Integration tests that hit real APIs (run manually)."""
import pytest


@pytest.mark.integration
@pytest.mark.slow
@pytest.mark.asyncio
async def test_pubmed_live_search():
    """Test real PubMed search (requires network)."""
    from src.features.search.tools import PubMedTool

    tool = PubMedTool()
    results = await tool.search("metformin diabetes", max_results=3)

    assert len(results) > 0
    assert results[0].citation.source == "pubmed"
    assert "pubmed.ncbi.nlm.nih.gov" in results[0].citation.url


# Run with: uv run pytest tests/integration -m integration
```

---

## 8. Implementation Checklist

- [ ] Create `src/features/search/models.py` with all Pydantic models
- [ ] Create `src/features/search/tools.py` with `SearchTool` Protocol
- [ ] Implement `PubMedTool` class
- [ ] Implement `WebTool` class
- [ ] Create `src/features/search/handlers.py` with `SearchHandler`
- [ ] Create `src/features/search/__init__.py` with exports
- [ ] Run `uv run pytest tests/unit/features/search/ -v` → **ALL TESTS MUST PASS**
- [ ] Write tests in `tests/unit/features/search/test_tools.py`
- [ ] (Optional) Run integration test: `uv run pytest -m integration`
- [ ] Commit: `git commit -m "feat: phase 2 search slice complete"`

---

## 9. Definition of Done

Phase 2 is **COMPLETE** when:

1. ✅ All unit tests pass
2. ✅ `SearchHandler` can execute with both tools
3. ✅ Graceful degradation: if PubMed fails, WebTool results still return
4. ✅ Rate limiting is enforced (verify no 429 errors)
5. ✅ Can run this in a Python REPL:

```python
import asyncio
from src.features.search.tools import PubMedTool, WebTool
from src.features.search.handlers import SearchHandler

async def test():
    handler = SearchHandler([PubMedTool(), WebTool()])
    result = await handler.execute("metformin alzheimer")
    print(f"Found {result.total_found} results")
    for e in result.evidence[:3]:
        print(f"- {e.citation.title}")

asyncio.run(test())
```

**Proceed to Phase 3 ONLY after all checkboxes are complete.**

---

**`docs/implementation/03_phase_judge.md`** (added)

# Phase 3 Implementation Spec: Judge Vertical Slice

**Goal**: Implement the "Brain" of the agent, evaluating evidence quality.
**Philosophy**: "Structured Output or Bust."

---

## 1. The Slice Definition

This slice covers:
1. **Input**: A user question + a list of `Evidence` (from Phase 2).
2. **Process**:
   - Construct a prompt with the evidence.
   - Call the LLM (PydanticAI / OpenAI / Anthropic).
   - Force JSON structured output.
3. **Output**: A `JudgeAssessment` object.

**Directory**: `src/features/judge/`

---

## 2. Models (`src/features/judge/models.py`)

The output schema must be strict.

```python
from pydantic import BaseModel, Field
from typing import List, Literal

class AssessmentDetails(BaseModel):
    mechanism_score: int = Field(..., ge=0, le=10)
    mechanism_reasoning: str
    candidates_found: List[str]

class JudgeAssessment(BaseModel):
    details: AssessmentDetails
    sufficient: bool
    recommendation: Literal["continue", "synthesize"]
    next_search_queries: List[str]
```

---

## 3. Prompt Engineering (`src/features/judge/prompts.py`)

We treat prompts as code. They should be versioned and clean.

```python
SYSTEM_PROMPT = """You are a drug repurposing research judge.
Evaluate the evidence strictly.
Output JSON only."""

def format_user_prompt(question: str, evidence: List[Evidence]) -> str:
    # ... formatting logic ...
    return prompt
```

---

## 4. TDD Workflow

### Step 1: Mocked LLM Test
We do NOT hit the real LLM in unit tests. We mock the response to ensure our parsing logic works.

Create `tests/unit/features/judge/test_handler.py`.

```python
@pytest.mark.asyncio
async def test_judge_parsing(mocker):
    # Arrange
    mock_llm_response = '{"sufficient": true, ...}'
    mocker.patch("llm_client.generate", return_value=mock_llm_response)

    # Act
    handler = JudgeHandler()
    assessment = await handler.assess("q", [])

    # Assert
    assert assessment.sufficient is True
```

### Step 2: Implement Handler
Use `pydantic-ai` or a raw client to enforce the schema.

---

## 5. Implementation Checklist

- [ ] Define `JudgeAssessment` models.
- [ ] Write prompt templates.
- [ ] Implement `JudgeHandler` with the PydanticAI/Instructor pattern.
- [ ] Write tests ensuring JSON parsing handles failures gracefully (retry logic).
- [ ] Verify via `uv run pytest`.

---

**`docs/implementation/04_phase_ui.md`** (added)

# Phase 4 Implementation Spec: Orchestrator & UI

**Goal**: Connect the Brain and the Body, then give it a Face.
**Philosophy**: "Streaming is Trust."

---

## 1. The Slice Definition

This slice connects:
1. **Orchestrator**: The state machine (while loop) calling Search -> Judge.
2. **UI**: A Gradio interface that visualizes the loop.

**Directory**: `src/features/orchestrator/` and `src/app.py`

---

## 2. The Orchestrator Logic

This is the "Agent" logic.

```python
class Orchestrator:
    def __init__(self, search_handler, judge_handler):
        self.search = search_handler
        self.judge = judge_handler
        self.history = []

    async def run_generator(self, query: str):
        """Yields events for the UI."""
        yield AgentEvent("Searching...")
        evidence = await self.search.execute(query)

        yield AgentEvent("Judging...")
        assessment = await self.judge.assess(query, evidence)

        if assessment.sufficient:
            yield AgentEvent("Complete", data=assessment)
        else:
            yield AgentEvent("Looping...", data=assessment.next_search_queries)
```

---

## 3. The UI (Gradio)

We use the **Gradio 5** generator pattern for real-time feedback.

```python
import gradio as gr

async def interact(message, history):
    agent = Orchestrator(...)
    async for event in agent.run_generator(message):
        yield f"**{event.step}**: {event.details}"

demo = gr.ChatInterface(fn=interact, type="messages")
```

---

## 4. TDD Workflow

### Step 1: Test the State Machine
Test the loop logic without the UI.

```python
@pytest.mark.asyncio
async def test_orchestrator_loop_limit():
    # Configure judge to always return "sufficient=False"
    # Assert loop stops at MAX_ITERATIONS
    ...
```

### Step 2: Build UI
Run `uv run python src/app.py` and verify locally.

---

## 5. Implementation Checklist

- [ ] Implement `Orchestrator` class.
- [ ] Write loop logic with max_iterations safety.
- [ ] Create `src/app.py` with Gradio.
- [ ] Add "Deployment" configuration (Dockerfile/Spaces config).

---

**`docs/implementation/roadmap.md`** (added)

# Implementation Roadmap: DeepCritical (Vertical Slices)
|
| 2 |
+
|
| 3 |
+
**Philosophy:** AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).
|
| 4 |
+
|
| 5 |
+
This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## π οΈ The 2025 "Gucci" Tooling Stack
|
| 10 |
+
|
| 11 |
+
We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.
|
| 12 |
+
|
| 13 |
+
| Category | Tool | Why? |
|
| 14 |
+
|----------|------|------|
|
| 15 |
+
| **Package Manager** | **`uv`** | Rust-based, 10-100x faster than pip/poetry. Manages python versions, venvs, and deps. |
|
| 16 |
+
| **Linting/Format** | **`ruff`** | Rust-based, instant. Replaces black, isort, flake8. |
|
| 17 |
+
| **Type Checking** | **`mypy`** | Strict static typing. Run via `uv run mypy`. |
|
| 18 |
+
| **Testing** | **`pytest`** | The standard. |
|
| 19 |
+
| **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
|
| 20 |
+
| **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
|
| 21 |
+
| **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
|
| 22 |
+
| **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |
|
---

## 🏗️ Architecture: Vertical Slices

Instead of horizontal layers (e.g., "building the database layer"), we build **vertical slices**.
Each slice implements a feature from **entry point (UI/API) -> logic -> data/external services**.

### Directory Structure (Feature-First)

```
src/
├── app.py               # Entry point
├── shared/              # Shared utilities (logging, config, base classes)
│   ├── config.py
│   └── observability.py
└── features/            # Vertical slices
    ├── search/          # Slice: executing searches
    │   ├── handlers.py
    │   ├── tools.py
    │   └── models.py
    ├── judge/           # Slice: assessing quality
    │   ├── handlers.py
    │   ├── prompts.py
    │   └── models.py
    └── report/          # Slice: synthesizing output
        ├── handlers.py
        └── models.py
```

---

## 🚀 Phased Execution Plan

### **Phase 1: Foundation & Tooling (Day 1)**
*Goal: a rock-solid, CI-ready environment with `uv` and `pytest` configured.*
- [ ] Initialize `pyproject.toml` with `uv`.
- [ ] Configure `ruff` (strict) and `mypy` (strict).
- [ ] Set up `pytest` with sugar and coverage.
- [ ] Implement `shared/config.py` (configuration slice).
- **Deliverable**: A repo that passes CI with `uv run pytest`.
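
A minimal sketch of what `shared/config.py` could look like, using only the standard library (a real implementation might reach for `pydantic-settings` instead; the field names here are illustrative assumptions, not the final schema):

```python
# shared/config.py -- illustrative sketch; field names are assumptions
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Central, immutable configuration read once at startup."""

    llm_api_key: str = ""
    max_search_results: int = 10
    log_level: str = "INFO"

    @classmethod
    def from_env(cls) -> "Settings":
        # Environment variables override the defaults above
        return cls(
            llm_api_key=os.environ.get("LLM_API_KEY", ""),
            max_search_results=int(os.environ.get("MAX_SEARCH_RESULTS", "10")),
            log_level=os.environ.get("LOG_LEVEL", "INFO"),
        )
```

Keeping the dataclass frozen makes configuration safe to share across handlers, and `from_env` keeps the environment-reading side effect in one place where a test can stub it.
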
### **Phase 2: The "Search" Vertical Slice (Day 2)**
*Goal: the agent can receive a query and get raw results from PubMed and the web.*
- [ ] **TDD**: Write tests for `SearchHandler`.
- [ ] Implement `features/search/tools.py` (PubMed + DuckDuckGo).
- [ ] Implement `features/search/handlers.py` (orchestrates the tools).
- **Deliverable**: A function that takes "long covid" and returns `List[Evidence]`.
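
To make the TDD step concrete, here is a sketch of the slice's contract: the `Evidence` fields, the handler shape, and the fake-tool injection pattern are assumptions for illustration, not the final API.

```python
# features/search -- contract sketch (names and fields are assumptions)
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    source: str   # e.g. "pubmed" or "web"
    title: str
    snippet: str
    url: str


class SearchHandler:
    """Fans a query out to the registered search tools and merges results."""

    def __init__(self, tools: list[Callable[[str], list[Evidence]]]) -> None:
        self._tools = tools

    def run(self, query: str) -> list[Evidence]:
        results: list[Evidence] = []
        for tool in self._tools:
            results.extend(tool(query))
        return results


# TDD-style seam: inject a fake tool instead of hitting PubMed/DuckDuckGo
def fake_pubmed(query: str) -> list[Evidence]:
    return [Evidence("pubmed", f"Study on {query}", "...", "https://example.org")]
```

Since the roadmap calls for an async agent loop, the real handler would likely be `async def run(...)` exercised with `pytest-asyncio`; the synchronous version keeps the sketch short.
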
### **Phase 3: The "Judge" Vertical Slice (Day 3)**
*Goal: the agent can decide whether the evidence is sufficient.*
- [ ] **TDD**: Write tests for `JudgeHandler` (mocked LLM).
- [ ] Implement `features/judge/prompts.py` (structured outputs).
- [ ] Implement `features/judge/handlers.py` (LLM interaction).
- **Deliverable**: A function that takes `List[Evidence]` and returns a `JudgeAssessment`.
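
The mocked-LLM test pattern for this slice can be sketched as below; the `JudgeAssessment` fields and keyword-based parsing are deliberate simplifications of the structured-output prompt (all names are assumptions):

```python
# features/judge -- mocked-LLM sketch (names and parsing are assumptions)
from dataclasses import dataclass
from typing import Callable


@dataclass
class JudgeAssessment:
    sufficient: bool
    reasoning: str


class JudgeHandler:
    """Asks an LLM whether the gathered evidence answers the question."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        # The LLM call is injected so tests can pass a stub instead of a real model
        self._llm = llm

    def assess(self, question: str, evidence: list[str]) -> JudgeAssessment:
        prompt = f"Question: {question}\nEvidence:\n" + "\n".join(evidence)
        raw = self._llm(prompt)
        # Real structured outputs would parse JSON; a keyword check suffices here
        return JudgeAssessment(
            sufficient=raw.strip().upper().startswith("SUFFICIENT"),
            reasoning=raw,
        )
```

Injecting the LLM callable is what makes the "Mocked LLM" TDD item cheap: the test never touches the network.
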
### **Phase 4: The "Loop" & UI Slice (Day 4)**
*Goal: end-to-end user value.*
- [ ] Implement the `Orchestrator` (connects the search and judge loops).
- [ ] Build `features/ui/` (Gradio with streaming).
- **Deliverable**: A working DeepCritical agent on Hugging Face.
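
The orchestrator's core control flow (search, judge, refine, repeat until sufficient or out of budget) can be sketched as follows; the handler signatures and refinement strategy are assumptions made for illustration:

```python
# Orchestrator sketch: loop search -> judge until sufficient or budget exhausted.
# Handler signatures here are assumptions, not the final interfaces.
from typing import Callable


def run_loop(
    query: str,
    search: Callable[[str], list[str]],        # returns evidence snippets
    judge: Callable[[str, list[str]], bool],   # True when evidence suffices
    max_iterations: int = 3,
) -> list[str]:
    evidence: list[str] = []
    refined_query = query
    for i in range(max_iterations):
        evidence.extend(search(refined_query))
        if judge(query, evidence):
            break
        # A real agent would let the judge propose the next refinement
        refined_query = f"{query} (refinement {i + 1})"
    return evidence
```

Capping the loop with `max_iterations` keeps the agent bounded even when the judge never declares the evidence sufficient; the Gradio layer would stream each iteration's progress to the user.
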
---

## 📚 Spec Documents

1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)**
2. **[Phase 2 Spec: Search Slice](02_phase_search.md)**
3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)**
4. **[Phase 4 Spec: UI & Loop](04_phase_ui.md)**

*Start by reading the Phase 1 spec to initialize the repo.*