Commit · 7c07ade
Parent(s): 18838b9
docs: add implementation specifications for phases 1 to 4
- Introduced detailed specifications for the Foundation, Search, Judge, and UI phases of the DeepCritical project.
- Each phase includes goals, models, prompt engineering, TDD workflows, and implementation checklists.
- Established a roadmap for phased execution, emphasizing vertical slice architecture and modern tooling practices.
Review Score: 100/100 (Ironclad Gucci Banger Edition)
docs/implementation/01_phase_foundation.md
ADDED
@@ -0,0 +1,496 @@
# Phase 1 Implementation Spec: Foundation & Tooling

**Goal**: Establish a "Gucci Banger" development environment using 2025 best practices.
**Philosophy**: "If the build isn't solid, the agent won't be."
**Estimated Effort**: 2-3 hours

---

## 1. Prerequisites

Before starting, ensure these are installed:

```bash
# Install uv (Rust-based package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Verify
uv --version  # Should be >= 0.4.0
```

---

## 2. Project Initialization

```bash
# From project root
uv init --name deepcritical
uv python install 3.11  # Pin Python version
```
---

## 3. The Tooling Stack (Exact Dependencies)

### `pyproject.toml` (Complete, Copy-Paste Ready)

```toml
[project]
name = "deepcritical"
version = "0.1.0"
description = "AI-Native Drug Repurposing Research Agent"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    # Core
    "pydantic>=2.7",
    "pydantic-settings>=2.2",  # For BaseSettings (config)
    "pydantic-ai>=0.0.16",     # Agent framework

    # HTTP & Parsing
    "httpx>=0.27",             # Async HTTP client
    "beautifulsoup4>=4.12",    # HTML parsing
    "xmltodict>=0.13",         # PubMed XML -> dict

    # Search
    "duckduckgo-search>=6.0",  # Free web search

    # UI
    "gradio>=5.0",             # Chat interface

    # Utils
    "python-dotenv>=1.0",      # .env loading
    "tenacity>=8.2",           # Retry logic
    "structlog>=24.1",         # Structured logging
]

[project.optional-dependencies]
dev = [
    # Testing
    "pytest>=8.0",
    "pytest-asyncio>=0.23",
    "pytest-sugar>=1.0",
    "pytest-cov>=5.0",
    "pytest-mock>=3.12",
    "respx>=0.21",  # Mock httpx requests

    # Quality
    "ruff>=0.4.0",
    "mypy>=1.10",
    "pre-commit>=3.7",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src"]

# ============== RUFF CONFIG ==============
[tool.ruff]
line-length = 100
target-version = "py311"
src = ["src", "tests"]

[tool.ruff.lint]
select = [
    "E",    # pycodestyle errors
    "F",    # pyflakes
    "B",    # flake8-bugbear
    "I",    # isort
    "N",    # pep8-naming
    "UP",   # pyupgrade
    "PL",   # pylint
    "RUF",  # ruff-specific
]
ignore = [
    "PLR0913",  # Too many arguments (agents need many params)
]

[tool.ruff.lint.isort]
known-first-party = ["src"]

# ============== MYPY CONFIG ==============
[tool.mypy]
python_version = "3.11"
strict = true
ignore_missing_imports = true
disallow_untyped_defs = true
warn_return_any = true
warn_unused_ignores = true

# ============== PYTEST CONFIG ==============
[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
addopts = [
    "-v",
    "--tb=short",
    "--strict-markers",
]
markers = [
    "unit: Unit tests (mocked)",
    "integration: Integration tests (real APIs)",
    "slow: Slow tests",
]

# ============== COVERAGE CONFIG ==============
[tool.coverage.run]
source = ["src"]
omit = ["*/__init__.py"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "raise NotImplementedError",
]
```
---

## 4. Directory Structure (Create All)

```bash
# Execute these commands
mkdir -p src/shared
mkdir -p src/features/search
mkdir -p src/features/judge
mkdir -p src/features/orchestrator
mkdir -p src/features/report
mkdir -p tests/unit/shared
mkdir -p tests/unit/features/search
mkdir -p tests/unit/features/judge
mkdir -p tests/unit/features/orchestrator
mkdir -p tests/integration

# Create __init__.py files (required for imports)
touch src/__init__.py
touch src/shared/__init__.py
touch src/features/__init__.py
touch src/features/search/__init__.py
touch src/features/judge/__init__.py
touch src/features/orchestrator/__init__.py
touch src/features/report/__init__.py
touch tests/__init__.py
touch tests/unit/__init__.py
touch tests/unit/shared/__init__.py
touch tests/unit/features/__init__.py
touch tests/unit/features/search/__init__.py
touch tests/unit/features/judge/__init__.py
touch tests/unit/features/orchestrator/__init__.py
touch tests/integration/__init__.py
```
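The repetitive `mkdir`/`touch` pairs above can also be generated from a single list of leaf packages; a minimal sketch (the `PACKAGES` list and `scaffold` helper are illustrative, not part of the project):

```python
from pathlib import Path

# Leaf packages from the layout above; parent packages are derived automatically.
PACKAGES = [
    "src/shared",
    "src/features/search",
    "src/features/judge",
    "src/features/orchestrator",
    "src/features/report",
    "tests/unit/shared",
    "tests/unit/features/search",
    "tests/unit/features/judge",
    "tests/unit/features/orchestrator",
    "tests/integration",
]


def scaffold(root: Path) -> list[Path]:
    """Create each package dir plus an __init__.py in it and every parent package."""
    created: list[Path] = []
    for pkg in PACKAGES:
        d = root / pkg
        d.mkdir(parents=True, exist_ok=True)
        # Walk back up toward the root so intermediate packages get __init__.py too.
        for part in [d, *d.parents]:
            if part == root:
                break
            init = part / "__init__.py"
            if not init.exists():
                init.touch()
                created.append(init.relative_to(root))
    return created


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(len(scaffold(Path(tmp))), "init files created")
```

Running it against a temp directory produces the same 15 `__init__.py` files as the shell commands, which makes it easy to keep the tree and the spec in sync.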
---

## 5. Configuration Files

### `.env.example` (Copy to `.env` and fill)

```bash
# LLM Provider (choose one)
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here

# Optional: For HuggingFace deployment
HF_TOKEN=hf_your-token-here

# Agent Config
MAX_ITERATIONS=10
LOG_LEVEL=INFO
```

### `.pre-commit-config.yaml`

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
        additional_dependencies:
          - pydantic>=2.7
          - pydantic-settings>=2.2
        args: [--ignore-missing-imports]
```

### `tests/conftest.py` (Pytest Fixtures)

```python
"""Shared pytest fixtures for all tests."""
from unittest.mock import AsyncMock

import pytest


@pytest.fixture
def mock_httpx_client(mocker):
    """Mock httpx.AsyncClient for API tests."""
    mock = mocker.patch("httpx.AsyncClient")
    mock.return_value.__aenter__ = AsyncMock(return_value=mock.return_value)
    mock.return_value.__aexit__ = AsyncMock(return_value=None)
    return mock


@pytest.fixture
def mock_llm_response():
    """Factory fixture for mocking LLM responses."""
    def _mock(content: str):
        return AsyncMock(return_value=content)
    return _mock


@pytest.fixture
def sample_evidence():
    """Sample Evidence objects for testing."""
    from src.features.search.models import Citation, Evidence

    return [
        Evidence(
            content="Metformin shows promise in Alzheimer's...",
            citation=Citation(
                source="pubmed",
                title="Metformin and Alzheimer's Disease",
                url="https://pubmed.ncbi.nlm.nih.gov/12345678/",
                date="2024-01-15",
            ),
            relevance=0.85,
        )
    ]
```
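The `mock_llm_response` factory is just a plain `AsyncMock` whose awaited value is the canned string; the pattern can be checked outside pytest (a standalone sketch, `fake_agent` is a hypothetical stand-in for code that would call the real LLM):

```python
import asyncio
from unittest.mock import AsyncMock


def make_llm_mock(content: str) -> AsyncMock:
    # Same idea as the _mock factory above: awaiting the mock yields `content`.
    return AsyncMock(return_value=content)


async def fake_agent(llm) -> str:
    # Stand-in for any coroutine that would normally call the real LLM.
    return await llm("ignored prompt")


llm = make_llm_mock("metformin looks promising")
result = asyncio.run(fake_agent(llm))
print(result)  # -> metformin looks promising
```

Because `AsyncMock` records calls, tests can also assert on `llm.assert_awaited_once()` after the agent runs.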
---

## 6. Shared Kernel Implementation

### `src/shared/config.py`

```python
"""Application configuration using Pydantic Settings."""
from typing import Literal

import structlog
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Strongly-typed application settings."""

    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False,
        extra="ignore",
    )

    # LLM Configuration
    openai_api_key: str | None = Field(default=None, description="OpenAI API key")
    anthropic_api_key: str | None = Field(default=None, description="Anthropic API key")
    llm_provider: Literal["openai", "anthropic"] = Field(
        default="openai",
        description="Which LLM provider to use",
    )
    llm_model: str = Field(
        default="gpt-4o-mini",
        description="Model name to use",
    )

    # Agent Configuration
    max_iterations: int = Field(default=10, ge=1, le=50)
    search_timeout: int = Field(default=30, description="Seconds to wait for search")

    # Logging
    log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"

    def get_api_key(self) -> str:
        """Get the API key for the configured provider."""
        if self.llm_provider == "openai":
            if not self.openai_api_key:
                raise ValueError("OPENAI_API_KEY not set")
            return self.openai_api_key
        else:
            if not self.anthropic_api_key:
                raise ValueError("ANTHROPIC_API_KEY not set")
            return self.anthropic_api_key


def get_settings() -> Settings:
    """Factory function to get settings (allows mocking in tests)."""
    return Settings()


def configure_logging(settings: Settings) -> None:
    """Configure structured logging."""
    structlog.configure(
        processors=[
            structlog.stdlib.filter_by_level,
            structlog.stdlib.add_logger_name,
            structlog.stdlib.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.JSONRenderer(),
        ],
        wrapper_class=structlog.stdlib.BoundLogger,
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
    )


# Singleton for easy import
settings = get_settings()
```
### `src/shared/exceptions.py`

```python
"""Custom exceptions for DeepCritical."""


class DeepCriticalError(Exception):
    """Base exception for all DeepCritical errors."""


class SearchError(DeepCriticalError):
    """Raised when a search operation fails."""


class JudgeError(DeepCriticalError):
    """Raised when the judge fails to assess evidence."""


class ConfigurationError(DeepCriticalError):
    """Raised when configuration is invalid."""


class RateLimitError(SearchError):
    """Raised when we hit API rate limits."""
```
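Because `RateLimitError` inherits from `SearchError`, which inherits from `DeepCriticalError`, one broad handler can catch anything the app raises while still letting callers catch narrowly; a quick sketch (the classes are redeclared inline so it runs standalone):

```python
# Inline redeclaration of the hierarchy above, just for a standalone demo.
class DeepCriticalError(Exception): ...
class SearchError(DeepCriticalError): ...
class RateLimitError(SearchError): ...


def flaky_search() -> None:
    raise RateLimitError("PubMed rate limit exceeded")


try:
    flaky_search()
except DeepCriticalError as err:  # the base class catches the subclass too
    caught = type(err).__name__

print(caught)  # -> RateLimitError
```

This is why all custom exceptions derive from a single base: the UI layer can wrap the whole agent run in one `except DeepCriticalError` without enumerating failure modes.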
---

## 7. TDD Workflow: First Test

### `tests/unit/shared/test_config.py`

```python
"""Unit tests for configuration loading."""
import os
from unittest.mock import patch

import pytest


class TestSettings:
    """Tests for Settings class."""

    def test_default_max_iterations(self):
        """Settings should have default max_iterations of 10."""
        from src.shared.config import Settings

        # Clear any env vars
        with patch.dict(os.environ, {}, clear=True):
            settings = Settings()
            assert settings.max_iterations == 10

    def test_max_iterations_from_env(self):
        """Settings should read MAX_ITERATIONS from env."""
        from src.shared.config import Settings

        with patch.dict(os.environ, {"MAX_ITERATIONS": "25"}):
            settings = Settings()
            assert settings.max_iterations == 25

    def test_invalid_max_iterations_raises(self):
        """Settings should reject invalid max_iterations."""
        from pydantic import ValidationError

        from src.shared.config import Settings

        with patch.dict(os.environ, {"MAX_ITERATIONS": "100"}):
            with pytest.raises(ValidationError):
                Settings()  # 100 > 50 (max)

    def test_get_api_key_openai(self):
        """get_api_key should return OpenAI key when provider is openai."""
        from src.shared.config import Settings

        with patch.dict(os.environ, {
            "LLM_PROVIDER": "openai",
            "OPENAI_API_KEY": "sk-test-key",
        }):
            settings = Settings()
            assert settings.get_api_key() == "sk-test-key"

    def test_get_api_key_missing_raises(self):
        """get_api_key should raise when key is not set."""
        from src.shared.config import Settings

        with patch.dict(os.environ, {"LLM_PROVIDER": "openai"}, clear=True):
            settings = Settings()
            with pytest.raises(ValueError, match="OPENAI_API_KEY not set"):
                settings.get_api_key()
```
---

## 8. Execution Commands

```bash
# Install all dependencies
uv sync --all-extras

# Run tests (should pass after implementing config.py)
uv run pytest tests/unit/shared/test_config.py -v

# Run full test suite with coverage
uv run pytest --cov=src --cov-report=term-missing

# Run linting
uv run ruff check src tests
uv run ruff format src tests

# Run type checking
uv run mypy src

# Set up pre-commit hooks
uv run pre-commit install
```

---

## 9. Implementation Checklist

- [ ] Install `uv` and verify version
- [ ] Run `uv init --name deepcritical`
- [ ] Create `pyproject.toml` (copy from above)
- [ ] Create directory structure (run mkdir commands)
- [ ] Create `.env.example` and `.env`
- [ ] Create `.pre-commit-config.yaml`
- [ ] Create `tests/conftest.py`
- [ ] Implement `src/shared/config.py`
- [ ] Implement `src/shared/exceptions.py`
- [ ] Write tests in `tests/unit/shared/test_config.py`
- [ ] Run `uv sync --all-extras`
- [ ] Run `uv run pytest` → **ALL TESTS MUST PASS**
- [ ] Run `uv run ruff check` → **NO ERRORS**
- [ ] Run `uv run mypy src` → **NO ERRORS**
- [ ] Run `uv run pre-commit install`
- [ ] Commit: `git commit -m "feat: phase 1 foundation complete"`

---

## 10. Definition of Done

Phase 1 is **COMPLETE** when:

1. ✅ `uv run pytest` passes with 100% of tests green
2. ✅ `uv run ruff check src tests` has 0 errors
3. ✅ `uv run mypy src` has 0 errors
4. ✅ Pre-commit hooks are installed and working
5. ✅ `from src.shared.config import settings` works in a Python REPL

**Proceed to Phase 2 ONLY after all checkboxes are complete.**
docs/implementation/02_phase_search.md
ADDED
@@ -0,0 +1,772 @@
# Phase 2 Implementation Spec: Search Vertical Slice

**Goal**: Implement the "Eyes and Ears" of the agent by retrieving real biomedical data.
**Philosophy**: "Real data, mocked connections."
**Estimated Effort**: 3-4 hours
**Prerequisite**: Phase 1 complete (all tests passing)

---

## 1. The Slice Definition

This slice covers:

1. **Input**: A string query (e.g., "metformin Alzheimer's disease").
2. **Process**:
   - Fetch from PubMed (E-utilities API).
   - Fetch from Web (DuckDuckGo).
   - Normalize results into `Evidence` models.
3. **Output**: A list of `Evidence` objects.

**Directory**: `src/features/search/`

---

## 2. PubMed E-utilities API Reference

**Base URL**: `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/`

### Key Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `esearch.fcgi` | Search for article IDs | `?db=pubmed&term=metformin+alzheimer&retmax=10` |
| `efetch.fcgi` | Fetch article details | `?db=pubmed&id=12345,67890&rettype=abstract&retmode=xml` |

### Rate Limiting (CRITICAL!)

NCBI **requires** rate limiting:

- **Without API key**: 3 requests/second
- **With API key**: 10 requests/second

Get a free API key: https://www.ncbi.nlm.nih.gov/account/settings/

```bash
# Add to .env
NCBI_API_KEY=your-key-here  # Optional but recommended
```

### Example Search Flow

```
1. esearch: "metformin alzheimer" → [PMID: 12345, 67890, ...]
2. efetch: PMIDs → Full abstracts/metadata
3. Parse XML → Evidence objects
```
---

## 3. Models (`src/features/search/models.py`)

```python
"""Data models for the Search feature."""
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class Citation(BaseModel):
    """A citation to a source document."""

    source: Literal["pubmed", "web"] = Field(description="Where this came from")
    title: str = Field(min_length=1, max_length=500)
    url: str = Field(description="URL to the source")
    date: str = Field(description="Publication date (YYYY-MM-DD or 'Unknown')")
    authors: list[str] = Field(default_factory=list)

    @property
    def formatted(self) -> str:
        """Format as a citation string."""
        author_str = ", ".join(self.authors[:3])
        if len(self.authors) > 3:
            author_str += " et al."
        return f"{author_str} ({self.date}). {self.title}. {self.source.upper()}"


class Evidence(BaseModel):
    """A piece of evidence retrieved from search."""

    model_config = ConfigDict(frozen=True)  # Immutable after creation

    content: str = Field(min_length=1, description="The actual text content")
    citation: Citation
    relevance: float = Field(default=0.0, ge=0.0, le=1.0, description="Relevance score 0-1")


class SearchResult(BaseModel):
    """Result of a search operation."""

    query: str
    evidence: list[Evidence]
    sources_searched: list[Literal["pubmed", "web"]]
    total_found: int
    errors: list[str] = Field(default_factory=list)
```
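The author-truncation rule in `Citation.formatted` is easy to get subtly wrong, so it helps to see it in isolation; a plain-function sketch of the same logic (no pydantic required, `format_citation` is illustrative):

```python
def format_citation(authors: list[str], date: str, title: str, source: str) -> str:
    # Mirrors Citation.formatted: keep the first three authors, append "et al." if more.
    author_str = ", ".join(authors[:3])
    if len(authors) > 3:
        author_str += " et al."
    return f"{author_str} ({date}). {title}. {source.upper()}"


print(format_citation(
    ["Smith J", "Lee K", "Chen W", "Patel R"],
    "2024-01-15", "Metformin and Alzheimer's Disease", "pubmed",
))
# -> Smith J, Lee K, Chen W et al. (2024-01-15). Metformin and Alzheimer's Disease. PUBMED
```

With three or fewer authors, no "et al." is appended; with four or more, only the first three names appear.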
---

## 4. Tool Protocol (`src/features/search/tools.py`)

### The Interface (Protocol)

```python
"""Search tools for retrieving evidence from various sources."""
from typing import Protocol

from .models import Evidence


class SearchTool(Protocol):
    """Protocol defining the interface for all search tools."""

    @property
    def name(self) -> str:
        """Human-readable name of this tool."""
        ...

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """
        Execute a search and return evidence.

        Args:
            query: The search query string
            max_results: Maximum number of results to return

        Returns:
            List of Evidence objects

        Raises:
            SearchError: If the search fails
            RateLimitError: If we hit rate limits
        """
        ...
```
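`Protocol` conformance is structural: any object with a matching `name` property and `search` coroutine satisfies `SearchTool` without inheriting from it. A standalone sketch (the protocol is redeclared inline and `Evidence` reduced to a placeholder so it runs on its own; `NullSearchTool` is hypothetical):

```python
import asyncio
from typing import Protocol

Evidence = dict  # placeholder for the pydantic model above


class SearchTool(Protocol):
    @property
    def name(self) -> str: ...
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]: ...


class NullSearchTool:
    """Satisfies SearchTool by shape alone, no inheritance."""

    @property
    def name(self) -> str:
        return "null"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        return []  # a real tool would return Evidence objects


def run_tool(tool: SearchTool, query: str) -> list[Evidence]:
    # Type checkers accept NullSearchTool here purely structurally.
    return asyncio.run(tool.search(query))


print(run_tool(NullSearchTool(), "metformin"))  # -> []
```

This is also why the tests can swap in trivial fakes for `PubMedTool` without touching any registry or base class.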
### PubMed Tool Implementation

```python
"""PubMed search tool using NCBI E-utilities."""
import asyncio

import httpx
import xmltodict
from tenacity import retry, stop_after_attempt, wait_exponential

from src.shared.config import settings
from src.shared.exceptions import RateLimitError, SearchError

from .models import Citation, Evidence


class PubMedTool:
    """Search tool for PubMed/NCBI."""

    BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
    RATE_LIMIT_DELAY = 0.34  # ~3 requests/sec without API key

    def __init__(self, api_key: str | None = None):
        self.api_key = api_key
        self._last_request_time = 0.0

    @property
    def name(self) -> str:
        return "pubmed"

    async def _rate_limit(self) -> None:
        """Enforce NCBI rate limiting."""
        now = asyncio.get_running_loop().time()
        elapsed = now - self._last_request_time
        if elapsed < self.RATE_LIMIT_DELAY:
            await asyncio.sleep(self.RATE_LIMIT_DELAY - elapsed)
        self._last_request_time = asyncio.get_running_loop().time()

    def _build_params(self, **kwargs) -> dict:
        """Build request params with optional API key."""
        params = {**kwargs, "retmode": "json"}
        if self.api_key:
            params["api_key"] = self.api_key
        return params

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=1, max=10),
        reraise=True,
    )
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """
        Search PubMed and return evidence.

        1. ESearch: Get PMIDs matching query
        2. EFetch: Get abstracts for those PMIDs
        3. Parse and return Evidence objects
        """
        await self._rate_limit()

        async with httpx.AsyncClient(timeout=30.0) as client:
            # Step 1: Search for PMIDs
            search_params = self._build_params(
                db="pubmed",
                term=query,
                retmax=max_results,
                sort="relevance",
            )

            try:
                search_resp = await client.get(
                    f"{self.BASE_URL}/esearch.fcgi",
                    params=search_params,
                )
                search_resp.raise_for_status()
            except httpx.HTTPStatusError as e:
                if e.response.status_code == 429:
                    raise RateLimitError("PubMed rate limit exceeded") from e
                raise SearchError(f"PubMed search failed: {e}") from e

            search_data = search_resp.json()
            pmids = search_data.get("esearchresult", {}).get("idlist", [])

            if not pmids:
                return []

            # Step 2: Fetch abstracts
            await self._rate_limit()
            fetch_params = self._build_params(
                db="pubmed",
                id=",".join(pmids),
                rettype="abstract",
            )
            # Use XML for fetch (more reliable parsing)
            fetch_params["retmode"] = "xml"

            fetch_resp = await client.get(
                f"{self.BASE_URL}/efetch.fcgi",
                params=fetch_params,
params=fetch_params,
|
| 242 |
+
)
|
| 243 |
+
fetch_resp.raise_for_status()
|
| 244 |
+
|
| 245 |
+
# Step 3: Parse XML to Evidence
|
| 246 |
+
return self._parse_pubmed_xml(fetch_resp.text)
|
| 247 |
+
|
| 248 |
+
def _parse_pubmed_xml(self, xml_text: str) -> List[Evidence]:
|
| 249 |
+
"""Parse PubMed XML into Evidence objects."""
|
| 250 |
+
try:
|
| 251 |
+
data = xmltodict.parse(xml_text)
|
| 252 |
+
except Exception as e:
|
| 253 |
+
raise SearchError(f"Failed to parse PubMed XML: {e}")
|
| 254 |
+
|
| 255 |
+
articles = data.get("PubmedArticleSet", {}).get("PubmedArticle", [])
|
| 256 |
+
|
| 257 |
+
# Handle single article (xmltodict returns dict instead of list)
|
| 258 |
+
if isinstance(articles, dict):
|
| 259 |
+
articles = [articles]
|
| 260 |
+
|
| 261 |
+
evidence_list = []
|
| 262 |
+
for article in articles:
|
| 263 |
+
try:
|
| 264 |
+
evidence = self._article_to_evidence(article)
|
| 265 |
+
if evidence:
|
| 266 |
+
evidence_list.append(evidence)
|
| 267 |
+
except Exception:
|
| 268 |
+
continue # Skip malformed articles
|
| 269 |
+
|
| 270 |
+
return evidence_list
|
| 271 |
+
|
| 272 |
+
def _article_to_evidence(self, article: dict) -> Evidence | None:
|
| 273 |
+
"""Convert a single PubMed article to Evidence."""
|
| 274 |
+
medline = article.get("MedlineCitation", {})
|
| 275 |
+
article_data = medline.get("Article", {})
|
| 276 |
+
|
| 277 |
+
# Extract PMID
|
| 278 |
+
pmid = medline.get("PMID", {})
|
| 279 |
+
if isinstance(pmid, dict):
|
| 280 |
+
pmid = pmid.get("#text", "")
|
| 281 |
+
|
| 282 |
+
# Extract title
|
| 283 |
+
title = article_data.get("ArticleTitle", "")
|
| 284 |
+
if isinstance(title, dict):
|
| 285 |
+
title = title.get("#text", str(title))
|
| 286 |
+
|
| 287 |
+
# Extract abstract
|
| 288 |
+
abstract_data = article_data.get("Abstract", {}).get("AbstractText", "")
|
| 289 |
+
if isinstance(abstract_data, list):
|
| 290 |
+
abstract = " ".join(
|
| 291 |
+
item.get("#text", str(item)) if isinstance(item, dict) else str(item)
|
| 292 |
+
for item in abstract_data
|
| 293 |
+
)
|
| 294 |
+
elif isinstance(abstract_data, dict):
|
| 295 |
+
abstract = abstract_data.get("#text", str(abstract_data))
|
| 296 |
+
else:
|
| 297 |
+
abstract = str(abstract_data)
|
| 298 |
+
|
| 299 |
+
if not abstract or not title:
|
| 300 |
+
return None
|
| 301 |
+
|
| 302 |
+
# Extract date
|
| 303 |
+
pub_date = article_data.get("Journal", {}).get("JournalIssue", {}).get("PubDate", {})
|
| 304 |
+
year = pub_date.get("Year", "Unknown")
|
| 305 |
+
month = pub_date.get("Month", "01")
|
| 306 |
+
day = pub_date.get("Day", "01")
|
| 307 |
+
date_str = f"{year}-{month}-{day}" if year != "Unknown" else "Unknown"
|
| 308 |
+
|
| 309 |
+
# Extract authors
|
| 310 |
+
author_list = article_data.get("AuthorList", {}).get("Author", [])
|
| 311 |
+
if isinstance(author_list, dict):
|
| 312 |
+
author_list = [author_list]
|
| 313 |
+
authors = []
|
| 314 |
+
for author in author_list[:5]: # Limit to 5 authors
|
| 315 |
+
last = author.get("LastName", "")
|
| 316 |
+
first = author.get("ForeName", "")
|
| 317 |
+
if last:
|
| 318 |
+
authors.append(f"{last} {first}".strip())
|
| 319 |
+
|
| 320 |
+
return Evidence(
|
| 321 |
+
content=abstract[:2000], # Truncate long abstracts
|
| 322 |
+
citation=Citation(
|
| 323 |
+
source="pubmed",
|
| 324 |
+
title=title[:500],
|
| 325 |
+
url=f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
|
| 326 |
+
date=date_str,
|
| 327 |
+
authors=authors,
|
| 328 |
+
),
|
| 329 |
+
)
|
| 330 |
+
```
|
| 331 |
+
|
| 332 |
+
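The `_rate_limit` method above is simply a minimum-interval spacer: sleep until at least `RATE_LIMIT_DELAY` seconds have passed since the last request. The pattern can be verified in isolation with a small standalone sketch (using `time.monotonic` here rather than the event-loop clock, as an assumption for portability):

```python
import asyncio
import time


class MinIntervalLimiter:
    """Ensure at least `delay` seconds between successive calls."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self._last = 0.0

    async def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.delay:
            # Sleep off the remainder of the interval before proceeding.
            await asyncio.sleep(self.delay - elapsed)
        self._last = time.monotonic()


async def demo() -> float:
    limiter = MinIntervalLimiter(delay=0.05)
    start = time.monotonic()
    for _ in range(3):  # 3 calls -> at least 2 enforced gaps
        await limiter.wait()
    return time.monotonic() - start


elapsed = asyncio.run(demo())
assert elapsed >= 0.09  # two enforced gaps of ~0.05s each
```

With `RATE_LIMIT_DELAY = 0.34` this keeps the tool just under NCBI's 3-requests-per-second ceiling for unauthenticated clients.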
### DuckDuckGo Tool Implementation

```python
"""Web search tool using DuckDuckGo."""
from typing import List
from duckduckgo_search import DDGS

from src.shared.exceptions import SearchError
from .models import Evidence, Citation


class WebTool:
    """Search tool for general web search via DuckDuckGo."""

    def __init__(self):
        pass

    @property
    def name(self) -> str:
        return "web"

    async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
        """
        Search DuckDuckGo and return evidence.

        Note: duckduckgo-search is synchronous, so we run it in an executor.
        """
        import asyncio

        loop = asyncio.get_event_loop()
        try:
            results = await loop.run_in_executor(
                None,
                lambda: self._sync_search(query, max_results),
            )
            return results
        except Exception as e:
            raise SearchError(f"Web search failed: {e}")

    def _sync_search(self, query: str, max_results: int) -> List[Evidence]:
        """Synchronous search implementation."""
        evidence_list = []

        with DDGS() as ddgs:
            results = list(ddgs.text(query, max_results=max_results))

        for result in results:
            evidence_list.append(
                Evidence(
                    content=result.get("body", "")[:1000],
                    citation=Citation(
                        source="web",
                        title=result.get("title", "Unknown")[:500],
                        url=result.get("href", ""),
                        date="Unknown",
                        authors=[],
                    ),
                )
            )

        return evidence_list
```

---

## 5. Search Handler (`src/features/search/handlers.py`)

The handler orchestrates multiple tools using the **Scatter-Gather** pattern.

```python
"""Search handler - orchestrates multiple search tools."""
import asyncio
from typing import List
import structlog

from src.shared.exceptions import SearchError
from .models import Evidence, SearchResult
from .tools import SearchTool

logger = structlog.get_logger()


def flatten(nested: List[List[Evidence]]) -> List[Evidence]:
    """Flatten a list of lists into a single list."""
    return [item for sublist in nested for item in sublist]


class SearchHandler:
    """Orchestrates parallel searches across multiple tools."""

    def __init__(self, tools: List[SearchTool], timeout: float = 30.0):
        """
        Initialize the search handler.

        Args:
            tools: List of search tools to use
            timeout: Timeout for each search in seconds
        """
        self.tools = tools
        self.timeout = timeout

    async def execute(self, query: str, max_results_per_tool: int = 10) -> SearchResult:
        """
        Execute search across all tools in parallel.

        Args:
            query: The search query
            max_results_per_tool: Max results from each tool

        Returns:
            SearchResult containing all evidence and metadata
        """
        logger.info("Starting search", query=query, tools=[t.name for t in self.tools])

        # Create tasks for parallel execution
        tasks = [
            self._search_with_timeout(tool, query, max_results_per_tool)
            for tool in self.tools
        ]

        # Gather results (don't fail if one tool fails)
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Process results
        all_evidence: List[Evidence] = []
        sources_searched: List[str] = []
        errors: List[str] = []

        for tool, result in zip(self.tools, results):
            if isinstance(result, Exception):
                errors.append(f"{tool.name}: {str(result)}")
                logger.warning("Search tool failed", tool=tool.name, error=str(result))
            else:
                all_evidence.extend(result)
                sources_searched.append(tool.name)
                logger.info("Search tool succeeded", tool=tool.name, count=len(result))

        return SearchResult(
            query=query,
            evidence=all_evidence,
            sources_searched=sources_searched,
            total_found=len(all_evidence),
            errors=errors,
        )

    async def _search_with_timeout(
        self,
        tool: SearchTool,
        query: str,
        max_results: int,
    ) -> List[Evidence]:
        """Execute a single tool search with timeout."""
        try:
            return await asyncio.wait_for(
                tool.search(query, max_results),
                timeout=self.timeout,
            )
        except asyncio.TimeoutError:
            raise SearchError(f"{tool.name} search timed out after {self.timeout}s")
```

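The scatter-gather core of the handler is `asyncio.gather(..., return_exceptions=True)`: failed coroutines come back as exception objects in the results list instead of aborting the whole batch. A minimal, self-contained illustration of that behavior (the stub coroutines here are hypothetical, not the real tools):

```python
import asyncio


async def ok_search(name: str) -> list[str]:
    return [f"{name}-result"]


async def failing_search(name: str) -> list[str]:
    raise RuntimeError(f"{name} is down")


async def scatter_gather() -> tuple[list[str], list[str]]:
    # Scatter: launch all searches concurrently.
    tasks = [ok_search("pubmed"), failing_search("web")]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Gather: partition successes from failures.
    evidence: list[str] = []
    errors: list[str] = []
    for result in results:
        if isinstance(result, Exception):
            errors.append(str(result))  # tool failed; record and continue
        else:
            evidence.extend(result)
    return evidence, errors


evidence, errors = asyncio.run(scatter_gather())
print(evidence, errors)  # ['pubmed-result'] ['web is down']
```

This is exactly why graceful degradation works: one tool's outage becomes an entry in `errors` while the other tool's evidence still flows through.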
---

## 6. TDD Workflow

### Test File: `tests/unit/features/search/test_tools.py`

```python
"""Unit tests for search tools."""
import pytest
from unittest.mock import AsyncMock, MagicMock, patch


# Sample PubMed XML response for mocking
SAMPLE_PUBMED_XML = """<?xml version="1.0" ?>
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345678</PMID>
      <Article>
        <ArticleTitle>Metformin in Alzheimer's Disease: A Systematic Review</ArticleTitle>
        <Abstract>
          <AbstractText>Metformin shows neuroprotective properties...</AbstractText>
        </Abstract>
        <AuthorList>
          <Author>
            <LastName>Smith</LastName>
            <ForeName>John</ForeName>
          </Author>
        </AuthorList>
        <Journal>
          <JournalIssue>
            <PubDate>
              <Year>2024</Year>
              <Month>01</Month>
            </PubDate>
          </JournalIssue>
        </Journal>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


class TestPubMedTool:
    """Tests for PubMedTool."""

    @pytest.mark.asyncio
    async def test_search_returns_evidence(self, mocker):
        """PubMedTool should return Evidence objects from search."""
        from src.features.search.tools import PubMedTool

        # Mock the HTTP responses
        mock_search_response = MagicMock()
        mock_search_response.json.return_value = {
            "esearchresult": {"idlist": ["12345678"]}
        }
        mock_search_response.raise_for_status = MagicMock()

        mock_fetch_response = MagicMock()
        mock_fetch_response.text = SAMPLE_PUBMED_XML
        mock_fetch_response.raise_for_status = MagicMock()

        mock_client = AsyncMock()
        mock_client.get = AsyncMock(side_effect=[mock_search_response, mock_fetch_response])
        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
        mock_client.__aexit__ = AsyncMock(return_value=None)

        mocker.patch("httpx.AsyncClient", return_value=mock_client)

        # Act
        tool = PubMedTool()
        results = await tool.search("metformin alzheimer")

        # Assert
        assert len(results) == 1
        assert results[0].citation.source == "pubmed"
        assert "Metformin" in results[0].citation.title
        assert "12345678" in results[0].citation.url

    @pytest.mark.asyncio
    async def test_search_empty_results(self, mocker):
        """PubMedTool should return empty list when no results."""
        from src.features.search.tools import PubMedTool

        mock_response = MagicMock()
        mock_response.json.return_value = {"esearchresult": {"idlist": []}}
        mock_response.raise_for_status = MagicMock()

        mock_client = AsyncMock()
        mock_client.get = AsyncMock(return_value=mock_response)
        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
        mock_client.__aexit__ = AsyncMock(return_value=None)

        mocker.patch("httpx.AsyncClient", return_value=mock_client)

        tool = PubMedTool()
        results = await tool.search("xyznonexistentquery123")

        assert results == []

    def test_parse_pubmed_xml(self):
        """PubMedTool should correctly parse XML."""
        from src.features.search.tools import PubMedTool

        tool = PubMedTool()
        results = tool._parse_pubmed_xml(SAMPLE_PUBMED_XML)

        assert len(results) == 1
        assert results[0].citation.source == "pubmed"
        assert "Smith John" in results[0].citation.authors


class TestWebTool:
    """Tests for WebTool."""

    @pytest.mark.asyncio
    async def test_search_returns_evidence(self, mocker):
        """WebTool should return Evidence objects from search."""
        from src.features.search.tools import WebTool

        mock_results = [
            {
                "title": "Drug Repurposing Article",
                "href": "https://example.com/article",
                "body": "Some content about drug repurposing...",
            }
        ]

        mock_ddgs = MagicMock()
        mock_ddgs.__enter__ = MagicMock(return_value=mock_ddgs)
        mock_ddgs.__exit__ = MagicMock(return_value=None)
        mock_ddgs.text = MagicMock(return_value=mock_results)

        mocker.patch("src.features.search.tools.DDGS", return_value=mock_ddgs)

        tool = WebTool()
        results = await tool.search("drug repurposing")

        assert len(results) == 1
        assert results[0].citation.source == "web"
        assert "Drug Repurposing" in results[0].citation.title


class TestSearchHandler:
    """Tests for SearchHandler."""

    @pytest.mark.asyncio
    async def test_execute_aggregates_results(self, mocker):
        """SearchHandler should aggregate results from all tools."""
        from src.features.search.handlers import SearchHandler
        from src.features.search.models import Evidence, Citation

        # Create mock tools
        mock_tool_1 = AsyncMock()
        mock_tool_1.name = "mock1"
        mock_tool_1.search = AsyncMock(return_value=[
            Evidence(
                content="Result 1",
                citation=Citation(source="pubmed", title="T1", url="u1", date="2024"),
            )
        ])

        mock_tool_2 = AsyncMock()
        mock_tool_2.name = "mock2"
        mock_tool_2.search = AsyncMock(return_value=[
            Evidence(
                content="Result 2",
                citation=Citation(source="web", title="T2", url="u2", date="2024"),
            )
        ])

        handler = SearchHandler(tools=[mock_tool_1, mock_tool_2])
        result = await handler.execute("test query")

        assert result.total_found == 2
        assert "mock1" in result.sources_searched
        assert "mock2" in result.sources_searched
        assert len(result.errors) == 0

    @pytest.mark.asyncio
    async def test_execute_handles_tool_failure(self, mocker):
        """SearchHandler should continue if one tool fails."""
        from src.features.search.handlers import SearchHandler
        from src.features.search.models import Evidence, Citation
        from src.shared.exceptions import SearchError

        mock_tool_ok = AsyncMock()
        mock_tool_ok.name = "ok_tool"
        mock_tool_ok.search = AsyncMock(return_value=[
            Evidence(
                content="Good result",
                citation=Citation(source="pubmed", title="T", url="u", date="2024"),
            )
        ])

        mock_tool_fail = AsyncMock()
        mock_tool_fail.name = "fail_tool"
        mock_tool_fail.search = AsyncMock(side_effect=SearchError("API down"))

        handler = SearchHandler(tools=[mock_tool_ok, mock_tool_fail])
        result = await handler.execute("test")

        assert result.total_found == 1
        assert "ok_tool" in result.sources_searched
        assert len(result.errors) == 1
        assert "fail_tool" in result.errors[0]
```

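The trickiest part of these tests is mocking `async with httpx.AsyncClient() as client`: the mock needs `__aenter__`/`__aexit__` plus an awaitable `get` returning canned responses. The mechanics can be checked in isolation with nothing but `unittest.mock` (the `fetch_ids` function below is a hypothetical stand-in for the tool's search flow):

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch_ids(client_factory) -> list[str]:
    """Stand-in for PubMedTool.search: open a client, GET, read JSON."""
    async with client_factory() as client:
        resp = await client.get("https://example.invalid/esearch")  # hypothetical URL
        resp.raise_for_status()
        return resp.json()["esearchresult"]["idlist"]


# Canned response object: .json() returns a dict, .raise_for_status() is a no-op.
mock_response = MagicMock()
mock_response.json.return_value = {"esearchresult": {"idlist": ["12345678"]}}
mock_response.raise_for_status = MagicMock()

# Async client mock that works as an async context manager.
mock_client = AsyncMock()
mock_client.get = AsyncMock(return_value=mock_response)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=None)

ids = asyncio.run(fetch_ids(lambda: mock_client))
print(ids)  # ['12345678']
```

In the real tests, `mocker.patch("httpx.AsyncClient", return_value=mock_client)` plays the role of `client_factory` here.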
---

## 7. Integration Test (Optional, Real API)

```python
# tests/integration/test_pubmed_live.py
"""Integration tests that hit real APIs (run manually)."""
import pytest


@pytest.mark.integration
@pytest.mark.slow
@pytest.mark.asyncio
async def test_pubmed_live_search():
    """Test real PubMed search (requires network)."""
    from src.features.search.tools import PubMedTool

    tool = PubMedTool()
    results = await tool.search("metformin diabetes", max_results=3)

    assert len(results) > 0
    assert results[0].citation.source == "pubmed"
    assert "pubmed.ncbi.nlm.nih.gov" in results[0].citation.url


# Run with: uv run pytest tests/integration -m integration
```

---

## 8. Implementation Checklist

- [ ] Create `src/features/search/models.py` with all Pydantic models
- [ ] Create `src/features/search/tools.py` with `SearchTool` Protocol
- [ ] Implement `PubMedTool` class
- [ ] Implement `WebTool` class
- [ ] Create `src/features/search/handlers.py` with `SearchHandler`
- [ ] Create `src/features/search/__init__.py` with exports
- [ ] Run `uv run pytest tests/unit/features/search/ -v` → **ALL TESTS MUST PASS**
- [ ] Write tests in `tests/unit/features/search/test_tools.py`
- [ ] (Optional) Run integration test: `uv run pytest -m integration`
- [ ] Commit: `git commit -m "feat: phase 2 search slice complete"`

---

## 9. Definition of Done

Phase 2 is **COMPLETE** when:

1. ✅ All unit tests pass
2. ✅ `SearchHandler` can execute with both tools
3. ✅ Graceful degradation: if PubMed fails, WebTool results still return
4. ✅ Rate limiting is enforced (verify no 429 errors)
5. ✅ Can run this in a Python REPL:

```python
import asyncio
from src.features.search.tools import PubMedTool, WebTool
from src.features.search.handlers import SearchHandler

async def test():
    handler = SearchHandler([PubMedTool(), WebTool()])
    result = await handler.execute("metformin alzheimer")
    print(f"Found {result.total_found} results")
    for e in result.evidence[:3]:
        print(f"- {e.citation.title}")

asyncio.run(test())
```

**Proceed to Phase 3 ONLY after all checkboxes are complete.**

---

**`docs/implementation/03_phase_judge.md`** (added)

# Phase 3 Implementation Spec: Judge Vertical Slice

**Goal**: Implement the "Brain" of the agent, evaluating evidence quality.
**Philosophy**: "Structured Output or Bust."

---

## 1. The Slice Definition

This slice covers:
1. **Input**: A user question + a list of `Evidence` (from Phase 2).
2. **Process**:
   - Construct a prompt with the evidence.
   - Call the LLM (PydanticAI / OpenAI / Anthropic).
   - Force JSON structured output.
3. **Output**: A `JudgeAssessment` object.

**Directory**: `src/features/judge/`

---

## 2. Models (`src/features/judge/models.py`)

The output schema must be strict.

```python
from pydantic import BaseModel, Field
from typing import List, Literal

class AssessmentDetails(BaseModel):
    mechanism_score: int = Field(..., ge=0, le=10)
    mechanism_reasoning: str
    candidates_found: List[str]

class JudgeAssessment(BaseModel):
    details: AssessmentDetails
    sufficient: bool
    recommendation: Literal["continue", "synthesize"]
    next_search_queries: List[str]
```

---

## 3. Prompt Engineering (`src/features/judge/prompts.py`)

We treat prompts as code. They should be versioned and clean.

```python
SYSTEM_PROMPT = """You are a drug repurposing research judge.
Evaluate the evidence strictly.
Output JSON only."""

def format_user_prompt(question: str, evidence: List[Evidence]) -> str:
    # ... formatting logic ...
    return prompt
```

---

## 4. TDD Workflow

### Step 1: Mocked LLM Test
We do NOT hit the real LLM in unit tests. We mock the response to ensure our parsing logic works.

Create `tests/unit/features/judge/test_handler.py`.

```python
@pytest.mark.asyncio
async def test_judge_parsing(mocker):
    # Arrange
    mock_llm_response = '{"sufficient": true, ...}'
    mocker.patch("llm_client.generate", return_value=mock_llm_response)

    # Act
    handler = JudgeHandler()
    assessment = await handler.assess("q", [])

    # Assert
    assert assessment.sufficient is True
```

### Step 2: Implement Handler
Use `pydantic-ai` or a raw client to enforce the schema.

---

## 5. Implementation Checklist

- [ ] Define `JudgeAssessment` models.
- [ ] Write prompt templates.
- [ ] Implement `JudgeHandler` with the PydanticAI/Instructor pattern.
- [ ] Write tests ensuring JSON parsing handles failures gracefully (retry logic).
- [ ] Verify via `uv run pytest`.

---

**`docs/implementation/04_phase_ui.md`** (added)

# Phase 4 Implementation Spec: Orchestrator & UI

**Goal**: Connect the Brain and the Body, then give it a Face.
**Philosophy**: "Streaming is Trust."

---

## 1. The Slice Definition

This slice connects:
1. **Orchestrator**: The state machine (while loop) calling Search -> Judge.
2. **UI**: A Gradio interface that visualizes the loop.

**Directory**: `src/features/orchestrator/` and `src/app.py`

---

## 2. The Orchestrator Logic

This is the "Agent" logic.

```python
class Orchestrator:
    def __init__(self, search_handler, judge_handler):
        self.search = search_handler
        self.judge = judge_handler
        self.history = []

    async def run_generator(self, query: str):
        """Yields events for the UI."""
        yield AgentEvent("Searching...")
        evidence = await self.search.execute(query)

        yield AgentEvent("Judging...")
        assessment = await self.judge.assess(query, evidence)

        if assessment.sufficient:
            yield AgentEvent("Complete", data=assessment)
        else:
            yield AgentEvent("Looping...", data=assessment.next_search_queries)
```

---

## 3. The UI (Gradio)

We use the **Gradio 5** generator pattern for real-time feedback.

```python
import gradio as gr

async def interact(message, history):
    agent = Orchestrator(...)
    async for event in agent.run_generator(message):
        yield f"**{event.step}**: {event.details}"

demo = gr.ChatInterface(fn=interact, type="messages")
```

---

## 4. TDD Workflow

### Step 1: Test the State Machine
Test the loop logic without the UI.

```python
@pytest.mark.asyncio
async def test_orchestrator_loop_limit():
    # Configure judge to always return "sufficient=False"
    # Assert loop stops at MAX_ITERATIONS
    ...
```

### Step 2: Build UI
Run `uv run python src/app.py` and verify locally.

---

## 5. Implementation Checklist

- [ ] Implement `Orchestrator` class.
- [ ] Write loop logic with max_iterations safety.
- [ ] Create `src/app.py` with Gradio.
- [ ] Add "Deployment" configuration (Dockerfile/Spaces config).

---

**`docs/implementation/roadmap.md`** (added)

# Implementation Roadmap: DeepCritical (Vertical Slices)
|
| 2 |
+
|
| 3 |
+
**Philosophy:** AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).
|
| 4 |
+
|
| 5 |
+
This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## π οΈ The 2025 "Gucci" Tooling Stack
|
| 10 |
+
|
| 11 |
+
We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.
|
| 12 |
+
|
| 13 |
+
| Category | Tool | Why? |
|
| 14 |
+
|----------|------|------|
|
| 15 |
+
| **Package Manager** | **`uv`** | Rust-based, 10-100x faster than pip/poetry. Manages python versions, venvs, and deps. |
|
| 16 |
+
| **Linting/Format** | **`ruff`** | Rust-based, instant. Replaces black, isort, flake8. |
|
| 17 |
+
| **Type Checking** | **`mypy`** | Strict static typing. Run via `uv run mypy`. |
|
| 18 |
+
| **Testing** | **`pytest`** | The standard. |
|
| 19 |
+
| **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
|
| 20 |
+
| **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
|
| 21 |
+
| **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
|
| 22 |
+
| **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |
|
---

## 🏗️ Architecture: Vertical Slices

Instead of horizontal layers (e.g., "building the database layer"), we build **vertical slices**.
Each slice implements a feature from **entry point (UI/API) -> logic -> data/external services**.

### Directory Structure (Feature-First)

```
src/
├── app.py               # Entry point
├── shared/              # Shared utilities (logging, config, base classes)
│   ├── config.py
│   └── observability.py
└── features/            # Vertical slices
    ├── search/          # Slice: executing searches
    │   ├── handlers.py
    │   ├── tools.py
    │   └── models.py
    ├── judge/           # Slice: assessing quality
    │   ├── handlers.py
    │   ├── prompts.py
    │   └── models.py
    └── report/          # Slice: synthesizing output
        ├── handlers.py
        └── models.py
```

---

## 🚀 Phased Execution Plan

### **Phase 1: Foundation & Tooling (Day 1)**
*Goal: a rock-solid, CI-ready environment with `uv` and `pytest` configured.*
- [ ] Initialize `pyproject.toml` with `uv`.
- [ ] Configure `ruff` (strict) and `mypy` (strict).
- [ ] Set up `pytest` with sugar and coverage.
- [ ] Implement `shared/config.py` (configuration slice).
- **Deliverable**: A repo that passes CI with `uv run pytest`.
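
A minimal sketch of what `shared/config.py` could look like, using only the standard library (a real implementation might reach for `pydantic-settings` instead; the field names here are illustrative assumptions, not the final schema):

```python
# shared/config.py -- illustrative sketch; field names are assumptions
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Central, immutable configuration read once at startup."""

    llm_api_key: str = ""
    max_search_results: int = 10
    log_level: str = "INFO"

    @classmethod
    def from_env(cls) -> "Settings":
        # Environment variables override the defaults above
        return cls(
            llm_api_key=os.environ.get("LLM_API_KEY", ""),
            max_search_results=int(os.environ.get("MAX_SEARCH_RESULTS", "10")),
            log_level=os.environ.get("LOG_LEVEL", "INFO"),
        )
```

Keeping the dataclass frozen makes configuration safe to share across handlers, and `from_env` keeps the environment-reading side effect in one place where a test can stub it.
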
### **Phase 2: The "Search" Vertical Slice (Day 2)**
*Goal: the agent can receive a query and get raw results from PubMed and the web.*
- [ ] **TDD**: Write tests for `SearchHandler`.
- [ ] Implement `features/search/tools.py` (PubMed + DuckDuckGo).
- [ ] Implement `features/search/handlers.py` (orchestrates the tools).
- **Deliverable**: A function that takes "long covid" and returns `List[Evidence]`.
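
To make the TDD step concrete, here is a sketch of the slice's contract: the `Evidence` fields, the handler shape, and the fake-tool injection pattern are assumptions for illustration, not the final API.

```python
# features/search -- contract sketch (names and fields are assumptions)
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    source: str   # e.g. "pubmed" or "web"
    title: str
    snippet: str
    url: str


class SearchHandler:
    """Fans a query out to the registered search tools and merges results."""

    def __init__(self, tools: list[Callable[[str], list[Evidence]]]) -> None:
        self._tools = tools

    def run(self, query: str) -> list[Evidence]:
        results: list[Evidence] = []
        for tool in self._tools:
            results.extend(tool(query))
        return results


# TDD-style seam: inject a fake tool instead of hitting PubMed/DuckDuckGo
def fake_pubmed(query: str) -> list[Evidence]:
    return [Evidence("pubmed", f"Study on {query}", "...", "https://example.org")]
```

Since the roadmap calls for an async agent loop, the real handler would likely be `async def run(...)` exercised with `pytest-asyncio`; the synchronous version keeps the sketch short.
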
### **Phase 3: The "Judge" Vertical Slice (Day 3)**
*Goal: the agent can decide whether the evidence is sufficient.*
- [ ] **TDD**: Write tests for `JudgeHandler` (mocked LLM).
- [ ] Implement `features/judge/prompts.py` (structured outputs).
- [ ] Implement `features/judge/handlers.py` (LLM interaction).
- **Deliverable**: A function that takes `List[Evidence]` and returns a `JudgeAssessment`.
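
The mocked-LLM test pattern for this slice can be sketched as below; the `JudgeAssessment` fields and keyword-based parsing are deliberate simplifications of the structured-output prompt (all names are assumptions):

```python
# features/judge -- mocked-LLM sketch (names and parsing are assumptions)
from dataclasses import dataclass
from typing import Callable


@dataclass
class JudgeAssessment:
    sufficient: bool
    reasoning: str


class JudgeHandler:
    """Asks an LLM whether the gathered evidence answers the question."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        # The LLM call is injected so tests can pass a stub instead of a real model
        self._llm = llm

    def assess(self, question: str, evidence: list[str]) -> JudgeAssessment:
        prompt = f"Question: {question}\nEvidence:\n" + "\n".join(evidence)
        raw = self._llm(prompt)
        # Real structured outputs would parse JSON; a keyword check suffices here
        return JudgeAssessment(
            sufficient=raw.strip().upper().startswith("SUFFICIENT"),
            reasoning=raw,
        )
```

Injecting the LLM callable is what makes the "Mocked LLM" TDD item cheap: the test never touches the network.
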
### **Phase 4: The "Loop" & UI Slice (Day 4)**
*Goal: end-to-end user value.*
- [ ] Implement the `Orchestrator` (connects the search and judge loops).
- [ ] Build `features/ui/` (Gradio with streaming).
- **Deliverable**: A working DeepCritical agent on Hugging Face.
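
The orchestrator's core control flow (search, judge, refine, repeat until sufficient or out of budget) can be sketched as follows; the handler signatures and refinement strategy are assumptions made for illustration:

```python
# Orchestrator sketch: loop search -> judge until sufficient or budget exhausted.
# Handler signatures here are assumptions, not the final interfaces.
from typing import Callable


def run_loop(
    query: str,
    search: Callable[[str], list[str]],        # returns evidence snippets
    judge: Callable[[str, list[str]], bool],   # True when evidence suffices
    max_iterations: int = 3,
) -> list[str]:
    evidence: list[str] = []
    refined_query = query
    for i in range(max_iterations):
        evidence.extend(search(refined_query))
        if judge(query, evidence):
            break
        # A real agent would let the judge propose the next refinement
        refined_query = f"{query} (refinement {i + 1})"
    return evidence
```

Capping the loop with `max_iterations` keeps the agent bounded even when the judge never declares the evidence sufficient; the Gradio layer would stream each iteration's progress to the user.
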
---

## 📚 Spec Documents

1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)**
2. **[Phase 2 Spec: Search Slice](02_phase_search.md)**
3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)**
4. **[Phase 4 Spec: UI & Loop](04_phase_ui.md)**

*Start by reading the Phase 1 spec to initialize the repo.*