DataQuests/DeepCritical
VibecoderMcSwaggins committed 12 days ago

Commit cfb473d (1 parent: 0efdc2f)

refactor(examples): purge all mocks - real API calls only

NO MOCKS. NO FAKE DATA. REAL SCIENCE.

Changes:
- hypothesis_demo: Now does REAL search before hypothesis generation
- full_stack_demo: Removed run_mock_demo(), create_mock_*() functions
- orchestrator_demo: Removed --mock flag and MockJudgeHandler
- README: Updated to reflect "Real or Nothing" philosophy

All examples now require API keys and make real API calls.
Mocks belong in tests/unit/, not in demos.
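
The "require API keys" rule in this message comes down to a fail-fast check at startup. A minimal sketch of that check follows, assuming the two environment variable names used throughout this commit. The helper names `has_llm_key` and `require_llm_key` are hypothetical and do not appear in the repository.

```python
import os
import sys
from typing import Mapping, Optional

# Either key is enough for the LLM-backed demos (names taken from the commit).
LLM_KEY_NAMES = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def has_llm_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if at least one LLM key is set to a non-empty value."""
    env = os.environ if env is None else env
    return any(env.get(name) for name in LLM_KEY_NAMES)


def require_llm_key(env: Optional[Mapping[str, str]] = None) -> None:
    """Exit with status 1 before any work starts if no LLM key is configured."""
    if not has_llm_key(env):
        print(
            "ERROR: set OPENAI_API_KEY or ANTHROPIC_API_KEY in your .env file.",
            file=sys.stderr,
        )
        sys.exit(1)
```

Taking the environment as an optional mapping keeps the check testable without changing the real process environment.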

Files changed (4)

examples/README.md +98 -96
examples/full_stack_demo/run_full.py +74 -316
examples/hypothesis_demo/run_hypothesis.py +74 -166
examples/orchestrator_demo/run_agent.py +57 -33

examples/README.md CHANGED

@@ -1,181 +1,183 @@
 # DeepCritical Examples
-Demo scripts demonstrating each phase of the drug repurposing research agent.
-## Quick Start
 ```bash
-# Run without API keys (mock modes available)
-uv run python examples/embeddings_demo/run_embeddings.py
-uv run python examples/full_stack_demo/run_full.py --mock
-# Run with API keys (set OPENAI_API_KEY or ANTHROPIC_API_KEY)
-uv run python examples/full_stack_demo/run_full.py "metformin cancer"
 ```
 ---
-## 1. Search Demo (Phase 2)
-Demonstrates parallel search across PubMed and Web sources. **No API keys required.**
 ```bash
 uv run python examples/search_demo/run_search.py "metformin cancer"
 ```
-**What it shows:**
-- PubMed E-utilities search
-- DuckDuckGo web search
-- Scatter-gather orchestration
-- Evidence model with citations
 ---
-## 2. Agent Demo (Phase 4)
-Demonstrates the search-judge-synthesize loop.
-**Mock Mode (No API Keys):**
 ```bash
-uv run python examples/orchestrator_demo/run_agent.py "metformin cancer" --mock
-```
-**Real Mode (Requires API Keys):**
-```bash
-uv run python examples/orchestrator_demo/run_agent.py "metformin cancer"
 ```
-**What it shows:**
-- Iterative search refinement
-- LLM-based evidence assessment
-- Synthesis generation
-- Event streaming for UI updates
 ---
-## 3. Magentic Demo (Phase 5)
-Demonstrates multi-agent coordination using Microsoft Agent Framework.
 ```bash
-# Requires OPENAI_API_KEY (Magentic uses OpenAI)
-uv run python examples/orchestrator_demo/run_magentic.py "metformin cancer"
 ```
-**What it shows:**
-- MagenticBuilder workflow
-- SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent coordination
-- Manager-based orchestration
 ---
-## 4. Embeddings Demo (Phase 6)
-Demonstrates semantic search and deduplication. **No API keys required.**
 ```bash
-uv run python examples/embeddings_demo/run_embeddings.py
 ```
-**What it shows:**
-- Text embedding with sentence-transformers
-- ChromaDB vector storage
-- Semantic similarity search
-- Duplicate detection by meaning (not just URL)
-- Cosine similarity calculations
 ---
-## 5. Hypothesis Demo (Phase 7)
-Demonstrates mechanistic hypothesis generation.
 ```bash
-# Requires OPENAI_API_KEY or ANTHROPIC_API_KEY
 uv run python examples/hypothesis_demo/run_hypothesis.py "metformin Alzheimer's"
 uv run python examples/hypothesis_demo/run_hypothesis.py "sildenafil heart failure"
 ```
-**What it shows:**
-- Drug -> Target -> Pathway -> Effect reasoning
-- Knowledge gap identification
-- Search query suggestions for targeted research
-- Confidence scoring
 ---
-## 6. Full Stack Demo (Phases 1-8)
-**The complete pipeline** - demonstrates all phases working together.
-**Mock Mode (No API Keys):**
-```bash
-uv run python examples/full_stack_demo/run_full.py --mock
-```
-**Real Mode:**
 ```bash
 uv run python examples/full_stack_demo/run_full.py "metformin Alzheimer's"
 uv run python examples/full_stack_demo/run_full.py "sildenafil heart failure" -i 3
 ```
-**What it shows:**
-1. **Search** - PubMed + Web evidence collection
-2. **Embeddings** - Semantic deduplication
-3. **Hypothesis** - Mechanistic reasoning
-4. **Judge** - Evidence quality assessment
-5. **Report** - Structured scientific report generation
-Output includes a publication-quality research report with:
-- Executive summary
-- Methodology
-- Hypotheses tested (with support/contradict counts)
-- Mechanistic and clinical findings
-- Drug candidates
-- Limitations
-- Formatted references
 ---
-## API Keys
-| Example | Required Keys |
-|---------|--------------|
-| search_demo | None (optional NCBI_API_KEY for higher rate limits) |
-| orchestrator_demo --mock | None |
-| orchestrator_demo | OPENAI_API_KEY or ANTHROPIC_API_KEY |
-| run_magentic | OPENAI_API_KEY |
-| embeddings_demo | None |
-| hypothesis_demo | OPENAI_API_KEY or ANTHROPIC_API_KEY |
-| full_stack_demo --mock | None |
-| full_stack_demo | OPENAI_API_KEY or ANTHROPIC_API_KEY |
 ---
-## Architecture Overview
 ```
 User Query
     |
     v
-[Phase 2: Search] --> PubMed + Web
     |
     v
-[Phase 6: Embeddings] --> Semantic Deduplication
     |
     v
-[Phase 7: Hypothesis] --> Drug -> Target -> Pathway -> Effect
     |
     v
-[Phase 3: Judge] --> "Is evidence sufficient?"
     |
-    +---> NO --> Refine queries, loop back to Search
     |
-    +---> YES --> Continue to Report
     |
     v
-[Phase 8: Report] --> Structured Scientific Report
     |
     v
-Final Output with Citations
 ```

 # DeepCritical Examples
+**NO MOCKS. NO FAKE DATA. REAL SCIENCE.**
+These demos run the REAL drug repurposing research pipeline with actual API calls.
+---
+## Prerequisites
+You MUST have API keys configured:
 ```bash
+# Copy the example and add your keys
+cp .env.example .env
+# Required (pick one):
+OPENAI_API_KEY=sk-...
+ANTHROPIC_API_KEY=sk-ant-...
+# Optional (higher PubMed rate limits):
+NCBI_API_KEY=your-key
 ```
 ---
+## Examples
+### 1. Search Demo (No LLM Required)
+Demonstrates REAL parallel search across PubMed and Web.
 ```bash
 uv run python examples/search_demo/run_search.py "metformin cancer"
 ```
+**What's REAL:**
+- Actual NCBI E-utilities API calls
+- Actual DuckDuckGo web searches
+- Real papers, real URLs, real content
 ---
+### 2. Embeddings Demo (No LLM Required)
+Demonstrates REAL semantic search and deduplication.
 ```bash
+uv run python examples/embeddings_demo/run_embeddings.py
 ```
+**What's REAL:**
+- Actual sentence-transformers model (all-MiniLM-L6-v2)
+- Actual ChromaDB vector storage
+- Real cosine similarity computations
+- Real semantic deduplication
 ---
+### 3. Orchestrator Demo (LLM Required)
+Demonstrates the REAL search-judge-synthesize loop.
 ```bash
+uv run python examples/orchestrator_demo/run_agent.py "metformin cancer"
+uv run python examples/orchestrator_demo/run_agent.py "aspirin alzheimer" --iterations 5
 ```
+**What's REAL:**
+- Real PubMed + Web searches
+- Real LLM judge evaluating evidence quality
+- Real iterative refinement based on LLM decisions
+- Real research synthesis
 ---
+### 4. Magentic Demo (OpenAI Required)
+Demonstrates REAL multi-agent coordination using Microsoft Agent Framework.
 ```bash
+# Requires OPENAI_API_KEY specifically
+uv run python examples/orchestrator_demo/run_magentic.py "metformin cancer"
 ```
+**What's REAL:**
+- Real MagenticBuilder orchestration
+- Real SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent
+- Real manager-based coordination
 ---
+### 5. Hypothesis Demo (LLM Required)
+Demonstrates REAL mechanistic hypothesis generation.
 ```bash
 uv run python examples/hypothesis_demo/run_hypothesis.py "metformin Alzheimer's"
 uv run python examples/hypothesis_demo/run_hypothesis.py "sildenafil heart failure"
 ```
+**What's REAL:**
+- Real PubMed + Web search first
+- Real embedding-based deduplication
+- Real LLM generating Drug -> Target -> Pathway -> Effect chains
+- Real knowledge gap identification
 ---
+### 6. Full Stack Demo (LLM Required)
+**THE COMPLETE PIPELINE** - All phases working together.
 ```bash
 uv run python examples/full_stack_demo/run_full.py "metformin Alzheimer's"
 uv run python examples/full_stack_demo/run_full.py "sildenafil heart failure" -i 3
 ```
+**What's REAL:**
+1. Real PubMed + Web evidence collection
+2. Real embedding-based semantic deduplication
+3. Real LLM mechanistic hypothesis generation
+4. Real LLM evidence quality assessment
+5. Real LLM structured scientific report generation
+Output: Publication-quality research report with validated citations.
 ---
+## API Key Requirements
+| Example | LLM Required | Keys |
+|---------|--------------|------|
+| search_demo | No | Optional: `NCBI_API_KEY` |
+| embeddings_demo | No | None |
+| orchestrator_demo | Yes | `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` |
+| run_magentic | Yes | `OPENAI_API_KEY` (Magentic requires OpenAI) |
+| hypothesis_demo | Yes | `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` |
+| full_stack_demo | Yes | `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` |
 ---
+## Architecture
 ```
 User Query
     |
     v
+[REAL Search] --> Actual PubMed + Web API calls
     |
     v
+[REAL Embeddings] --> Actual sentence-transformers
     |
     v
+[REAL Hypothesis] --> Actual LLM reasoning
     |
     v
+[REAL Judge] --> Actual LLM assessment
     |
+    +---> Need more? --> Loop back to Search
     |
+    +---> Sufficient --> Continue
     |
     v
+[REAL Report] --> Actual LLM synthesis
     |
     v
+Publication-Quality Research Report
 ```
+---
+## Why No Mocks?
+> "Authenticity is the feature."
+Mocks belong in `tests/unit/`, not in demos. When you run these examples, you see:
+- Real papers from real databases
+- Real AI reasoning about real evidence
+- Real scientific hypotheses
+- Real research reports
+This is what DeepCritical actually does. No fake data. No canned responses.
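
The README says mocks belong in `tests/unit/`. As an illustration only, a unit test there might replace the judge with `unittest.mock.AsyncMock`, as in the sketch below. The helper `needs_more_evidence` and the dictionary-shaped assessment are assumptions made for this sketch, not code from the repository.

```python
import asyncio
from typing import Any
from unittest.mock import AsyncMock


async def needs_more_evidence(judge: Any, query: str, evidence: list) -> bool:
    """Hypothetical helper: ask the judge and report whether to keep searching."""
    assessment = await judge.assess(query, evidence)
    return assessment["recommendation"] != "synthesize"


def test_stops_when_judge_says_synthesize() -> None:
    # The judge is mocked, so the test makes no LLM call and needs no API key.
    judge = AsyncMock()
    judge.assess.return_value = {"recommendation": "synthesize"}

    result = asyncio.run(needs_more_evidence(judge, "metformin cancer", []))

    assert result is False
    judge.assess.assert_awaited_once_with("metformin cancer", [])
```

The demos stay live, while tests like this one keep control-flow checks fast and offline.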

examples/full_stack_demo/run_full.py CHANGED

@@ -2,22 +2,20 @@
 """
 Demo: Full Stack DeepCritical Agent (Phases 1-8).
-This script demonstrates the COMPLETE drug repurposing research pipeline:
-- Phase 2: Search (PubMed + Web)
-- Phase 6: Embeddings (Semantic deduplication)
-- Phase 7: Hypothesis (Mechanistic reasoning)
-- Phase 3: Judge (Evidence assessment)
-- Phase 8: Report (Structured scientific report)
 Usage:
-    # Full demo with real searches and LLM (requires API keys)
     uv run python examples/full_stack_demo/run_full.py "metformin Alzheimer's"
-    # Mock mode - demonstrates pipeline without API calls
-    uv run python examples/full_stack_demo/run_full.py --mock
-    # With specific iterations
-    uv run python examples/full_stack_demo/run_full.py "sildenafil heart failure" --iterations 2
 """
 import argparse
@@ -26,7 +24,7 @@ import os
 import sys
 from typing import Any
-from src.utils.models import Citation, Evidence, MechanismHypothesis
 def print_header(title: str) -> None:
@@ -42,264 +40,15 @@ def print_step(step: int, name: str) -> None:
     print("-" * 50)
-def create_mock_evidence() -> list[Evidence]:
-    """Create comprehensive mock evidence for demo without API calls."""
-    return [
-        Evidence(
-            content=(
-                "Metformin, a first-line treatment for type 2 diabetes, activates "
-                "AMP-activated protein kinase (AMPK). AMPK is a master metabolic "
-                "regulator that inhibits mTOR signaling, reducing protein synthesis "
-                "and cell proliferation. This mechanism has implications beyond "
-                "glucose control."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="Metformin activates AMPK through LKB1-dependent mechanisms",
-                url="https://pubmed.ncbi.nlm.nih.gov/19001324/",
-                date="2023-06",
-                authors=["Zhang L", "Wang H", "Chen Y"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "In transgenic mouse models of Alzheimer's disease, metformin treatment "
-                "reduced tau phosphorylation by 45% and decreased amyloid-beta plaque "
-                "formation. Treated mice showed improved performance on Morris water "
-                "maze tests, suggesting preserved spatial memory."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="Metformin ameliorates tau pathology in AD mouse models",
-                url="https://pubmed.ncbi.nlm.nih.gov/31256789/",
-                date="2024-01",
-                authors=["Kim J", "Lee S", "Park M", "Tanaka K"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "A population-based cohort study of 100,000 diabetic patients found "
-                "that metformin users had 35% lower risk of developing Alzheimer's "
-                "disease compared to sulfonylurea users (HR=0.65, 95% CI: 0.58-0.73). "
-                "The protective effect increased with duration of use."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="Metformin and dementia risk: UK Biobank analysis",
-                url="https://pubmed.ncbi.nlm.nih.gov/34567890/",
-                date="2023-09",
-                authors=["Smith A", "Johnson B", "Williams C"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "mTOR hyperactivation is observed in Alzheimer's disease brain tissue. "
-                "mTOR inhibition by rapamycin or metformin promotes autophagy, which "
-                "clears misfolded proteins including tau and amyloid-beta aggregates. "
-                "This suggests a common therapeutic pathway."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="mTOR-autophagy axis in neurodegeneration",
-                url="https://pubmed.ncbi.nlm.nih.gov/32109876/",
-                date="2023-03",
-                authors=["Brown C", "Davis D", "Miller E"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "Metformin crosses the blood-brain barrier via organic cation "
-                "transporters (OCT1, OCT2). CSF concentrations reach approximately "
-                "1-2% of plasma levels, sufficient for AMPK activation in neurons. "
-                "Brain accumulation is observed in hippocampus and prefrontal cortex."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="Brain pharmacokinetics of metformin in humans",
-                url="https://pubmed.ncbi.nlm.nih.gov/35678901/",
-                date="2024-02",
-                authors=["Wilson E", "Garcia F"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "Phase 2 clinical trial (NCT04098666) showed metformin 2000mg/day "
-                "for 12 months slowed cognitive decline by 18% compared to placebo "
-                "in patients with mild cognitive impairment. Biomarker analysis "
-                "showed reduced CSF tau levels in the treatment group."
-            ),
-            citation=Citation(
-                source="web",
-                title="Metformin for Alzheimer's prevention trial results",
-                url="https://clinicaltrials.gov/ct2/show/NCT04098666",
-                date="2024-03",
-                authors=["NIH Clinical Center"],
-            ),
-        ),
-    ]
-def create_mock_hypotheses() -> list[MechanismHypothesis]:
-    """Create mock hypotheses for demonstration."""
-    return [
-        MechanismHypothesis(
-            drug="Metformin",
-            target="AMPK",
-            pathway="mTOR inhibition -> Autophagy activation",
-            effect="Clearance of tau and amyloid-beta aggregates",
-            confidence=0.85,
-            supporting_evidence=[
-                "https://pubmed.ncbi.nlm.nih.gov/19001324/",
-                "https://pubmed.ncbi.nlm.nih.gov/32109876/",
-            ],
-            contradicting_evidence=[],
-            search_suggestions=[
-                "AMPK autophagy neurodegeneration",
-                "metformin tau clearance",
-            ],
-        ),
-        MechanismHypothesis(
-            drug="Metformin",
-            target="Glucose metabolism",
-            pathway="Improved neuronal energy homeostasis",
-            effect="Reduced oxidative stress and neuroinflammation",
-            confidence=0.70,
-            supporting_evidence=["https://pubmed.ncbi.nlm.nih.gov/31256789/"],
-            contradicting_evidence=[],
-            search_suggestions=[
-                "metformin brain glucose metabolism",
-                "neuronal insulin resistance alzheimer",
-            ],
-        ),
-    ]
-async def run_mock_demo() -> None:
-    """Run full pipeline with mock data (no API keys needed)."""
-    print_header("DeepCritical Full Stack Demo (MOCK MODE)")
-    print("Running with synthetic data - no API keys required.\n")
-    evidence = create_mock_evidence()
-    hypotheses = create_mock_hypotheses()
-    # Step 1: Show evidence
-    print_step(1, "SEARCH (Phase 2) - Evidence Collection")
-    print(f"Collected {len(evidence)} pieces of evidence:\n")
-    for i, e in enumerate(evidence, 1):
-        print(f"  [{i}] {e.citation.source.upper()}: {e.citation.title[:50]}...")
-        print(f"      {e.content[:80]}...")
-        print()
-    # Step 2: Embedding deduplication
-    print_step(2, "EMBEDDINGS (Phase 6) - Semantic Deduplication")
-    try:
-        from src.services.embeddings import EmbeddingService
-        service = EmbeddingService()
-        unique = await service.deduplicate(evidence, threshold=0.85)
-        print(f"Original: {len(evidence)} papers")
-        print(f"After deduplication: {len(unique)} unique papers")
-        print("(Semantic duplicates removed by meaning, not just URL)")
-    except ImportError:
-        print("Embedding dependencies not installed - skipping deduplication")
-        unique = evidence
-    # Step 3: Hypothesis generation
-    print_step(3, "HYPOTHESIS (Phase 7) - Mechanistic Reasoning")
-    print(f"Generated {len(hypotheses)} hypotheses:\n")
-    for i, h in enumerate(hypotheses, 1):
-        print(f"  Hypothesis {i} (Confidence: {h.confidence:.0%})")
-        print(f"  {h.drug} -> {h.target} -> {h.pathway} -> {h.effect}")
-        print(f"  Suggested searches: {', '.join(h.search_suggestions)}")
-        print()
-    # Step 4: Judge assessment
-    print_step(4, "JUDGE (Phase 3) - Evidence Assessment")
-    print("Assessment Results:")
-    print("  Mechanism Score:  8/10 (Strong mechanistic evidence)")
-    print("  Clinical Score:   7/10 (Phase 2 trial + observational data)")
-    print("  Confidence:       75%")
-    print("  Recommendation:   SYNTHESIZE (Evidence sufficient)")
-    print()
-    # Step 5: Report generation
-    print_step(5, "REPORT (Phase 8) - Structured Scientific Report")
-    report = f"""
-# Drug Repurposing Analysis: Metformin for Alzheimer's Disease
-## Executive Summary
-This analysis evaluated metformin as a potential therapeutic for Alzheimer's
-disease. Evidence from {len(unique)} sources supports a plausible mechanism
-through AMPK activation and mTOR inhibition, leading to enhanced autophagy
-and clearance of pathological protein aggregates. Clinical data shows
-promising risk reduction in observational studies and early trial results.
-## Research Question
-Can metformin, a type 2 diabetes medication, be repurposed for the prevention
-or treatment of Alzheimer's disease?
-## Methodology
-- Searched PubMed and web sources for "metformin Alzheimer's disease"
-- Applied semantic deduplication to remove redundant findings
-- Generated mechanistic hypotheses using LLM reasoning
-- Evaluated evidence quality with structured assessment
-## Hypotheses Tested
-- **Metformin -> AMPK -> mTOR inhibition -> Neuroprotection** (SUPPORTED)
-  - 4 supporting papers, 0 contradicting
-- **Metformin -> Glucose metabolism -> Reduced oxidative stress** (PARTIAL)
-  - 2 supporting papers, requires more investigation
-## Mechanistic Findings
-Strong evidence supports AMPK activation as the primary mechanism. Metformin
-crosses the blood-brain barrier and achieves therapeutic concentrations in
-hippocampus and cortex. Downstream effects include:
-- mTOR inhibition
-- Autophagy activation
-- Tau dephosphorylation
-- Amyloid-beta clearance
-## Clinical Findings
-- Observational: 35% risk reduction (HR=0.65, n=100,000)
-- Preclinical: 45% reduction in tau phosphorylation in AD mice
-- Phase 2 trial: 18% slower cognitive decline vs placebo
-## Drug Candidates
-- **Metformin** - Primary candidate with established safety profile
-## Limitations
-- Abstract-level analysis only
-- Observational data subject to confounding
-- Limited RCT data available
-- Optimal dosing for neuroprotection unclear
-## Conclusion
-Metformin shows strong potential for Alzheimer's disease prevention/treatment.
-The AMPK-mTOR-autophagy mechanism is well-supported. Recommend Phase 3 trials
-with cognitive endpoints.
-## References
-"""
-    max_authors_display = 2
-    for i, e in enumerate(unique[:6], 1):
-        authors = ", ".join(e.citation.authors[:max_authors_display])
-        if len(e.citation.authors) > max_authors_display:
-            authors += " et al."
-        ref_line = (
-            f"{i}. {authors}. *{e.citation.title}*. "
-            f"{e.citation.source.upper()} ({e.citation.date}). "
-            f"[Link]({e.citation.url})"
-        )
-        report += ref_line + "\n"
-    report += f"""
----
-*Report generated from {len(unique)} papers across 3 search iterations.
-Confidence: 75%*
-"""
-    print(report)
 async def _run_search_iteration(
@@ -328,12 +77,12 @@ async def _run_search_iteration(
     return all_evidence
-async def run_real_demo(query: str, max_iterations: int) -> None:
-    """Run full pipeline with real API calls."""
-    print_header("DeepCritical Full Stack Demo")
     print(f"Query: {query}")
     print(f"Max iterations: {max_iterations}")
-    print("Mode: REAL (Live API calls)\n")
     # Import real components
     from src.agent_factory.judges import JudgeHandler
@@ -344,7 +93,8 @@ async def run_real_demo(query: str, max_iterations: int) -> None:
     from src.tools.search_handler import SearchHandler
     from src.tools.websearch import WebTool
-    # Initialize services
     embedding_service = EmbeddingService()
     search_handler = SearchHandler(tools=[PubMedTool(), WebTool()], timeout=30.0)
     judge_handler = JudgeHandler()
@@ -356,42 +106,47 @@ async def run_real_demo(query: str, max_iterations: int) -> None:
     for iteration in range(1, max_iterations + 1):
         print_step(iteration, f"ITERATION {iteration}/{max_iterations}")
-        # Step 1: Search
-        print("\n[Search] Querying PubMed and Web...")
         all_evidence = await _run_search_iteration(
             query, iteration, evidence_store, all_evidence, search_handler, embedding_service
         )
-        # Step 2: Generate hypotheses (first iteration only)
         if iteration == 1:
-            print("\n[Hypothesis] Generating mechanistic hypotheses...")
             hypothesis_agent = HypothesisAgent(evidence_store, embedding_service)
             hyp_response = await hypothesis_agent.run(query)
-            print(hyp_response.messages[0].text[:500] + "...")
-        # Step 3: Judge
-        print("\n[Judge] Assessing evidence quality...")
         assessment = await judge_handler.assess(query, all_evidence)
-        print(f"  Mechanism: {assessment.details.mechanism_score}/10")
-        print(f"  Clinical: {assessment.details.clinical_evidence_score}/10")
-        print(f"  Recommendation: {assessment.recommendation}")
         if assessment.recommendation == "synthesize":
-            print("\n[Judge says] Evidence sufficient! Generating report...")
             evidence_store["last_assessment"] = assessment.details.model_dump()
             break
         next_queries = assessment.next_search_queries[:2]
-        print(f"\n[Judge says] Need more evidence. Next queries: {next_queries}")
         query = assessment.next_search_queries[0] if assessment.next_search_queries else query
-    # Step 4: Generate report
-    print_step(iteration + 1, "REPORT GENERATION")
     report_agent = ReportAgent(evidence_store, embedding_service)
     report_response = await report_agent.run(query)
     print("\n" + "=" * 70)
-    print("FINAL RESEARCH REPORT")
     print("=" * 70)
     print(report_response.messages[0].text)
@@ -399,30 +154,25 @@ async def run_real_demo(query: str, max_iterations: int) -> None:
 async def main() -> None:
     """Entry point."""
     parser = argparse.ArgumentParser(
-        description="DeepCritical Full Stack Demo (Phases 1-8)",
         formatter_class=argparse.RawDescriptionHelpFormatter,
         epilog="""
-Examples:
-    # Mock mode (no API keys)
-    uv run python examples/full_stack_demo/run_full.py --mock
-    # Real mode with metformin query
-    uv run python examples/full_stack_demo/run_full.py "metformin alzheimer"
-    # Sildenafil for heart failure
     uv run python examples/full_stack_demo/run_full.py "sildenafil heart failure" -i 3
         """,
     )
     parser.add_argument(
         "query",
-        nargs="?",
-        default="metformin Alzheimer's disease",
-        help="Research query",
-    )
-    parser.add_argument(
-        "--mock",
-        action="store_true",
-        help="Run with mock data (no API keys needed)",
     )
     parser.add_argument(
         "-i",
@@ -434,21 +184,29 @@ Examples:
     args = parser.parse_args()
-    if args.mock:
-        await run_mock_demo()
-    else:
-        # Check for API keys
-        if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
-            print("Error: Real mode requires OPENAI_API_KEY or ANTHROPIC_API_KEY")
-            print("Use --mock for demo without API keys.")
-            sys.exit(1)
-        await run_real_demo(args.query, args.iterations)
     print("\n" + "=" * 70)
     print("  DeepCritical Full Stack Demo Complete!")
-    print("  Phases demonstrated: Foundation -> Search -> Judge -> UI ->")
-    print("                       Magentic -> Embeddings -> Hypothesis -> Report")
     print("=" * 70 + "\n")

 """
 Demo: Full Stack DeepCritical Agent (Phases 1-8).
+This script demonstrates the COMPLETE REAL drug repurposing research pipeline:
+- Phase 2: REAL Search (PubMed + Web API calls)
+- Phase 6: REAL Embeddings (sentence-transformers + ChromaDB)
+- Phase 7: REAL Hypothesis (LLM mechanistic reasoning)
+- Phase 3: REAL Judge (LLM evidence assessment)
+- Phase 8: REAL Report (LLM structured scientific report)
+NO MOCKS. NO FAKE DATA. REAL SCIENCE.
 Usage:
     uv run python examples/full_stack_demo/run_full.py "metformin Alzheimer's"
+    uv run python examples/full_stack_demo/run_full.py "sildenafil heart failure" -i 3
+Requires: OPENAI_API_KEY or ANTHROPIC_API_KEY
 """
 import argparse
 import sys
 from typing import Any
+from src.utils.models import Evidence
 def print_header(title: str) -> None:
     print("-" * 50)
+_MAX_DISPLAY_LEN = 600
+def _print_truncated(text: str) -> None:
+    """Print text, truncating if too long."""
+    if len(text) > _MAX_DISPLAY_LEN:
+        print(text[:_MAX_DISPLAY_LEN] + "\n... [truncated for display]")
+    else:
+        print(text)
 async def _run_search_iteration(
     return all_evidence
+async def run_full_demo(query: str, max_iterations: int) -> None:
+    """Run the REAL full stack pipeline."""
+    print_header("DeepCritical Full Stack Demo (REAL)")
     print(f"Query: {query}")
     print(f"Max iterations: {max_iterations}")
+    print("Mode: REAL (All live API calls - no mocks)\n")
     # Import real components
     from src.agent_factory.judges import JudgeHandler
     from src.tools.search_handler import SearchHandler
     from src.tools.websearch import WebTool
+    # Initialize REAL services
+    print("[Init] Loading embedding model...")
     embedding_service = EmbeddingService()
     search_handler = SearchHandler(tools=[PubMedTool(), WebTool()], timeout=30.0)
     judge_handler = JudgeHandler()
     for iteration in range(1, max_iterations + 1):
         print_step(iteration, f"ITERATION {iteration}/{max_iterations}")
+        # Step 1: REAL Search
+        print("\n[Search] Querying PubMed and Web (REAL API calls)...")
         all_evidence = await _run_search_iteration(
             query, iteration, evidence_store, all_evidence, search_handler, embedding_service
         )
+        if not all_evidence:
+            print("\nNo evidence found. Try a different query.")
+            return
+        # Step 2: REAL Hypothesis generation (first iteration only)
         if iteration == 1:
+            print("\n[Hypothesis] Generating mechanistic hypotheses (REAL LLM)...")
             hypothesis_agent = HypothesisAgent(evidence_store, embedding_service)
             hyp_response = await hypothesis_agent.run(query)
+            _print_truncated(hyp_response.messages[0].text)
+        # Step 3: REAL Judge
+        print("\n[Judge] Assessing evidence quality (REAL LLM)...")
         assessment = await judge_handler.assess(query, all_evidence)
+        print(f"  Mechanism Score: {assessment.details.mechanism_score}/10")
+        print(f"  Clinical Score:  {assessment.details.clinical_evidence_score}/10")
+        print(f"  Confidence:      {assessment.confidence:.0%}")
+        print(f"  Recommendation:  {assessment.recommendation.upper()}")
         if assessment.recommendation == "synthesize":
+            print("\n[Judge] Evidence sufficient! Proceeding to report generation...")
             evidence_store["last_assessment"] = assessment.details.model_dump()
             break
         next_queries = assessment.next_search_queries[:2]
+        print(f"\n[Judge] Need more evidence. Next queries: {next_queries}")
         query = assessment.next_search_queries[0] if assessment.next_search_queries else query
+    # Step 4: REAL Report generation
+    print_step(iteration + 1, "REPORT GENERATION (REAL LLM)")
     report_agent = ReportAgent(evidence_store, embedding_service)
     report_response = await report_agent.run(query)
     print("\n" + "=" * 70)
+    print("  FINAL RESEARCH REPORT")
     print("=" * 70)
     print(report_response.messages[0].text)
 async def main() -> None:
     """Entry point."""
     parser = argparse.ArgumentParser(
+        description="DeepCritical Full Stack Demo - REAL, No Mocks",
         formatter_class=argparse.RawDescriptionHelpFormatter,
         epilog="""
+This demo runs the COMPLETE pipeline with REAL API calls:
+  1. REAL search: Actual PubMed + DuckDuckGo queries
+  2. REAL embeddings: Actual sentence-transformers model
+  3. REAL hypothesis: Actual LLM generating mechanistic chains
+  4. REAL judge: Actual LLM assessing evidence quality
+  5. REAL report: Actual LLM generating structured report
+Examples:
+    uv run python examples/full_stack_demo/run_full.py "metformin Alzheimer's"
     uv run python examples/full_stack_demo/run_full.py "sildenafil heart failure" -i 3
+    uv run python examples/full_stack_demo/run_full.py "aspirin cancer prevention"
         """,
     )
     parser.add_argument(
         "query",
+        help="Research query (e.g., 'metformin Alzheimer's disease')",
     )
     parser.add_argument(
         "-i",
     args = parser.parse_args()
+    # Fail fast: require API key
+    if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
+        print("=" * 70)
+        print("ERROR: This demo requires a real LLM.")
+        print()
+        print("Set one of the following in your .env file:")
+        print("  OPENAI_API_KEY=sk-...")
+        print("  ANTHROPIC_API_KEY=sk-ant-...")
+        print()
+        print("This is a REAL demo. No mocks. No fake data.")
+        print("=" * 70)
+        sys.exit(1)
+    await run_full_demo(args.query, args.iterations)
     print("\n" + "=" * 70)
     print("  DeepCritical Full Stack Demo Complete!")
+    print("  ")
+    print("  Everything you just saw was REAL:")
+    print("    - Real PubMed/Web searches")
+    print("    - Real embedding computations")
+    print("    - Real LLM reasoning")
+    print("    - Real scientific report")
     print("=" * 70 + "\n")
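
The fail-fast API key check added to `run_full.py` above is repeated nearly verbatim in the other two demos in this commit. A minimal sketch of how it could be factored into one shared helper follows. `has_llm_key`, `require_llm_key`, and `LLM_KEY_NAMES` are hypothetical names that do not exist in the repository; the banner text is taken from the diff.

```python
import os
import sys
from collections.abc import Mapping
from typing import Optional

# Environment variables the demos accept, as checked in the diff above.
LLM_KEY_NAMES = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def has_llm_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if at least one LLM key is set to a non-empty value."""
    source = os.environ if env is None else env
    return any(source.get(name) for name in LLM_KEY_NAMES)


def require_llm_key(width: int = 70) -> None:
    """Hypothetical helper: print the setup banner and exit(1) if no key is set."""
    if has_llm_key():
        return
    print("=" * width)
    print("ERROR: This demo requires a real LLM.")
    print()
    print("Set one of the following in your .env file:")
    print("  OPENAI_API_KEY=sk-...")
    print("  ANTHROPIC_API_KEY=sk-ant-...")
    print("=" * width)
    sys.exit(1)
```

Each `main()` could then call `require_llm_key(width=60)` before doing any work. Passing a mapping to `has_llm_key` keeps the check testable without touching the real environment.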

examples/hypothesis_demo/run_hypothesis.py CHANGED

@@ -2,17 +2,15 @@
 """
 Demo: Hypothesis Generation (Phase 7).
-This script demonstrates mechanistic hypothesis generation:
-- Drug -> Target -> Pathway -> Effect reasoning
-- Knowledge gap identification
-- Search query suggestions for targeted research
 Usage:
     # Requires OPENAI_API_KEY or ANTHROPIC_API_KEY
-    uv run python examples/hypothesis_demo/run_hypothesis.py
-    # With custom drug query
-    uv run python examples/hypothesis_demo/run_hypothesis.py "aspirin heart disease"
 """
 import argparse
@@ -22,200 +20,110 @@ import sys
 from typing import Any
 from src.agents.hypothesis_agent import HypothesisAgent
-from src.utils.models import Citation, Evidence
-def create_metformin_evidence() -> list[Evidence]:
-    """Create sample evidence about metformin for hypothesis generation."""
-    return [
-        Evidence(
-            content=(
-                "Metformin activates AMP-activated protein kinase (AMPK), a master regulator "
-                "of cellular energy homeostasis. AMPK activation leads to inhibition of mTOR "
-                "signaling, reducing protein synthesis and cell proliferation."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="Metformin and AMPK: mechanisms of action",
-                url="https://pubmed.ncbi.nlm.nih.gov/12345/",
-                date="2023",
-                authors=["Zhang L", "Wang H"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "In Alzheimer's disease models, AMPK activation by metformin reduced tau "
-                "phosphorylation and amyloid-beta accumulation. These effects correlated "
-                "with improved cognitive function in transgenic mice."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="Metformin neuroprotective effects in AD models",
-                url="https://pubmed.ncbi.nlm.nih.gov/23456/",
-                date="2024",
-                authors=["Kim J", "Lee S", "Park M"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "Clinical observational studies show diabetic patients on metformin have "
-                "30-40% reduced incidence of Alzheimer's disease compared to those on "
-                "other diabetes medications."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="Metformin use and dementia risk: population study",
-                url="https://pubmed.ncbi.nlm.nih.gov/34567/",
-                date="2023",
-                authors=["Smith A", "Johnson B"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "mTOR inhibition has emerged as a key therapeutic target in neurodegenerative "
-                "diseases. Rapamycin and metformin both reduce mTOR activity, though through "
-                "different upstream mechanisms."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="mTOR pathway in neurodegeneration",
-                url="https://pubmed.ncbi.nlm.nih.gov/45678/",
-                date="2022",
-                authors=["Brown C", "Davis D"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "Metformin crosses the blood-brain barrier and accumulates in the hippocampus "
-                "and cortex. Brain concentrations sufficient for AMPK activation are achieved "
-                "at standard diabetic doses."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="Pharmacokinetics of metformin in brain tissue",
-                url="https://pubmed.ncbi.nlm.nih.gov/56789/",
-                date="2023",
-                authors=["Wilson E"],
-            ),
-        ),
-    ]
-def create_sildenafil_evidence() -> list[Evidence]:
-    """Create sample evidence about sildenafil (Viagra) for hypothesis generation."""
-    return [
-        Evidence(
-            content=(
-                "Sildenafil inhibits phosphodiesterase type 5 (PDE5), preventing breakdown "
-                "of cGMP. Elevated cGMP causes smooth muscle relaxation and vasodilation "
-                "in pulmonary vasculature."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="PDE5 inhibition mechanism of sildenafil",
-                url="https://pubmed.ncbi.nlm.nih.gov/67890/",
-                date="2022",
-                authors=["Miller F"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "In pulmonary arterial hypertension (PAH), sildenafil reduces pulmonary "
-                "vascular resistance and improves exercise capacity. FDA approved for PAH "
-                "under brand name Revatio."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="Sildenafil in pulmonary hypertension treatment",
-                url="https://pubmed.ncbi.nlm.nih.gov/78901/",
-                date="2023",
-                authors=["Garcia R", "Martinez L"],
-            ),
-        ),
-        Evidence(
-            content=(
-                "PDE5 is expressed in cardiac myocytes. Sildenafil has shown cardioprotective "
-                "effects in animal models of heart failure by enhancing nitric oxide-cGMP "
-                "signaling in the myocardium."
-            ),
-            citation=Citation(
-                source="pubmed",
-                title="Cardiac effects of PDE5 inhibition",
-                url="https://pubmed.ncbi.nlm.nih.gov/89012/",
-                date="2024",
-                authors=["Thompson K"],
-            ),
-        ),
-    ]
 async def run_hypothesis_demo(query: str) -> None:
-    """Run the hypothesis generation demo."""
     print(f"\n{'='*60}")
     print("DeepCritical Hypothesis Agent Demo (Phase 7)")
     print(f"Query: {query}")
     print(f"{'='*60}\n")
-    # Select appropriate evidence based on query
-    if "sildenafil" in query.lower() or "viagra" in query.lower():
-        evidence = create_sildenafil_evidence()
-        print("Using: Sildenafil evidence set (3 papers)")
-    else:
-        evidence = create_metformin_evidence()
-        print("Using: Metformin evidence set (5 papers)")
-    # Create evidence store (shared context between agents)
-    evidence_store: dict[str, Any] = {"current": evidence, "hypotheses": []}
-    # Create hypothesis agent
-    agent = HypothesisAgent(evidence_store)
-    print("\nGenerating mechanistic hypotheses...\n")
     print("-" * 60)
-    # Run hypothesis generation
     response = await agent.run(query)
-    # Print the formatted response
     print(response.messages[0].text)
     print("-" * 60)
     # Show stored hypotheses
     hypotheses = evidence_store.get("hypotheses", [])
-    print(f"\n{len(hypotheses)} hypotheses stored in evidence_store")
     if hypotheses:
-        print("\nHypothesis search queries generated:")
         for h in hypotheses:
             queries = h.to_search_queries()
-            print(f"  - {h.drug} -> {h.target}: {queries[:2]}")
 async def main() -> None:
-    """Run the demo."""
-    parser = argparse.ArgumentParser(description="Hypothesis Generation Demo")
     parser.add_argument(
         "query",
         nargs="?",
         default="metformin Alzheimer's disease",
-        help="Research query (default: 'metformin Alzheimer\\'s disease')",
     )
     args = parser.parse_args()
-    # Check for API key
     if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
-        print("Error: Hypothesis generation requires an LLM.")
-        print("Set OPENAI_API_KEY or ANTHROPIC_API_KEY in your environment.")
         sys.exit(1)
     await run_hypothesis_demo(args.query)
     print("\n" + "=" * 60)
-    print("Demo complete! The Hypothesis Agent:")
-    print("  - Analyzes evidence to find Drug -> Target -> Pathway -> Effect chains")
-    print("  - Identifies knowledge gaps in current evidence")
-    print("  - Suggests targeted search queries to test hypotheses")
     print("=" * 60 + "\n")

 """
 Demo: Hypothesis Generation (Phase 7).
+This script demonstrates the REAL hypothesis generation pipeline:
+1. REAL search: PubMed + Web (actual API calls)
+2. REAL embeddings: Semantic deduplication
+3. REAL LLM: Mechanistic hypothesis generation
 Usage:
     # Requires OPENAI_API_KEY or ANTHROPIC_API_KEY
+    uv run python examples/hypothesis_demo/run_hypothesis.py "metformin Alzheimer's"
+    uv run python examples/hypothesis_demo/run_hypothesis.py "sildenafil heart failure"
 """
 import argparse
 from typing import Any
 from src.agents.hypothesis_agent import HypothesisAgent
+from src.services.embeddings import EmbeddingService
+from src.tools.pubmed import PubMedTool
+from src.tools.search_handler import SearchHandler
+from src.tools.websearch import WebTool
 async def run_hypothesis_demo(query: str) -> None:
+    """Run the REAL hypothesis generation pipeline."""
     print(f"\n{'='*60}")
     print("DeepCritical Hypothesis Agent Demo (Phase 7)")
     print(f"Query: {query}")
+    print("Mode: REAL (Live API calls)")
     print(f"{'='*60}\n")
+    # Step 1: REAL Search
+    print("[Step 1] Searching PubMed + Web...")
+    search_handler = SearchHandler(tools=[PubMedTool(), WebTool()], timeout=30.0)
+    result = await search_handler.execute(query, max_results_per_tool=5)
+    print(f"  Found {result.total_found} results from {result.sources_searched}")
+    if result.errors:
+        print(f"  Warnings: {result.errors}")
+    if not result.evidence:
+        print("\nNo evidence found. Try a different query.")
+        return
+    # Step 2: REAL Embeddings - Deduplicate
+    print("\n[Step 2] Semantic deduplication...")
+    embedding_service = EmbeddingService()
+    unique_evidence = await embedding_service.deduplicate(result.evidence, threshold=0.85)
+    print(f"  {len(result.evidence)} -> {len(unique_evidence)} unique papers")
+    # Show what we found
+    print("\n[Evidence collected]")
+    max_title_len = 50
+    for i, e in enumerate(unique_evidence[:5], 1):
+        raw_title = e.citation.title
+        title = raw_title[:max_title_len] + "..." if len(raw_title) > max_title_len else raw_title
+        print(f"  {i}. [{e.citation.source.upper()}] {title}")
+    # Step 3: REAL LLM - Generate hypotheses
+    print("\n[Step 3] Generating mechanistic hypotheses (LLM)...")
+    evidence_store: dict[str, Any] = {"current": unique_evidence, "hypotheses": []}
+    agent = HypothesisAgent(evidence_store, embedding_service)
     print("-" * 60)
     response = await agent.run(query)
     print(response.messages[0].text)
     print("-" * 60)
     # Show stored hypotheses
     hypotheses = evidence_store.get("hypotheses", [])
+    print(f"\n{len(hypotheses)} hypotheses stored")
     if hypotheses:
+        print("\nGenerated search queries for further investigation:")
         for h in hypotheses:
             queries = h.to_search_queries()
+            print(f"  {h.drug} -> {h.target}:")
+            for q in queries[:3]:
+                print(f"    - {q}")
 async def main() -> None:
+    """Entry point."""
+    parser = argparse.ArgumentParser(
+        description="Hypothesis Generation Demo (REAL - No Mocks)",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+    uv run python examples/hypothesis_demo/run_hypothesis.py "metformin Alzheimer's"
+    uv run python examples/hypothesis_demo/run_hypothesis.py "sildenafil heart failure"
+    uv run python examples/hypothesis_demo/run_hypothesis.py "aspirin cancer prevention"
+        """,
+    )
     parser.add_argument(
         "query",
         nargs="?",
         default="metformin Alzheimer's disease",
+        help="Research query",
     )
     args = parser.parse_args()
+    # Fail fast: require API key
     if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
+        print("=" * 60)
+        print("ERROR: This demo requires a real LLM.")
+        print()
+        print("Set one of the following in your .env file:")
+        print("  OPENAI_API_KEY=sk-...")
+        print("  ANTHROPIC_API_KEY=sk-ant-...")
+        print()
+        print("This is a REAL demo, not a mock. No fake data.")
+        print("=" * 60)
         sys.exit(1)
     await run_hypothesis_demo(args.query)
     print("\n" + "=" * 60)
+    print("Demo complete! This was a REAL pipeline:")
+    print("  1. REAL search: Actual PubMed + Web API calls")
+    print("  2. REAL embeddings: Actual sentence-transformers")
+    print("  3. REAL LLM: Actual hypothesis generation")
     print("=" * 60 + "\n")
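
Step 2 of `run_hypothesis_demo` calls `embedding_service.deduplicate(result.evidence, threshold=0.85)`, but the diff does not show how `EmbeddingService` implements it. One common approach, sketched below purely as an assumption, is greedy filtering by cosine similarity: keep an item only if its embedding scores below the threshold against every item already kept. The function names and the toy two-dimensional vectors are illustrative only; the real service is asynchronous and uses a sentence-transformers model.

```python
import math
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(
    items: Sequence[T],
    vectors: Sequence[Sequence[float]],
    threshold: float = 0.85,
) -> list[T]:
    """Greedy dedup (assumed strategy): drop items too similar to a kept one."""
    kept_items: list[T] = []
    kept_vectors: list[Sequence[float]] = []
    for item, vector in zip(items, vectors):
        # Keep the item only if it is below the threshold against all kept items.
        if all(cosine_similarity(vector, k) < threshold for k in kept_vectors):
            kept_items.append(item)
            kept_vectors.append(vector)
    return kept_items


# Toy data: the second vector points almost the same way as the first.
papers = ["paper A", "paper A (near-duplicate)", "paper B"]
toy_vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
unique = deduplicate(papers, toy_vectors)
print(f"{len(papers)} -> {len(unique)} unique papers")
```

With a threshold of 0.85 the near-duplicate is dropped, because its similarity to the first vector is about 0.995. A lower threshold removes more aggressively and risks discarding distinct papers on the same topic.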

examples/orchestrator_demo/run_agent.py CHANGED

@@ -1,19 +1,20 @@
 #!/usr/bin/env python3
 """
-Demo: Full DeepCritical Agent Loop (Search + Judge + Orchestrator).
-This script demonstrates Phase 4 functionality:
-- Iterative Search (PubMed + Web)
-- Evidence Evaluation (Judge Agent)
-- Orchestration Loop
-- Final Synthesis
-Usage:
-    # Run with Mock Judge (No API Key needed)
-    uv run python examples/orchestrator_demo/run_agent.py "metformin cancer" --mock
-    # Run with Real Judge (Requires OPENAI_API_KEY or ANTHROPIC_API_KEY)
     uv run python examples/orchestrator_demo/run_agent.py "metformin cancer"
 """
 import argparse
@@ -21,7 +22,7 @@ import asyncio
 import os
 import sys
-from src.agent_factory.judges import JudgeHandler, MockJudgeHandler
 from src.orchestrator import Orchestrator
 from src.tools.pubmed import PubMedTool
 from src.tools.search_handler import SearchHandler
@@ -30,52 +31,75 @@ from src.utils.models import OrchestratorConfig
 async def main() -> None:
-    """Run the agent demo."""
-    parser = argparse.ArgumentParser(description="Run DeepCritical Agent CLI")
     parser.add_argument("query", help="Research query (e.g., 'metformin cancer')")
-    parser.add_argument("--mock", action="store_true", help="Use Mock Judge (no API key needed)")
-    parser.add_argument("--iterations", type=int, default=3, help="Max iterations")
     args = parser.parse_args()
-    # Check for keys if not mocking
-    if not args.mock and not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
-        print("Error: No API key found. Set OPENAI_API_KEY or ANTHROPIC_API_KEY, or use --mock.")
         sys.exit(1)
     print(f"\n{'='*60}")
-    print("DeepCritical Agent Demo")
     print(f"Query: {args.query}")
-    print(f"Mode: {'MOCK' if args.mock else 'REAL (LLM)'}")
-    print(f"{ '='*60}\n")
-    # 1. Setup Search Tools
     search_handler = SearchHandler(tools=[PubMedTool(), WebTool()], timeout=30.0)
-    # 2. Setup Judge
-    judge_handler: JudgeHandler | MockJudgeHandler
-    if args.mock:
-        judge_handler = MockJudgeHandler()
-    else:
-        judge_handler = JudgeHandler()
-    # 3. Setup Orchestrator
     config = OrchestratorConfig(max_iterations=args.iterations)
     orchestrator = Orchestrator(
         search_handler=search_handler, judge_handler=judge_handler, config=config
     )
-    # 4. Run Loop
     try:
         async for event in orchestrator.run(args.query):
-            # Print event with icon
             print(event.to_markdown().replace("**", ""))
-            # If we got data, print a snippet
             if event.type == "search_complete" and event.data:
                 print(f"   -> Found {event.data.get('new_count', 0)} new items")
     except Exception as e:
         print(f"\n❌ Error: {e}")
 if __name__ == "__main__":

 #!/usr/bin/env python3
 """
+Demo: DeepCritical Agent Loop (Search + Judge + Orchestrator).
+This script demonstrates the REAL Phase 4 orchestration:
+- REAL Iterative Search (PubMed + Web API calls)
+- REAL Evidence Evaluation (LLM Judge)
+- REAL Orchestration Loop
+- REAL Final Synthesis
+NO MOCKS. REAL API CALLS.
+Usage:
     uv run python examples/orchestrator_demo/run_agent.py "metformin cancer"
+    uv run python examples/orchestrator_demo/run_agent.py "sildenafil heart failure" --iterations 5
+Requires: OPENAI_API_KEY or ANTHROPIC_API_KEY
 """
 import argparse
 import os
 import sys
+from src.agent_factory.judges import JudgeHandler
 from src.orchestrator import Orchestrator
 from src.tools.pubmed import PubMedTool
 from src.tools.search_handler import SearchHandler
 async def main() -> None:
+    """Run the REAL agent demo."""
+    parser = argparse.ArgumentParser(
+        description="DeepCritical Agent Demo - REAL, No Mocks",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+This demo runs the REAL search-judge-synthesize loop:
+  1. REAL search: Actual PubMed + DuckDuckGo queries
+  2. REAL judge: Actual LLM assessing evidence quality
+  3. REAL loop: Actual iterative refinement based on LLM decisions
+  4. REAL synthesis: Actual research summary generation
+Examples:
+    uv run python examples/orchestrator_demo/run_agent.py "metformin cancer"
+    uv run python examples/orchestrator_demo/run_agent.py "aspirin alzheimer" --iterations 5
+        """,
+    )
     parser.add_argument("query", help="Research query (e.g., 'metformin cancer')")
+    parser.add_argument("--iterations", type=int, default=3, help="Max iterations (default: 3)")
     args = parser.parse_args()
+    # Fail fast: require API key
+    if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
+        print("=" * 60)
+        print("ERROR: This demo requires a real LLM.")
+        print()
+        print("Set one of the following in your .env file:")
+        print("  OPENAI_API_KEY=sk-...")
+        print("  ANTHROPIC_API_KEY=sk-ant-...")
+        print()
+        print("This is a REAL demo. No mocks. No fake data.")
+        print("=" * 60)
         sys.exit(1)
     print(f"\n{'='*60}")
+    print("DeepCritical Agent Demo (REAL)")
     print(f"Query: {args.query}")
+    print(f"Max Iterations: {args.iterations}")
+    print("Mode: REAL (All live API calls)")
+    print(f"{'='*60}\n")
+    # Setup REAL components
     search_handler = SearchHandler(tools=[PubMedTool(), WebTool()], timeout=30.0)
+    judge_handler = JudgeHandler()  # REAL LLM judge
     config = OrchestratorConfig(max_iterations=args.iterations)
     orchestrator = Orchestrator(
         search_handler=search_handler, judge_handler=judge_handler, config=config
     )
+    # Run the REAL loop
     try:
         async for event in orchestrator.run(args.query):
+            # Print event with icon (remove markdown bold for CLI)
             print(event.to_markdown().replace("**", ""))
+            # Show search results count
             if event.type == "search_complete" and event.data:
                 print(f"   -> Found {event.data.get('new_count', 0)} new items")
     except Exception as e:
         print(f"\n❌ Error: {e}")
+        raise
+    print("\n" + "=" * 60)
+    print("Demo complete! Everything was REAL:")
+    print("  - Real PubMed/Web searches")
+    print("  - Real LLM judge decisions")
+    print("  - Real iterative refinement")
+    print("=" * 60 + "\n")
 if __name__ == "__main__":