VibecoderMcSwaggins committed on
Commit
e16b9e6
·
1 Parent(s): dd587c9

docs: add Phase 14 Demo Video & Hackathon Submission specification


- Created documentation for the demo video and hackathon submission process, outlining goals, requirements, and deadlines.
- Included detailed specifications for the demo video format, content, and recommended tools.
- Established a checklist for submission requirements, including social media posts and HuggingFace Space configuration.
- Summarized prize eligibility and potential awards based on completed phases.

Files added:
- docs/implementation/14_phase_demo_submission.md

docs/implementation/13_phase_modal_integration.md ADDED
@@ -0,0 +1,879 @@
1
+ # Phase 13 Implementation Spec: Modal Pipeline Integration
2
+
3
+ **Goal**: Wire existing Modal code execution into the agent pipeline.
4
+ **Philosophy**: "Sandboxed execution makes AI-generated code trustworthy."
5
+ **Prerequisite**: Phase 12 complete (MCP server working)
6
+ **Priority**: P1 - HIGH VALUE ($2,500 Modal Innovation Award)
7
+ **Estimated Time**: 2-3 hours
8
+
9
+ ---
10
+
11
+ ## 1. Why Modal Integration?
12
+
13
+ ### Current State Analysis
14
+
15
+ Mario already implemented `src/tools/code_execution.py`:
16
+
17
+ | Component | Status | Notes |
18
+ |-----------|--------|-------|
19
+ | `ModalCodeExecutor` class | Built | Executes Python in Modal sandbox |
20
+ | `SANDBOX_LIBRARIES` | Defined | pandas, numpy, scipy, etc. |
21
+ | `execute()` method | Implemented | Stdout/stderr capture |
22
+ | `execute_with_return()` | Implemented | Returns `result` variable |
23
+ | `AnalysisAgent` | Built | Uses Modal for statistical analysis |
24
+ | **Pipeline Integration** | **MISSING** | Not wired into main orchestrator |
25
+
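The `execute_with_return()` contract in the table above (run a snippet, hand back whatever it bound to `result`) can be illustrated with a local stand-in. This sketch is not the Modal executor — it just demonstrates the return-variable convention the real class implements remotely:

```python
from typing import Any


def execute_with_return(code: str) -> Any:
    """Illustrative local stand-in for ModalCodeExecutor.execute_with_return().

    The real executor runs the snippet inside a Modal sandbox; here we exec()
    locally purely to show the contract: the snippet must set `result`.
    """
    namespace: dict[str, Any] = {}
    exec(code, namespace)  # the real executor runs this remotely, not locally
    if "result" not in namespace:
        raise ValueError("generated code must set a `result` variable")
    return namespace["result"]


verdict = execute_with_return(
    "x = 2 + 2\nresult = 'SUPPORTED' if x == 4 else 'REFUTED'"
)
print(verdict)  # SUPPORTED
```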
26
+ ### What's Missing
27
+
28
+ ```
29
+ Current Flow:
30
+ User Query → Orchestrator → Search → Judge → [Report] → Done
31
+
32
+ With Modal:
33
+ User Query → Orchestrator → Search → Judge → [Hypothesis] → [Analysis*] → Report → Done
34
+ ↓
35
+ Modal Sandbox Execution
36
+ ```
37
+
38
+ *The AnalysisAgent exists but is NOT called by either orchestrator.
39
+
40
+ ---
41
+
42
+ ## 2. Prize Opportunity
43
+
44
+ ### Modal Innovation Award: $2,500
45
+
46
+ **Judging Criteria**:
47
+ 1. **Sandbox Isolation** - Code runs in container, not local
48
+ 2. **Scientific Computing** - Real pandas/scipy analysis
49
+ 3. **Safety** - Can't access local filesystem
50
+ 4. **Speed** - Modal's fast cold starts
51
+
52
+ ### What We Need to Show
53
+
54
+ ```python
55
+ # LLM generates analysis code
56
+ code = """
57
+ import pandas as pd
58
+ import scipy.stats as stats
59
+
60
+ # Analyze extracted metrics from evidence
61
+ data = pd.DataFrame({
62
+ 'study': ['Study1', 'Study2', 'Study3'],
63
+ 'effect_size': [0.45, 0.52, 0.38],
64
+ 'sample_size': [120, 85, 200]
65
+ })
66
+
67
+ # Meta-analysis statistics
68
+ weighted_mean = (data['effect_size'] * data['sample_size']).sum() / data['sample_size'].sum()
69
+ t_stat, p_value = stats.ttest_1samp(data['effect_size'], 0)
70
+
71
+ print(f"Weighted Effect Size: {weighted_mean:.3f}")
72
+ print(f"P-value: {p_value:.4f}")
73
+
74
+ if p_value < 0.05:
75
+ result = "SUPPORTED"
76
+ else:
77
+ result = "INCONCLUSIVE"
78
+ """
79
+
80
+ # Executed SAFELY in Modal sandbox
81
+ executor = get_code_executor()
82
+ output = executor.execute(code) # Runs in isolated container!
83
+ ```
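For reference, the two statistics the generated snippet computes, written out (standard formulas; `d_i` = per-study effect size, `n_i` = sample size, `k` = number of studies, `s_d` = standard deviation of the effect sizes):

```latex
\bar{d}_w = \frac{\sum_{i=1}^{k} n_i \, d_i}{\sum_{i=1}^{k} n_i}
\qquad
t = \frac{\bar{d}}{s_d / \sqrt{k}}
```

`scipy.stats.ttest_1samp(data['effect_size'], 0)` computes exactly this one-sample t against a population mean of 0, with `k - 1` degrees of freedom.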
84
+
85
+ ---
86
+
87
+ ## 3. Technical Specification
88
+
89
+ ### 3.1 Dependencies (Already Present)
90
+
91
+ ```toml
92
+ # pyproject.toml - already has Modal
93
+ dependencies = [
94
+ "modal>=0.63.0",
95
+ # ...
96
+ ]
97
+ ```
98
+
99
+ ### 3.2 Environment Variables
100
+
101
+ ```bash
102
+ # .env
103
+ MODAL_TOKEN_ID=your-token-id
104
+ MODAL_TOKEN_SECRET=your-token-secret
105
+ ```
106
+
107
+ ### 3.3 Integration Points
108
+
109
+ | Integration Point | File | Change Required |
110
+ |-------------------|------|-----------------|
111
+ | Simple Orchestrator | `src/orchestrator.py` | Add `AnalysisAgent` call |
112
+ | Magentic Orchestrator | `src/orchestrator_magentic.py` | Add `AnalysisAgent` participant |
113
+ | Gradio UI | `src/app.py` | Add toggle for analysis mode |
114
+ | Config | `src/utils/config.py` | Add `enable_modal_analysis` setting |
115
+
116
+ ---
117
+
118
+ ## 4. Implementation
119
+
120
+ ### 4.1 Configuration Update (`src/utils/config.py`)
121
+
122
+ ```python
123
+ class Settings(BaseSettings):
124
+ # ... existing settings ...
125
+
126
+ # Modal Configuration
127
+ modal_token_id: str | None = None
128
+ modal_token_secret: str | None = None
129
+ enable_modal_analysis: bool = False # Opt-in for hackathon demo
130
+
131
+ @property
132
+ def modal_available(self) -> bool:
133
+ """Check if Modal credentials are configured."""
134
+ return bool(self.modal_token_id and self.modal_token_secret)
135
+ ```
136
+
137
+ ### 4.2 Simple Orchestrator Update (`src/orchestrator.py`)
138
+
139
+ ```python
140
+ """Main orchestrator with optional Modal analysis."""
141
+
142
+ from src.utils.config import settings
143
+
144
+ # ... existing imports ...
145
+
146
+
147
+ class Orchestrator:
148
+ """Search-Judge-Analyze orchestration loop."""
149
+
150
+ def __init__(
151
+ self,
152
+ search_handler: SearchHandlerProtocol,
153
+ judge_handler: JudgeHandlerProtocol,
154
+ config: OrchestratorConfig | None = None,
155
+ enable_analysis: bool = False, # New parameter
156
+ ) -> None:
157
+ self.search = search_handler
158
+ self.judge = judge_handler
159
+ self.config = config or OrchestratorConfig()
160
+ self.history: list[dict[str, Any]] = []
161
+ self._enable_analysis = enable_analysis and settings.modal_available
162
+
163
+ # Lazy-load analysis components
164
+ self._hypothesis_agent: Any = None
165
+ self._analysis_agent: Any = None
166
+
167
+ async def _get_hypothesis_agent(self) -> Any:
168
+ """Lazy initialization of HypothesisAgent."""
169
+ if self._hypothesis_agent is None:
170
+ from src.agents.hypothesis_agent import HypothesisAgent
171
+
172
+ self._hypothesis_agent = HypothesisAgent(
173
+ evidence_store={"current": []},
174
+ )
175
+ return self._hypothesis_agent
176
+
177
+ async def _get_analysis_agent(self) -> Any:
178
+ """Lazy initialization of AnalysisAgent."""
179
+ if self._analysis_agent is None:
180
+ from src.agents.analysis_agent import AnalysisAgent
181
+
182
+ self._analysis_agent = AnalysisAgent(
183
+ evidence_store={"current": [], "hypotheses": []},
184
+ )
185
+ return self._analysis_agent
186
+
187
+ async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
188
+ """Main orchestration loop with optional Modal analysis."""
189
+ # ... existing search/judge loop ...
190
+
191
+ # After judge says "synthesize", optionally run analysis
192
+ if self._enable_analysis and assessment.recommendation == "synthesize":
193
+ yield AgentEvent(
194
+ type="analyzing",
195
+ message="Running statistical analysis in Modal sandbox...",
196
+ data={},
197
+ iteration=iteration,
198
+ )
199
+
200
+ try:
201
+ # Generate hypotheses first
202
+ hypothesis_agent = await self._get_hypothesis_agent()
203
+ hypothesis_agent._evidence_store["current"] = all_evidence
204
+
205
+ await hypothesis_agent.run(query)  # hypotheses are written to the evidence store
206
+ hypotheses = hypothesis_agent._evidence_store.get("hypotheses", [])
207
+
208
+ # Run Modal analysis
209
+ analysis_agent = await self._get_analysis_agent()
210
+ analysis_agent._evidence_store["current"] = all_evidence
211
+ analysis_agent._evidence_store["hypotheses"] = hypotheses
212
+
213
+ await analysis_agent.run(query)  # analysis is written to the evidence store
214
+
215
+ yield AgentEvent(
216
+ type="analysis_complete",
217
+ message="Modal analysis complete",
218
+ data=analysis_agent._evidence_store.get("analysis", {}),
219
+ iteration=iteration,
220
+ )
221
+
222
+ except Exception as e:
223
+ yield AgentEvent(
224
+ type="error",
225
+ message=f"Modal analysis failed: {e}",
226
+ data={"error": str(e)},
227
+ iteration=iteration,
228
+ )
229
+
230
+ # Continue to synthesis...
231
+ ```
232
+
233
+ ### 4.3 MCP Tool for Modal Analysis (`src/mcp_tools.py`)
234
+
235
+ Add a new MCP tool for direct Modal analysis:
236
+
237
+ ```python
238
+ async def analyze_hypothesis(
239
+ drug: str,
240
+ condition: str,
241
+ evidence_summary: str,
242
+ ) -> str:
243
+ """Perform statistical analysis of drug repurposing hypothesis using Modal.
244
+
245
+ Executes AI-generated Python code in a secure Modal sandbox to analyze
246
+ the statistical evidence for a drug repurposing hypothesis.
247
+
248
+ Args:
249
+ drug: The drug being evaluated (e.g., "metformin")
250
+ condition: The target condition (e.g., "Alzheimer's disease")
251
+ evidence_summary: Summary of evidence to analyze
252
+
253
+ Returns:
254
+ Analysis result with verdict (SUPPORTED/REFUTED/INCONCLUSIVE) and statistics
255
+ """
256
+ from src.tools.code_execution import get_code_executor, CodeExecutionError
257
+ from src.agent_factory.judges import get_model
258
+ from pydantic_ai import Agent
259
+
260
+ # Check Modal availability
261
+ from src.utils.config import settings
262
+ if not settings.modal_available:
263
+ return "Error: Modal credentials not configured. Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET."
264
+
265
+ # Generate analysis code using LLM
266
+ code_agent = Agent(
267
+ model=get_model(),
268
+ output_type=str,
269
+ system_prompt="""Generate Python code to analyze drug repurposing evidence.
270
+ Use pandas, numpy, scipy.stats. Output executable code only.
271
+ Set 'result' variable to SUPPORTED, REFUTED, or INCONCLUSIVE.
272
+ Print key statistics and p-values.""",
273
+ )
274
+
275
+ prompt = f"""Analyze this hypothesis:
276
+ Drug: {drug}
277
+ Condition: {condition}
278
+
279
+ Evidence:
280
+ {evidence_summary}
281
+
282
+ Generate statistical analysis code."""
283
+
284
+ try:
285
+ code_result = await code_agent.run(prompt)
286
+ generated_code = code_result.output
287
+
288
+ # Execute in Modal sandbox
289
+ executor = get_code_executor()
290
+ import asyncio
291
+ loop = asyncio.get_running_loop()
292
+ from functools import partial
293
+ execution = await loop.run_in_executor(
294
+ None, partial(executor.execute, generated_code, timeout=60)
295
+ )
296
+
297
+ if not execution["success"]:
298
+ return f"## Analysis Failed\n\nError: {execution['error']}"
299
+
300
+ # Format output
301
+ return f"""## Statistical Analysis: {drug} for {condition}
302
+
303
+ ### Execution Output
304
+ ~~~
304
+ {execution['stdout']}
305
+ ~~~
306
+
307
+ ### Generated Code
308
+ ~~~python
309
+ {generated_code}
310
+ ~~~
312
+
313
+ **Executed in Modal Sandbox** - Isolated, secure, reproducible.
314
+ """
315
+
316
+ except CodeExecutionError as e:
317
+ return f"## Analysis Error\n\n{e}"
318
+ except Exception as e:
319
+ return f"## Unexpected Error\n\n{e}"
320
+ ```
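The tool above wraps the synchronous `executor.execute()` call in `run_in_executor` so the blocking Modal round-trip does not stall the event loop, using `functools.partial` because `run_in_executor` forwards only positional arguments. The pattern in isolation, with a hypothetical stand-in for the blocking call:

```python
import asyncio
import time
from functools import partial


def blocking_call(code: str, timeout: int = 60) -> dict:
    """Hypothetical stand-in for the synchronous ModalCodeExecutor.execute()."""
    time.sleep(0.01)  # simulate the remote sandbox round-trip
    return {"stdout": f"ran {len(code)} chars", "success": True}


async def main() -> None:
    loop = asyncio.get_running_loop()
    # partial() bakes in the keyword argument; run_in_executor runs the
    # callable in the default thread pool and awaits its result.
    result = await loop.run_in_executor(
        None, partial(blocking_call, "print('hi')", timeout=30)
    )
    print(result["stdout"])  # ran 11 chars


asyncio.run(main())
```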
321
+
322
+ ### 4.4 Demo Script (`examples/modal_demo/run_analysis.py`)
323
+
324
+ ```python
325
+ #!/usr/bin/env python3
326
+ """Demo: Modal-powered statistical analysis of drug repurposing evidence.
327
+
328
+ This script demonstrates:
329
+ 1. Gathering evidence from PubMed
330
+ 2. Generating analysis code with LLM
331
+ 3. Executing in Modal sandbox
332
+ 4. Returning statistical insights
333
+
334
+ Usage:
335
+ export OPENAI_API_KEY=...
336
+ export MODAL_TOKEN_ID=...
337
+ export MODAL_TOKEN_SECRET=...
338
+ uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
339
+ """
340
+
341
+ import argparse
342
+ import asyncio
343
+ import os
344
+ import sys
345
+
346
+ from src.agents.analysis_agent import AnalysisAgent
347
+ from src.agents.hypothesis_agent import HypothesisAgent
348
+ from src.tools.pubmed import PubMedTool
349
+ from src.utils.config import settings
350
+
351
+
352
+ async def main() -> None:
353
+ """Run the Modal analysis demo."""
354
+ parser = argparse.ArgumentParser(description="Modal Analysis Demo")
355
+ parser.add_argument("query", help="Research query (e.g., 'metformin alzheimer')")
356
+ args = parser.parse_args()
357
+
358
+ # Check credentials
359
+ if not settings.modal_available:
360
+ print("Error: Modal credentials not configured.")
361
+ print("Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET in .env")
362
+ sys.exit(1)
363
+
364
+ if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
365
+ print("Error: No LLM API key found.")
366
+ sys.exit(1)
367
+
368
+ print(f"\n{'='*60}")
369
+ print("DeepCritical Modal Analysis Demo")
370
+ print(f"Query: {args.query}")
371
+ print(f"{'='*60}\n")
372
+
373
+ # Step 1: Gather Evidence
374
+ print("Step 1: Gathering evidence from PubMed...")
375
+ pubmed = PubMedTool()
376
+ evidence = await pubmed.search(args.query, max_results=5)
377
+ print(f" Found {len(evidence)} papers\n")
378
+
379
+ # Step 2: Generate Hypotheses
380
+ print("Step 2: Generating mechanistic hypotheses...")
381
+ evidence_store: dict = {"current": evidence, "hypotheses": []}
382
+ hypothesis_agent = HypothesisAgent(evidence_store=evidence_store)
383
+ await hypothesis_agent.run(args.query)
384
+ hypotheses = evidence_store.get("hypotheses", [])
385
+ print(f" Generated {len(hypotheses)} hypotheses\n")
386
+
387
+ if hypotheses:
388
+ print(f" Primary: {hypotheses[0].drug} → {hypotheses[0].target}")
389
+
390
+ # Step 3: Run Modal Analysis
391
+ print("\nStep 3: Running statistical analysis in Modal sandbox...")
392
+ print(" (This executes LLM-generated code in an isolated container)\n")
393
+
394
+ analysis_agent = AnalysisAgent(evidence_store=evidence_store)
395
+ result = await analysis_agent.run(args.query)
396
+
397
+ # Step 4: Display Results
398
+ print("\n" + "="*60)
399
+ print("ANALYSIS RESULTS")
400
+ print("="*60)
401
+
402
+ if result.messages:
403
+ print(result.messages[0].text)
404
+
405
+ analysis = evidence_store.get("analysis", {})
406
+ if analysis:
407
+ print(f"\nVerdict: {analysis.get('verdict', 'N/A')}")
408
+ print(f"Confidence: {analysis.get('confidence', 0):.0%}")
409
+
410
+ print("\n[Demo Complete - Code was executed in Modal, not locally]")
411
+
412
+
413
+ if __name__ == "__main__":
414
+ asyncio.run(main())
415
+ ```
416
+
417
+ ### 4.5 Verification Script (`examples/modal_demo/verify_sandbox.py`)
418
+
419
+ ```python
420
+ #!/usr/bin/env python3
421
+ """Verify that Modal sandbox is properly isolated.
422
+
423
+ This script proves to judges that code runs in Modal, not locally.
424
+ It attempts operations that would succeed locally but fail in sandbox.
425
+
426
+ Usage:
427
+ uv run python examples/modal_demo/verify_sandbox.py
428
+ """
429
+
430
+ import asyncio
431
+ from functools import partial
432
+
433
+ from src.tools.code_execution import get_code_executor
434
+ from src.utils.config import settings
435
+
436
+
437
+ async def main() -> None:
438
+ """Verify Modal sandbox isolation."""
439
+ if not settings.modal_available:
440
+ print("Error: Modal credentials not configured.")
441
+ return
442
+
443
+ executor = get_code_executor()
444
+ loop = asyncio.get_running_loop()
445
+
446
+ print("="*60)
447
+ print("Modal Sandbox Isolation Verification")
448
+ print("="*60 + "\n")
449
+
450
+ # Test 1: Prove it's not running locally
451
+ print("Test 1: Check hostname (should NOT be your machine)")
452
+ code1 = """
453
+ import socket
454
+ print(f"Hostname: {socket.gethostname()}")
455
+ """
456
+ result1 = await loop.run_in_executor(None, partial(executor.execute, code1))
457
+ print(f" Result: {result1['stdout'].strip()}")
458
+ print(f" (Your local hostname would be different)\n")
459
+
460
+ # Test 2: Verify scientific libraries available
461
+ print("Test 2: Verify scientific libraries")
462
+ code2 = """
463
+ import pandas as pd
464
+ import numpy as np
465
+ import scipy
466
+ print(f"pandas: {pd.__version__}")
467
+ print(f"numpy: {np.__version__}")
468
+ print(f"scipy: {scipy.__version__}")
469
+ """
470
+ result2 = await loop.run_in_executor(None, partial(executor.execute, code2))
471
+ print(f" {result2['stdout'].strip()}\n")
472
+
473
+ # Test 3: Verify network is blocked (security)
474
+ print("Test 3: Verify network isolation (should fail)")
475
+ code3 = """
476
+ import urllib.request
477
+ try:
478
+ urllib.request.urlopen("https://google.com", timeout=2)
479
+ print("Network: ALLOWED (unexpected)")
480
+ except Exception:
481
+ print("Network: BLOCKED (as expected)")
482
+ """
483
+ result3 = await loop.run_in_executor(None, partial(executor.execute, code3))
484
+ print(f" {result3['stdout'].strip()}\n")
485
+
486
+ # Test 4: Run actual statistical analysis
487
+ print("Test 4: Execute real statistical analysis")
488
+ code4 = """
489
+ import pandas as pd
490
+ import scipy.stats as stats
491
+
492
+ data = pd.DataFrame({
493
+ 'drug': ['Metformin'] * 3,
494
+ 'effect': [0.42, 0.38, 0.51],
495
+ 'n': [100, 150, 80]
496
+ })
497
+
498
+ mean_effect = data['effect'].mean()
499
+ sem = data['effect'].sem()
500
+ t_stat, p_val = stats.ttest_1samp(data['effect'], 0)
501
+
502
+ print(f"Mean Effect: {mean_effect:.3f} (SE: {sem:.3f})")
503
+ print(f"t-statistic: {t_stat:.2f}, p-value: {p_val:.4f}")
504
+ print(f"Verdict: {'SUPPORTED' if p_val < 0.05 else 'INCONCLUSIVE'}")
505
+ """
506
+ result4 = await loop.run_in_executor(None, partial(executor.execute, code4))
507
+ print(f" {result4['stdout'].strip()}\n")
508
+
509
+ print("="*60)
510
+ print("All tests complete - Modal sandbox verified!")
511
+ print("="*60)
512
+
513
+
514
+ if __name__ == "__main__":
515
+ asyncio.run(main())
516
+ ```
517
+
518
+ ---
519
+
520
+ ## 5. TDD Test Suite
521
+
522
+ ### 5.1 Unit Tests (`tests/unit/tools/test_modal_integration.py`)
523
+
524
+ ```python
525
+ """Unit tests for Modal pipeline integration."""
526
+
527
+ from unittest.mock import AsyncMock, MagicMock, patch
528
+
529
+ import pytest
530
+
531
+ from src.utils.models import Evidence, Citation
532
+
533
+
534
+ @pytest.fixture
535
+ def sample_evidence() -> list[Evidence]:
536
+ """Sample evidence for testing."""
537
+ return [
538
+ Evidence(
539
+ content="Metformin shows effect size of 0.45 in Alzheimer's model.",
540
+ citation=Citation(
541
+ source="pubmed",
542
+ title="Metformin Study",
543
+ url="https://pubmed.ncbi.nlm.nih.gov/12345/",
544
+ date="2024-01-15",
545
+ authors=["Smith J"],
546
+ ),
547
+ relevance=0.9,
548
+ )
549
+ ]
550
+
551
+
552
+ class TestAnalysisAgentIntegration:
553
+ """Tests for AnalysisAgent integration."""
554
+
555
+ @pytest.mark.asyncio
556
+ async def test_analysis_agent_generates_code(
557
+ self, sample_evidence: list[Evidence]
558
+ ) -> None:
559
+ """AnalysisAgent should generate Python code for analysis."""
560
+ from src.agents.analysis_agent import AnalysisAgent
561
+
562
+ evidence_store = {
563
+ "current": sample_evidence,
564
+ "hypotheses": [
565
+ MagicMock(
566
+ drug="metformin",
567
+ target="AMPK",
568
+ pathway="autophagy",
569
+ effect="neuroprotection",
570
+ confidence=0.8,
571
+ )
572
+ ],
573
+ }
574
+
575
+ with patch("src.agents.analysis_agent.get_code_executor") as mock_executor, \
576
+ patch("src.agents.analysis_agent.get_model") as mock_model:
577
+
578
+ # Mock LLM to return code
579
+ mock_agent = AsyncMock()
580
+ mock_agent.run = AsyncMock(return_value=MagicMock(
581
+ output="import pandas as pd\nresult = 'SUPPORTED'"
582
+ ))
583
+
584
+ # Mock Modal execution
585
+ mock_executor.return_value.execute.return_value = {
586
+ "stdout": "SUPPORTED",
587
+ "stderr": "",
588
+ "success": True,
589
+ "error": None,
590
+ }
591
+
592
+ agent = AnalysisAgent(evidence_store=evidence_store)
593
+ agent._agent = mock_agent
594
+
595
+ result = await agent.run("metformin alzheimer")
596
+
597
+ assert result.messages[0].text is not None
598
+ assert "analysis" in evidence_store
599
+
600
+
601
+ class TestModalExecutorUnit:
602
+ """Unit tests for ModalCodeExecutor."""
603
+
604
+ def test_executor_checks_credentials(self) -> None:
605
+ """Executor should warn if credentials missing."""
606
+ import os
607
+ from unittest.mock import patch
608
+
609
+ with patch.dict(os.environ, {}, clear=True):
610
+ from src.tools.code_execution import ModalCodeExecutor
611
+
612
+ # Should not raise, but should log warning
613
+ executor = ModalCodeExecutor()
614
+ assert executor.modal_token_id is None
615
+
616
+ def test_get_sandbox_library_list(self) -> None:
617
+ """Should return list of library==version strings."""
618
+ from src.tools.code_execution import get_sandbox_library_list
619
+
620
+ libs = get_sandbox_library_list()
621
+
622
+ assert isinstance(libs, list)
623
+ assert "pandas==2.2.0" in libs
624
+ assert "numpy==1.26.4" in libs
625
+
626
+
627
+ class TestOrchestratorWithAnalysis:
628
+ """Tests for orchestrator with Modal analysis enabled."""
629
+
630
+ @pytest.mark.asyncio
631
+ async def test_orchestrator_calls_analysis_when_enabled(self) -> None:
632
+ """Orchestrator should call AnalysisAgent when enabled and Modal available."""
633
+ from src.orchestrator import Orchestrator
634
+ from src.utils.models import OrchestratorConfig
635
+
636
+ with patch("src.orchestrator.settings") as mock_settings:
637
+ mock_settings.modal_available = True
638
+
639
+ mock_search = AsyncMock()
640
+ mock_search.search.return_value = MagicMock(
641
+ evidence=[],
642
+ errors=[],
643
+ )
644
+
645
+ mock_judge = AsyncMock()
646
+ mock_judge.assess.return_value = MagicMock(
647
+ sufficient=True,
648
+ recommendation="synthesize",
649
+ next_search_queries=[],
650
+ )
651
+
652
+ config = OrchestratorConfig(max_iterations=1)
653
+ orchestrator = Orchestrator(
654
+ search_handler=mock_search,
655
+ judge_handler=mock_judge,
656
+ config=config,
657
+ enable_analysis=True,
658
+ )
659
+
660
+ # Collect events
661
+ events = []
662
+ async for event in orchestrator.run("test query"):
663
+ events.append(event)
664
+
665
+ # Should emit an "analyzing" event when Modal analysis is enabled
666
+ event_types = [e.type for e in events]
667
+ assert "analyzing" in event_types  # the actual Modal call is mocked
668
+ ```
669
+
670
+ ### 5.2 Integration Test (`tests/integration/test_modal.py`)
671
+
672
+ ```python
673
+ """Integration tests for Modal code execution (requires Modal credentials)."""
674
+
675
+ import pytest
676
+
677
+ from src.utils.config import settings
678
+
679
+
680
+ @pytest.mark.integration
681
+ @pytest.mark.skipif(
682
+ not settings.modal_available,
683
+ reason="Modal credentials not configured"
684
+ )
685
+ class TestModalIntegration:
686
+ """Integration tests for Modal (requires credentials)."""
687
+
688
+ @pytest.mark.asyncio
689
+ async def test_modal_executes_real_code(self) -> None:
690
+ """Test actual code execution in Modal sandbox."""
691
+ import asyncio
692
+ from functools import partial
693
+
694
+ from src.tools.code_execution import get_code_executor
695
+
696
+ executor = get_code_executor()
697
+ code = """
698
+ import pandas as pd
699
+ result = pd.DataFrame({'a': [1,2,3]})['a'].sum()
700
+ print(f"Sum: {result}")
701
+ """
702
+
703
+ loop = asyncio.get_running_loop()
704
+ result = await loop.run_in_executor(
705
+ None, partial(executor.execute, code, timeout=30)
706
+ )
707
+
708
+ assert result["success"]
709
+ assert "Sum: 6" in result["stdout"]
710
+
711
+ @pytest.mark.asyncio
712
+ async def test_modal_blocks_network(self) -> None:
713
+ """Verify network is blocked in sandbox."""
714
+ import asyncio
715
+ from functools import partial
716
+
717
+ from src.tools.code_execution import get_code_executor
718
+
719
+ executor = get_code_executor()
720
+ code = """
721
+ import urllib.request
722
+ try:
723
+ urllib.request.urlopen("https://google.com", timeout=2)
724
+ print("NETWORK_ALLOWED")
725
+ except Exception:
726
+ print("NETWORK_BLOCKED")
727
+ """
728
+
729
+ loop = asyncio.get_running_loop()
730
+ result = await loop.run_in_executor(
731
+ None, partial(executor.execute, code, timeout=30)
732
+ )
733
+
734
+ assert "NETWORK_BLOCKED" in result["stdout"]
735
+ ```
736
+
737
+ ---
738
+
739
+ ## 6. Verification Commands
740
+
741
+ ```bash
742
+ # 1. Set Modal credentials
743
+ export MODAL_TOKEN_ID=your-token-id
744
+ export MODAL_TOKEN_SECRET=your-token-secret
745
+
746
+ # Or via modal CLI
747
+ modal setup
748
+
749
+ # 2. Run unit tests
750
+ uv run pytest tests/unit/tools/test_modal_integration.py -v
751
+
752
+ # 3. Run verification script (proves sandbox works)
753
+ uv run python examples/modal_demo/verify_sandbox.py
754
+
755
+ # 4. Run full demo
756
+ uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
757
+
758
+ # 5. Run integration tests (requires Modal creds)
759
+ uv run pytest tests/integration/test_modal.py -v -m integration
760
+
761
+ # 6. Run full test suite
762
+ make check
763
+ ```
764
+
765
+ ---
766
+
767
+ ## 7. Definition of Done
768
+
769
+ Phase 13 is **COMPLETE** when:
770
+
771
+ - [ ] `src/utils/config.py` updated with `enable_modal_analysis` setting
772
+ - [ ] `src/orchestrator.py` optionally calls `AnalysisAgent`
773
+ - [ ] `src/mcp_tools.py` has `analyze_hypothesis` MCP tool
774
+ - [ ] `examples/modal_demo/run_analysis.py` working demo
775
+ - [ ] `examples/modal_demo/verify_sandbox.py` verification script
776
+ - [ ] Unit tests in `tests/unit/tools/test_modal_integration.py`
777
+ - [ ] Integration tests in `tests/integration/test_modal.py`
778
+ - [ ] Verification script proves sandbox isolation
779
+ - [ ] All unit tests pass
780
+ - [ ] Lints pass
781
+
782
+ ---
783
+
784
+ ## 8. Demo Script for Judges
785
+
786
+ ### Show Modal Innovation
787
+
788
+ 1. **Run verification script** (proves sandbox):
789
+ ```bash
790
+ uv run python examples/modal_demo/verify_sandbox.py
791
+ ```
792
+ - Shows hostname is NOT local machine
793
+ - Shows scientific libraries available
794
+ - Shows network is BLOCKED (security)
795
+ - Shows real statistics execution
796
+
797
+ 2. **Run analysis demo**:
798
+ ```bash
799
+ uv run python examples/modal_demo/run_analysis.py "metformin cancer"
800
+ ```
801
+ - Shows evidence gathering
802
+ - Shows hypothesis generation
803
+ - Shows code execution in Modal
804
+ - Shows statistical verdict
805
+
806
+ 3. **Show the key differentiator**:
807
+ > "LLM-generated code executes in an isolated Modal container. This is enterprise-grade safety for AI-powered scientific computing."
808
+
809
+ ---
810
+
811
+ ## 9. Value Delivered
812
+
813
+ | Before | After |
814
+ |--------|-------|
815
+ | Code execution exists but unused | Integrated into pipeline |
816
+ | No demo of sandbox isolation | Verification script proves it |
817
+ | No MCP tool for analysis | `analyze_hypothesis` MCP tool |
818
+ | No judge-friendly demo | Clear demo script |
819
+
820
+ **Prize Impact**:
821
+ - With Modal Integration: **Eligible for $2,500 Modal Innovation Award**
822
+
823
+ ---
824
+
825
+ ## 10. Files to Create/Modify
826
+
827
+ | File | Action | Purpose |
828
+ |------|--------|---------|
829
+ | `src/utils/config.py` | MODIFY | Add `enable_modal_analysis` |
830
+ | `src/orchestrator.py` | MODIFY | Add optional AnalysisAgent call |
831
+ | `src/mcp_tools.py` | MODIFY | Add `analyze_hypothesis` MCP tool |
832
+ | `examples/modal_demo/run_analysis.py` | CREATE | Demo script |
833
+ | `examples/modal_demo/verify_sandbox.py` | CREATE | Verification script |
834
+ | `tests/unit/tools/test_modal_integration.py` | CREATE | Unit tests |
835
+ | `tests/integration/test_modal.py` | CREATE | Integration tests |
836
+
837
+ ---
838
+
839
+ ## 11. Architecture After Phase 13
840
+
841
+ ```
842
+ User Query
843
+ ↓
844
+ Orchestrator
845
+ ↓
846
+ ┌────────────────────────────────────────────────────────────┐
847
+ │ Search Phase                                               │
848
+ │ PubMedTool → ClinicalTrialsTool → BioRxivTool              │
849
+ └────────────────────────────────────────────────────────────┘
850
+ ↓
851
+ ┌────────────────────────────────────────────────────────────┐
852
+ │ Judge Phase                                                │
853
+ │ JudgeHandler → "sufficient" → continue to synthesis        │
854
+ └────────────────────────────────────────────────────────────┘
855
+ ↓ (if enable_modal_analysis=True)
856
+ ┌────────────────────────────────────────────────────────────┐
857
+ │ Analysis Phase (NEW)                                       │
858
+ │ HypothesisAgent → Generate mechanistic hypotheses          │
859
+ │         ↓                                                  │
860
+ │ AnalysisAgent → Generate Python code                       │
861
+ │         ↓                                                  │
862
+ │ ┌──────────────────────────────────────────────┐           │
863
+ │ │ Modal Sandbox Container                      │           │
864
+ │ │ - pandas, numpy, scipy, sklearn              │           │
865
+ │ │ - Network BLOCKED                            │           │
866
+ │ │ - Filesystem ISOLATED                        │           │
867
+ │ │ - Execute → Return stdout                    │           │
868
+ │ └──────────────────────────────────────────────┘           │
869
+ │         ↓                                                  │
870
+ │ AnalysisResult → SUPPORTED/REFUTED/INCONCLUSIVE            │
871
+ └────────────────────────────────────────────────────────────┘
872
+ ↓
873
+ ┌────────────────────────────────────────────────────────────┐
874
+ │ Report Phase                                               │
875
+ │ ReportAgent → Structured scientific report                 │
876
+ └────────────────────────────────────────────────────────────┘
877
+ ```
878
+
879
+ **This is the Modal-powered analytics stack.**
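The `AnalysisResult` verdict at the bottom of the diagram can be sketched as a small typed model. This is a minimal illustration only; the class and field names are assumptions, not the project's actual definitions:

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(str, Enum):
    """Possible outcomes of a sandboxed statistical analysis."""
    SUPPORTED = "SUPPORTED"
    REFUTED = "REFUTED"
    INCONCLUSIVE = "INCONCLUSIVE"


@dataclass
class AnalysisResult:
    """Hypothetical result record returned by the analysis phase."""
    verdict: Verdict
    stdout: str  # captured output from the Modal sandbox run
    code: str    # the LLM-generated script that was executed


result = AnalysisResult(Verdict.INCONCLUSIVE, stdout="n=3, p=0.41", code="...")
print(result.verdict.value)  # INCONCLUSIVE
```

Making the enum a `str` subclass keeps the verdict JSON-serializable without custom encoders, which is convenient when the result crosses the sandbox boundary as plain text.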
docs/implementation/14_phase_demo_submission.md ADDED
@@ -0,0 +1,464 @@
+ # Phase 14 Implementation Spec: Demo Video & Hackathon Submission
+
+ **Goal**: Create a compelling demo video and complete the hackathon submission.
+ **Philosophy**: "Ship it with style."
+ **Prerequisite**: Phases 12-13 complete (MCP + Modal working)
+ **Priority**: P0 - REQUIRED FOR SUBMISSION
+ **Deadline**: November 30, 2025 11:59 PM UTC
+ **Estimated Time**: 2-3 hours
+
+ ---
+
+ ## 1. Submission Requirements
+
+ ### MCP's 1st Birthday Hackathon Checklist
+
+ | Requirement | Status | Action |
+ |-------------|--------|--------|
+ | HuggingFace Space in `MCP-1st-Birthday` org | Pending | Transfer or create |
+ | Track tag in README.md | Pending | Add tag |
+ | Social media post link | Pending | Create post |
+ | Demo video (1-5 min) | Pending | Record |
+ | Team members registered | Pending | Verify |
+ | Original work (Nov 14-30) | **DONE** | All commits in range |
+
+ ### Track 2: MCP in Action - Tags
+
+ ```yaml
+ # Add to HuggingFace Space README.md
+ tags:
+ - mcp-in-action-track-enterprise # Healthcare/enterprise focus
+ ```
+
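Since a missing track tag disqualifies the entry, it is worth verifying mechanically rather than by eye. A minimal sketch using only the standard library (the helper name is illustrative, not part of the submission tooling):

```python
import re


def has_track_tag(readme_text: str, tag: str) -> bool:
    """Check that the YAML front matter of a Space README lists the given tag."""
    match = re.match(r"^---\n(.*?)\n---", readme_text, re.DOTALL)
    if not match:
        return False  # no front matter block at all
    front_matter = match.group(1)
    # Tags appear as "- <tag>" list items under the "tags:" key.
    return bool(re.search(rf"^\s*-\s*{re.escape(tag)}\s*$", front_matter, re.MULTILINE))


readme = """---
title: DeepCritical
tags:
  - mcp-in-action-track-enterprise
---
# DeepCritical
"""
print(has_track_tag(readme, "mcp-in-action-track-enterprise"))  # True
```

Run this against the actual Space README just before submitting, so a last-minute edit cannot silently drop the tag.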
+ ---
+
+ ## 2. Prize Eligibility Summary
+
+ ### After Phases 12-13
+
+ | Award | Amount | Eligible | Requirements Met |
+ |-------|--------|----------|------------------|
+ | Track 2: MCP in Action (1st) | $2,500 | **YES** | MCP server working |
+ | Modal Innovation | $2,500 | **YES** | Sandbox demo ready |
+ | LlamaIndex | $1,000 | **YES** | Using RAG |
+ | Community Choice | $1,000 | Possible | Need great demo |
+ | **Total Potential** | **$7,000** | | |
+
+ ---
+
+ ## 3. Demo Video Specification
+
+ ### 3.1 Duration & Format
+
+ - **Length**: 3-4 minutes (sweet spot)
+ - **Format**: Screen recording + voice-over
+ - **Resolution**: 1080p minimum
+ - **Audio**: Clear narration, no background music
+
+ ### 3.2 Recommended Tools
+
+ | Tool | Purpose | Notes |
+ |------|---------|-------|
+ | OBS Studio | Screen recording | Free, cross-platform |
+ | Loom | Quick recording | Good for demos |
+ | QuickTime | Mac screen recording | Built-in |
+ | DaVinci Resolve | Editing | Free, professional |
+
+ ### 3.3 Demo Script (4 minutes)
+
+ ```markdown
+ ## Section 1: Hook (30 seconds)
+
+ [Show Gradio UI]
+
+ "DeepCritical is an AI-powered drug repurposing research agent.
+ It searches peer-reviewed literature, clinical trials, and cutting-edge preprints
+ to find new uses for existing drugs."
+
+ "Let me show you how it works."
+
+ ---
+
+ ## Section 2: Core Functionality (60 seconds)
+
+ [Type query: "Can metformin treat Alzheimer's disease?"]
+
+ "When I ask about metformin for Alzheimer's, DeepCritical:
+ 1. Searches PubMed for peer-reviewed papers
+ 2. Queries ClinicalTrials.gov for active trials
+ 3. Scans bioRxiv for the latest preprints"
+
+ [Show search results streaming]
+
+ "It then uses an LLM to assess the evidence quality and
+ synthesize findings into a structured research report."
+
+ [Show final report]
+
+ ---
+
+ ## Section 3: MCP Integration (60 seconds)
+
+ [Switch to Claude Desktop]
+
+ "What makes DeepCritical unique is full MCP integration.
+ These same tools are available to any MCP client."
+
+ [Show Claude Desktop with DeepCritical tools]
+
+ "I can ask Claude: 'Search PubMed for aspirin cancer prevention'"
+
+ [Show results appearing in Claude Desktop]
+
+ "The agent uses our MCP server to search real biomedical databases."
+
+ [Show MCP Inspector briefly]
+
+ "Here's the MCP schema - four tools exposed for any AI to use."
+
+ ---
+
+ ## Section 4: Modal Innovation (45 seconds)
+
+ [Run verify_sandbox.py]
+
+ "For statistical analysis, we use Modal for secure code execution."
+
+ [Show sandbox verification output]
+
+ "Notice the hostname is NOT my machine - code runs in an isolated container.
+ Network is blocked. The AI can't reach the internet from the sandbox."
+
+ [Run analysis demo]
+
+ "Modal executes LLM-generated statistical code safely,
+ returning verdicts like SUPPORTED, REFUTED, or INCONCLUSIVE."
+
+ ---
+
+ ## Section 5: Close (45 seconds)
+
+ [Return to Gradio UI]
+
+ "DeepCritical brings together:
+ - Three biomedical data sources
+ - MCP protocol for universal tool access
+ - Modal sandboxes for safe code execution
+ - LlamaIndex for semantic search
+
+ All in a beautiful Gradio interface."
+
+ "Check out the code on GitHub, try it on HuggingFace Spaces,
+ and let us know what you think."
+
+ "Thanks for watching!"
+
+ [Show links: GitHub, HuggingFace, Team names]
+ ```
+
+ ---
+
+ ## 4. HuggingFace Space Configuration
+
+ ### 4.1 Space README.md
+
+ ````markdown
+ ---
+ title: DeepCritical
+ emoji: 🧬
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: "5.0.0"
+ app_file: src/app.py
+ pinned: false
+ license: mit
+ tags:
+ - mcp-in-action-track-enterprise
+ - mcp-hackathon
+ - drug-repurposing
+ - biomedical-ai
+ - pydantic-ai
+ - llamaindex
+ - modal
+ ---
+
+ # DeepCritical
+
+ AI-Powered Drug Repurposing Research Agent
+
+ ## Features
+
+ - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
+ - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
+ - **Modal Sandbox**: Secure execution of AI-generated statistical code
+ - **LlamaIndex RAG**: Semantic search and evidence synthesis
+
+ ## MCP Tools
+
+ Connect to our MCP server at:
+
+ ```
+ https://MCP-1st-Birthday-deepcritical.hf.space/gradio_api/mcp/
+ ```
+
+ Available tools:
+ - `search_pubmed` - Search peer-reviewed biomedical literature
+ - `search_clinical_trials` - Search ClinicalTrials.gov
+ - `search_biorxiv` - Search bioRxiv/medRxiv preprints
+ - `search_all` - Search all sources simultaneously
+
+ ## Team
+
+ - The-Obstacle-Is-The-Way
+ - MarioAderman
+
+ ## Links
+
+ - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
+ - [Demo Video](link-to-video)
+ ````
+
+ ### 4.2 Environment Variables (Secrets)
+
+ Set in HuggingFace Space settings:
+
+ ```
+ OPENAI_API_KEY=sk-...
+ ANTHROPIC_API_KEY=sk-ant-...
+ NCBI_API_KEY=...
+ MODAL_TOKEN_ID=...
+ MODAL_TOKEN_SECRET=...
+ ```
+
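A fail-fast startup guard catches a forgotten secret before the Space boots into a half-working state. A minimal sketch; the helper name is illustrative and `REQUIRED_SECRETS` mirrors the list above:

```python
import os

REQUIRED_SECRETS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "NCBI_API_KEY",
    "MODAL_TOKEN_ID",
    "MODAL_TOKEN_SECRET",
]


def missing_secrets(env: dict, required: list) -> list:
    """Return the names of required secrets that are absent or empty."""
    return [name for name in required if not env.get(name)]


# Example: report misconfiguration at startup instead of failing mid-demo.
missing = missing_secrets(dict(os.environ), REQUIRED_SECRETS)
if missing:
    print(f"Missing secrets: {', '.join(missing)}")
```

Checking for empty strings as well as absent keys matters here, because a Space secret saved with a blank value looks configured in the settings UI but behaves as missing at runtime.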
+ ---
+
+ ## 5. Social Media Post
+
+ ### Twitter/X Template
+
+ ```
+ 🧬 Excited to submit DeepCritical to MCP's 1st Birthday Hackathon!
+
+ An AI agent that:
+ ✅ Searches PubMed, ClinicalTrials.gov & bioRxiv
+ ✅ Exposes tools via MCP protocol
+ ✅ Runs statistical code in Modal sandboxes
+ ✅ Uses LlamaIndex for semantic search
+
+ Try it: [HuggingFace link]
+ Demo: [Video link]
+
+ #MCPHackathon #AIAgents #DrugRepurposing @huggingface @AnthropicAI
+ ```
+
+ ### LinkedIn Template
+
+ ```
+ Thrilled to share DeepCritical - our submission to MCP's 1st Birthday Hackathon!
+
+ 🔬 What it does:
+ DeepCritical is an AI-powered drug repurposing research agent that searches
+ peer-reviewed literature, clinical trials, and preprints to find new uses
+ for existing drugs.
+
+ 🛠️ Technical highlights:
+ • Full MCP integration - tools work with Claude Desktop
+ • Modal sandboxes for secure AI-generated code execution
+ • LlamaIndex RAG for semantic evidence search
+ • Three biomedical data sources in parallel
+
+ Built with PydanticAI, Gradio, and deployed on HuggingFace Spaces.
+
+ Try it: [link]
+ Watch the demo: [link]
+
+ #ArtificialIntelligence #Healthcare #DrugDiscovery #MCP #Hackathon
+ ```
+
+ ---
+
+ ## 6. Pre-Submission Checklist
+
+ ### 6.1 Code Quality
+
+ ```bash
+ # Run all checks
+ make check
+
+ # Expected output:
+ # ✅ Linting passed (ruff)
+ # ✅ Type checking passed (mypy)
+ # ✅ All 80+ tests passed (pytest)
+ ```
+
+ ### 6.2 Documentation
+
+ - [ ] README.md updated with MCP instructions
+ - [ ] All demo scripts have docstrings
+ - [ ] Example files work end-to-end
+ - [ ] CLAUDE.md is current
+
+ ### 6.3 Deployment Verification
+
+ ```bash
+ # Test locally
+ uv run python src/app.py
+ # Visit http://localhost:7860
+
+ # Test MCP schema
+ curl http://localhost:7860/gradio_api/mcp/schema
+
+ # Test Modal (if configured)
+ uv run python examples/modal_demo/verify_sandbox.py
+ ```
+
+ ### 6.4 HuggingFace Space
+
+ - [ ] Space created in `MCP-1st-Birthday` organization
+ - [ ] Secrets configured (API keys)
+ - [ ] App starts without errors
+ - [ ] MCP endpoint accessible
+ - [ ] Track tag in README
+
+ ---
+
+ ## 7. Recording Checklist
+
+ ### Before Recording
+
+ - [ ] Close unnecessary apps/notifications
+ - [ ] Clear browser history/tabs
+ - [ ] Test all demos work
+ - [ ] Prepare terminal windows
+ - [ ] Write down talking points
+
+ ### During Recording
+
+ - [ ] Speak clearly and at a moderate pace
+ - [ ] Pause briefly between sections
+ - [ ] Show your face? (optional, adds personality)
+ - [ ] Don't rush - 3-4 min is enough time
+
+ ### After Recording
+
+ - [ ] Watch playback for errors
+ - [ ] Trim dead air at start/end
+ - [ ] Add title/end cards
+ - [ ] Export at 1080p
+ - [ ] Upload to YouTube/Loom
+
+ ---
+
+ ## 8. Submission Steps
+
+ ### Step 1: Finalize Code
+
+ ```bash
+ # Ensure a clean state
+ git status
+ make check
+
+ # Push to GitHub
+ git push origin main
+
+ # Sync to HuggingFace
+ git push huggingface-upstream main
+ ```
+
+ ### Step 2: Verify HuggingFace Space
+
+ 1. Visit Space URL
+ 2. Test the chat interface
+ 3. Test MCP endpoint: `/gradio_api/mcp/schema`
+ 4. Verify README has track tag
+
+ ### Step 3: Record Demo Video
+
+ 1. Follow the script from Section 3.3
+ 2. Edit and export
+ 3. Upload to YouTube (unlisted) or Loom
+ 4. Copy shareable link
+
+ ### Step 4: Create Social Post
+
+ 1. Write post (see templates)
+ 2. Include video link
+ 3. Tag relevant accounts
+ 4. Post and copy link
+
+ ### Step 5: Submit
+
+ 1. Ensure Space is in `MCP-1st-Birthday` org
+ 2. Verify track tag in README
+ 3. Submit entry (check hackathon page for form)
+ 4. Include all links
+
+ ---
+
+ ## 9. Verification Commands
+
+ ```bash
+ # 1. Full test suite
+ make check
+
+ # 2. Start local server
+ uv run python src/app.py
+
+ # 3. Verify MCP works
+ curl http://localhost:7860/gradio_api/mcp/schema | jq
+
+ # 4. Test with MCP Inspector
+ # (then connect it to http://localhost:7860/gradio_api/mcp/ in the Inspector UI)
+ npx @modelcontextprotocol/inspector
+
+ # 5. Run Modal verification
+ uv run python examples/modal_demo/verify_sandbox.py
+
+ # 6. Run full demo
+ uv run python examples/orchestrator_demo/run_agent.py "metformin alzheimer"
+ ```
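The schema check in step 3 can also be scripted, so the four tools are verified by name rather than by eyeball. This sketch assumes the endpoint returns a JSON list of tool objects each carrying a `name` field; confirm the actual payload shape before relying on it:

```python
import json
import urllib.request

SCHEMA_URL = "http://localhost:7860/gradio_api/mcp/schema"
EXPECTED_TOOLS = {"search_pubmed", "search_clinical_trials", "search_biorxiv", "search_all"}


def tool_names(schema: list) -> set:
    """Collect tool names from a schema payload (assumed shape: [{'name': ...}, ...])."""
    return {tool["name"] for tool in schema}


def missing_tools(url: str = SCHEMA_URL) -> set:
    """Fetch the live schema and return any expected tools it does not expose."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        schema = json.load(resp)
    return EXPECTED_TOOLS - tool_names(schema)
```

With the local server running, `missing_tools()` should return an empty set; anything left over points at a tool that failed to register with the MCP endpoint.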
+
+ ---
+
+ ## 10. Definition of Done
+
+ Phase 14 is **COMPLETE** when:
+
+ - [ ] Demo video recorded (3-4 min)
+ - [ ] Video uploaded (YouTube/Loom)
+ - [ ] Social media post created with link
+ - [ ] HuggingFace Space in `MCP-1st-Birthday` org
+ - [ ] Track tag in Space README
+ - [ ] All team members registered
+ - [ ] Entry submitted before deadline
+ - [ ] Confirmation received
+
+ ---
+
+ ## 11. Timeline
+
+ | Task | Time | Deadline |
+ |------|------|----------|
+ | Phase 12: MCP Server | 2-3 hours | Nov 28 |
+ | Phase 13: Modal Integration | 2-3 hours | Nov 29 |
+ | Phase 14: Demo & Submit | 2-3 hours | Nov 30 |
+ | **Buffer** | ~24 hours | Before 11:59 PM UTC |
+
+ ---
+
+ ## 12. Contact & Support
+
+ ### Hackathon Resources
+
+ - Discord: `#agents-mcp-hackathon-winter25`
+ - HuggingFace: [MCP-1st-Birthday org](https://huggingface.co/MCP-1st-Birthday)
+ - MCP Docs: [modelcontextprotocol.io](https://modelcontextprotocol.io/)
+
+ ### Team Communication
+
+ - Coordinate on final review
+ - Agree on who submits
+ - Celebrate when done! 🎉
+
+ ---
+
+ **Good luck! Ship it with confidence.**
docs/implementation/roadmap.md CHANGED
@@ -183,6 +183,8 @@ Structured Research Report
 
 ## Spec Documents
 
+ ### Core Platform (Phases 1-8)
+
 1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)** ✅
 2. **[Phase 2 Spec: Search Slice](02_phase_search.md)** ✅
 3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)** ✅
@@ -191,9 +193,18 @@ Structured Research Report
 6. **[Phase 6 Spec: Embeddings & Semantic Search](06_phase_embeddings.md)** ✅
 7. **[Phase 7 Spec: Hypothesis Agent](07_phase_hypothesis.md)** ✅
 8. **[Phase 8 Spec: Report Agent](08_phase_report.md)** ✅
- 9. **[Phase 9 Spec: Remove DuckDuckGo](09_phase_source_cleanup.md)** 📝
- 10. **[Phase 10 Spec: ClinicalTrials.gov](10_phase_clinicaltrials.md)** 📝
- 11. **[Phase 11 Spec: bioRxiv Preprints](11_phase_biorxiv.md)** 📝
+
+ ### Multi-Source Search (Phases 9-11)
+
+ 9. **[Phase 9 Spec: Remove DuckDuckGo](09_phase_source_cleanup.md)** ✅
+ 10. **[Phase 10 Spec: ClinicalTrials.gov](10_phase_clinicaltrials.md)** ✅
+ 11. **[Phase 11 Spec: bioRxiv Preprints](11_phase_biorxiv.md)** ✅
+
+ ### Hackathon Integration (Phases 12-14)
+
+ 12. **[Phase 12 Spec: MCP Server](12_phase_mcp_server.md)** 📝 P0 - REQUIRED
+ 13. **[Phase 13 Spec: Modal Pipeline](13_phase_modal_integration.md)** 📝 P1 - $2,500
+ 14. **[Phase 14 Spec: Demo & Submission](14_phase_demo_submission.md)** 📝 P0 - REQUIRED
 
 ---
 
@@ -209,8 +220,25 @@ Structured Research Report
 | Phase 6: Embeddings | ✅ COMPLETE | Semantic search + ChromaDB |
 | Phase 7: Hypothesis | ✅ COMPLETE | Mechanistic reasoning chains |
 | Phase 8: Report | ✅ COMPLETE | Structured scientific reports |
- | Phase 9: Source Cleanup | 📝 SPEC READY | Remove DuckDuckGo |
- | Phase 10: ClinicalTrials | 📝 SPEC READY | ClinicalTrials.gov API |
- | Phase 11: bioRxiv | 📝 SPEC READY | Preprint search |
+ | Phase 9: Source Cleanup | ✅ COMPLETE | Remove DuckDuckGo |
+ | Phase 10: ClinicalTrials | ✅ COMPLETE | ClinicalTrials.gov API |
+ | Phase 11: bioRxiv | ✅ COMPLETE | Preprint search |
+ | Phase 12: MCP Server | 📝 SPEC READY | MCP protocol integration |
+ | Phase 13: Modal Pipeline | 📝 SPEC READY | Sandboxed code execution |
+ | Phase 14: Demo & Submit | 📝 SPEC READY | Hackathon submission |
 
- *Phases 1-8 COMPLETE. Phases 9-11 will add multi-source credibility.*
+ *Phases 1-11 COMPLETE. Phases 12-14 for hackathon compliance.*
+
+ ---
+
+ ## Hackathon Prize Potential
+
+ | Award | Amount | Requirement | Phase |
+ |-------|--------|-------------|-------|
+ | Track 2: MCP in Action (1st) | $2,500 | MCP server working | 12 |
+ | Modal Innovation | $2,500 | Sandbox demo ready | 13 |
+ | LlamaIndex | $1,000 | Using RAG | ✅ Done |
+ | Community Choice | $1,000 | Great demo video | 14 |
+ | **Total Potential** | **$7,000** | | |
+
+ **Deadline: November 30, 2025 11:59 PM UTC**