
Deployment Guide

Launching DeepCritical: Gradio, MCP, & Modal


Overview

DeepCritical is designed for a multi-platform deployment strategy to maximize hackathon impact:

  1. HuggingFace Spaces: Host the Gradio UI (User Interface).
  2. MCP Server: Expose research tools to Claude Desktop/Agents.
  3. Modal (Optional): Run heavy inference or self-hosted LLMs if API costs are prohibitive.

1. HuggingFace Spaces (Gradio UI)

Goal: A public URL where judges/users can try the research agent.

Prerequisites

  • HuggingFace Account
  • gradio installed (uv add gradio)

Steps

  1. Create Space:

    • Go to HF Spaces -> Create New Space.
    • SDK: Gradio.
    • Hardware: CPU Basic (Free) is sufficient (since we use APIs).
  2. Prepare Files:

    • Ensure app.py contains the Gradio interface construction.
    • Ensure requirements.txt lists every runtime dependency (Gradio Spaces install from requirements.txt, so keep it in sync with pyproject.toml).
  3. Secrets:

    • Go to Space Settings -> Repository secrets.
    • Add ANTHROPIC_API_KEY (or your chosen LLM provider key).
    • Add BRAVE_API_KEY (for web search). Secrets are injected as environment variables at runtime (see the snippet after these steps).
  4. Deploy:

    • Push code to the Space's git repo.
    • Watch "Build" logs.
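
Inside the Space, repository secrets surface as ordinary environment variables. A minimal sketch of reading them in app.py (the variable handling here is an assumption, not the repo's actual code):

# Read the secrets that HF Spaces injects as environment variables
import os

anthropic_key = os.environ["ANTHROPIC_API_KEY"]  # fail fast if the key is missing
brave_key = os.environ.get("BRAVE_API_KEY")      # optional: only needed for web search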

Streaming Optimization

Ensure app.py uses generator functions for the chat interface to prevent timeouts:

# app.py
import gradio as gr

from src.agent import ResearchAgent  # adjust this import to the actual project layout

def predict(message, history):
    agent = ResearchAgent()
    for update in agent.research_stream(message):
        yield update  # stream each partial result to the UI

gr.ChatInterface(predict).launch()

2. MCP Server Deployment

Goal: Allow other agents (like Claude Desktop) to use our PubMed/Research tools directly.
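
For orientation, a minimal FastMCP server looks like the sketch below; the tool name and body are illustrative, not the repo's actual src/mcp_servers/pubmed_server.py implementation.

# pubmed_server.py -- minimal FastMCP sketch (illustrative)
from fastmcp import FastMCP

mcp = FastMCP("deepcritical-pubmed")

@mcp.tool()
def search_pubmed(query: str, max_results: int = 5) -> list[str]:
    """Search PubMed and return brief article summaries."""
    ...  # e.g., query NCBI E-utilities with httpx and format the hits

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, which Claude Desktop expects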

Local Usage (Claude Desktop)

  1. Install:

    uv sync
    
  2. Configure Claude Desktop: Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS; on Windows it is %APPDATA%\Claude\claude_desktop_config.json). Claude Desktop's config has no reliable cwd key, so pass the project directory to uv via --directory:

    {
      "mcpServers": {
        "deepcritical": {
          "command": "uv",
          "args": [
            "run",
            "--directory", "/absolute/path/to/DeepCritical",
            "fastmcp", "run", "src/mcp_servers/pubmed_server.py"
          ]
        }
      }
    }
    
  3. Restart Claude: You should see a 🔌 icon indicating connected tools.

Remote Deployment (Smithery/Glama)

Target for "MCP Track" bonus points.

  1. Dockerize: Create a Dockerfile for the MCP server.

    FROM python:3.11-slim
    WORKDIR /app
    COPY . /app
    RUN pip install fastmcp httpx
    CMD ["fastmcp", "run", "src/mcp_servers/pubmed_server.py", "--transport", "sse"]
    

    Note: Use SSE transport for remote/HTTP servers (an in-code alternative is sketched after this list).

  2. Deploy: Host on Fly.io or Railway.
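
The transport can also be selected in code rather than on the CLI; a sketch, assuming the server object is named mcp as in the earlier example:

# At the bottom of pubmed_server.py: serve over SSE for remote clients
if __name__ == "__main__":
    mcp.run(transport="sse", host="0.0.0.0", port=8000)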


3. Modal (GPU/Heavy Compute)

Goal: Run an open-weights LLM (e.g., Llama-3-70B) or handle massive parallel searches if hosted APIs are too slow or expensive.

Setup

  1. Install: uv add modal
  2. Auth: modal token new

Logic

Instead of calling the Anthropic API, we call a Modal function:

# src/llm/modal_client.py
import modal

# modal.Stub was renamed to modal.App in current Modal releases
app = modal.App("deepcritical-inference")

@app.function(gpu="A100")
def generate_text(prompt: str) -> str:
    # Load vLLM or similar inside the container and run inference
    ...

# Caller side: generate_text.remote(prompt) runs this on Modal's GPU

When to use?

  • Hackathon Demo: Stick to Anthropic/OpenAI APIs for speed/reliability.
  • Production/Stretch: Use Modal if you hit rate limits or want to show off "Open Source Models" capability (one possible toggle is sketched below).
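
One way to keep both paths is a small switch in the LLM client. A hypothetical sketch (the USE_MODAL flag and the fallback are assumptions, not existing code):

# Hypothetical switch between hosted APIs and Modal inference
import os

def generate(prompt: str) -> str:
    if os.environ.get("USE_MODAL") == "1":
        from src.llm.modal_client import generate_text
        return generate_text.remote(prompt)  # runs remotely on Modal's GPU
    ...  # otherwise call the Anthropic/OpenAI client as usual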

Deployment Checklist

Pre-Flight

  • Run pytest -m unit to ensure logic is sound.
  • Run pytest -m e2e (one pass) to verify APIs connect.
  • Check that requirements.txt matches pyproject.toml (uv export --format requirements-txt can regenerate it).

Secrets Management

  • NEVER commit .env files.
  • Verify keys are added to HF Space settings; for local runs, see the .env pattern below.
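
For local development, a common pattern is to keep keys in an untracked .env file and load it with python-dotenv; a sketch, assuming python-dotenv is added as a dev dependency:

# Local development only: load keys from an untracked .env file
from dotenv import load_dotenv

load_dotenv()  # does nothing when no .env exists, e.g., on HF Spaces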

Post-Launch

  • Test the live URL.
  • Verify "Stop" button in Gradio works (interrupts the agent).
  • Record a walkthrough video (crucial for hackathon submission).