# Deployment Guide

*Launching DeepCritical: Gradio, MCP, & Modal*

## Overview
DeepCritical is designed for a multi-platform deployment strategy to maximize hackathon impact:
- HuggingFace Spaces: Host the Gradio UI (User Interface).
- MCP Server: Expose research tools to Claude Desktop/Agents.
- Modal (Optional): Run heavy inference or local LLMs if API costs are prohibitive.
## 1. HuggingFace Spaces (Gradio UI)

**Goal:** A public URL where judges/users can try the research agent.
### Prerequisites

- HuggingFace Account
- `gradio` installed (`uv add gradio`)
### Steps

1. **Create Space:**
   - Go to HF Spaces -> Create New Space.
   - SDK: Gradio.
   - Hardware: CPU Basic (Free) is sufficient (since we use APIs).
2. **Prepare Files:**
   - Ensure `app.py` contains the Gradio interface construction.
   - Ensure `requirements.txt` or `pyproject.toml` lists all dependencies.
3. **Secrets** (exposed as environment variables at runtime; see the snippet after these steps):
   - Go to Space Settings -> Repository secrets.
   - Add `ANTHROPIC_API_KEY` (or your chosen LLM provider key).
   - Add `BRAVE_API_KEY` (for web search).
4. **Deploy:**
   - Push code to the Space's git repo.
   - Watch the "Build" logs.
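Secrets added in the Space settings surface as plain environment variables; a minimal sketch of reading them in `app.py` (the fallback handling is an assumption):

```python
# app.py -- read keys injected by HF Spaces "Repository secrets".
import os

ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]  # raises KeyError if missing
BRAVE_API_KEY = os.environ.get("BRAVE_API_KEY", "")  # optional: web search
```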
### Streaming Optimization

Ensure `app.py` uses generator functions for the chat interface to prevent timeouts:
```python
# app.py
def predict(message, history):
    agent = ResearchAgent()
    for update in agent.research_stream(message):
        yield update
```
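Wiring that generator into the UI might look like the following (a sketch; `gr.ChatInterface` streams each yielded value to the browser):

```python
import gradio as gr

# ChatInterface re-renders the assistant message on every yield,
# which keeps the connection alive during long research runs.
demo = gr.ChatInterface(fn=predict, title="DeepCritical")

if __name__ == "__main__":
    demo.launch()
```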
## 2. MCP Server Deployment

**Goal:** Allow other agents (like Claude Desktop) to use our PubMed/Research tools directly.

### Local Usage (Claude Desktop)
1. **Install:** `uv sync`
2. **Configure Claude Desktop:** Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:

   ```json
   {
     "mcpServers": {
       "deepcritical": {
         "command": "uv",
         "args": ["run", "fastmcp", "run", "src/mcp_servers/pubmed_server.py"],
         "cwd": "/absolute/path/to/DeepCritical"
       }
     }
   }
   ```

3. **Restart Claude:** You should see a 🔌 icon indicating connected tools.
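For reference, the shape of server that config expects looks roughly like this (a sketch; the actual tool names and logic in `pubmed_server.py` are assumptions):

```python
# src/mcp_servers/pubmed_server.py -- minimal sketch, not the real implementation
from fastmcp import FastMCP

mcp = FastMCP("deepcritical")

@mcp.tool()
def search_pubmed(query: str, max_results: int = 10) -> list[str]:
    """Return titles of PubMed articles matching `query`."""
    # A real version would call the NCBI E-utilities API via httpx here.
    raise NotImplementedError

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport, which Claude Desktop expects
```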
### Remote Deployment (Smithery/Glama)

*Target for "MCP Track" bonus points.*
1. **Dockerize:** Create a `Dockerfile` for the MCP server:

   ```dockerfile
   FROM python:3.11-slim
   WORKDIR /app
   COPY . /app
   RUN pip install fastmcp httpx
   CMD ["fastmcp", "run", "src/mcp_servers/pubmed_server.py", "--transport", "sse"]
   ```

   *Note: Use SSE transport for remote/HTTP servers.*
2. **Deploy:** Host on Fly.io or Railway.
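Once deployed, you can sanity-check the remote endpoint with FastMCP's client (a sketch; the URL is a placeholder for your Fly.io/Railway host):

```python
# check_remote.py -- list tools exposed by the deployed SSE server
import asyncio
from fastmcp import Client

async def main():
    async with Client("https://deepcritical.fly.dev/sse") as client:  # placeholder URL
        tools = await client.list_tools()
        print([tool.name for tool in tools])

asyncio.run(main())
```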
## 3. Modal (GPU/Heavy Compute)

**Goal:** Run a local LLM (e.g., Llama-3-70B) or handle massive parallel searches if APIs are too slow/expensive.
### Setup

- Install: `uv add modal`
- Auth: `modal token new`
### Logic

Instead of calling the Anthropic API, we call a Modal function:
```python
# src/llm/modal_client.py
import modal

# Note: newer Modal releases rename Stub to App (modal.App).
stub = modal.Stub("deepcritical-inference")

@stub.function(gpu="A100")
def generate_text(prompt: str):
    # Load vLLM or a similar inference engine here.
    ...
```
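Calling it from the agent then looks roughly like this (a sketch; `.remote()` is Modal's current call style, and the prompt is illustrative):

```python
# Run the GPU function from local code; Modal provisions the A100 on demand.
with stub.run():
    answer = generate_text.remote("Summarize the evidence for drug X in disease Y.")
    print(answer)
```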
### When to use?
- Hackathon Demo: Stick to Anthropic/OpenAI APIs for speed/reliability.
- Production/Stretch: Use Modal if you hit rate limits or want to show off "Open Source Models" capability.
## Deployment Checklist

### Pre-Flight

- Run `pytest -m unit` to ensure logic is sound.
- Run `pytest -m e2e` (one pass) to verify APIs connect.
- Check that `requirements.txt` matches `pyproject.toml`.
### Secrets Management

- NEVER commit `.env` files; load them only in local development (see the sketch below).
- Verify keys are added to HF Space settings.
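A common pattern for keeping `.env` local-only (assumes `python-dotenv` is a dependency):

```python
# Local dev only: populate os.environ from .env if the file exists.
# On HF Spaces this is a no-op; secrets are already in the environment.
from dotenv import load_dotenv

load_dotenv()
```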
### Post-Launch

- Test the live URL.
- Verify the "Stop" button in Gradio works (interrupts the agent).
- Record a walkthrough video (crucial for hackathon submission).