W&B MCP Server - Architecture & Scalability Guide
Table of Contents
- Architecture Decision
- Stateless HTTP Design
- Performance & Scalability
- Load Test Results
- Deployment Recommendations
Architecture Decision
Decision: Pure Stateless HTTP Mode
The W&B MCP Server uses pure stateless HTTP mode (stateless_http=True).
This fundamental architecture decision enables:
- ✅ Universal client compatibility (OpenAI, Cursor, LeChat, Claude)
- ✅ Horizontal scaling capabilities
- ✅ Simpler operations and maintenance
- ✅ Cloud-native deployment patterns
Why Stateless?
The Model Context Protocol traditionally used stateful sessions, but this created issues:
| Client | Behavior | Problem with Stateful |
|---|---|---|
| OpenAI | Deletes the session after listing tools, then reuses its ID | "Session not found" errors |
| Cursor | Sends Bearer token with every request | Expects stateless behavior |
| Claude | Can work with either model | No issues |
The Solution
# Pure stateless operation - no session persistence
mcp = FastMCP("wandb-mcp-server", stateless_http=True)
With this approach:
- Session IDs are correlation IDs only - they match requests to responses
- No state persists between requests - each request is independent
- Authentication required per request - a Bearer token must be included (see the client sketch below)
- Any worker can handle any request - enables horizontal scaling
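For illustration, a single stateless call can look like the sketch below. It is not a transcript of any real client: the endpoint path (/mcp), the JSON-RPC payload shape, and the WANDB_API_KEY environment variable are assumptions based on the MCP Streamable HTTP transport.

# Hypothetical client call - every request is self-contained.
import os
import uuid

import httpx

MCP_URL = "https://mcp.withwandb.com/mcp"  # assumed endpoint path

payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}
headers = {
    "Authorization": f"Bearer {os.environ['WANDB_API_KEY']}",  # auth on every request
    "Mcp-Session-Id": uuid.uuid4().hex,  # correlation ID only - the server stores nothing
    "Accept": "application/json, text/event-stream",
}

response = httpx.post(MCP_URL, json=payload, headers=headers, timeout=30)
response.raise_for_status()
# Depending on transport negotiation, the body is plain JSON or a single SSE event.
print(response.text)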
Stateless HTTP Design
Architecture Overview
┌──────────────────────────────────────┐
│  MCP Clients (OpenAI/Cursor/etc)     │
│  Bearer Token with Each Request      │
└──────────────┬───────────────────────┘
               │ HTTPS
┌──────────────▼───────────────────────┐
│       Load Balancer (Optional)       │
│       Round-Robin Distribution       │
└───┬──────────┬──────────┬────────────┘
    │          │          │
┌───▼───┐  ┌───▼───┐  ┌───▼───┐
│  W1   │  │  W2   │  │  W3   │  (Multiple Workers Possible)
│       │  │       │  │       │
│ ASGI  │  │ ASGI  │  │ ASGI  │  Uvicorn/Gunicorn
└───┬───┘  └───┬───┘  └───┬───┘
    │          │          │
┌───▼──────────▼──────────▼────────────┐
│         FastAPI Application          │
│  ┌────────────────────────────┐      │
│  │ Stateless Auth Middleware  │      │
│  │ (Bearer Token Validation)  │      │
│  └────────────────────────────┘      │
│  ┌────────────────────────────┐      │
│  │ MCP Stateless Handler      │      │
│  │ (No Session Storage)       │      │
│  └────────────────────────────┘      │
└──────────────┬───────────────────────┘
               │
┌──────────────▼───────────────────────┐
│         W&B API Integration          │
└──────────────────────────────────────┘
Request Flow
1. Client sends request with Bearer token and session ID
2. Middleware validates Bearer token
3. MCP processes request (session ID used for correlation only)
4. Response sent with matching session ID
5. No state persisted - request complete
Key Implementation Details
import logging

from fastapi import Request
from fastapi.responses import JSONResponse

logger = logging.getLogger(__name__)

async def thread_safe_auth_middleware(request: Request, call_next):
    """Stateless authentication middleware."""
    # Session IDs are correlation IDs only - used to match requests to responses
    session_id = request.headers.get("Mcp-Session-Id")
    if session_id:
        logger.debug(f"Correlation ID: {session_id[:8]}...")
    # Every request must carry its own Bearer token
    authorization = request.headers.get("Authorization", "")
    if authorization.startswith("Bearer "):
        api_key = authorization[7:].strip()
        # Use the API key for this request only - no session storage or retrieval
        request.state.api_key = api_key
        return await call_next(request)
    # Reject unauthenticated requests
    return JSONResponse({"error": "Missing Bearer token"}, status_code=401)
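How the middleware is attached depends on the application layout; one possible wiring uses Starlette's BaseHTTPMiddleware (the bare FastAPI app shown here is an assumption, not the exact production setup):

# Hypothetical wiring - run every request through the stateless auth check above.
from fastapi import FastAPI
from starlette.middleware.base import BaseHTTPMiddleware

app = FastAPI()
app.add_middleware(BaseHTTPMiddleware, dispatch=thread_safe_auth_middleware)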
Performance & Scalability
Single Worker Performance
Based on testing with stateless mode:
| Metric | Local Server | Remote (HF Spaces) |
|---|---|---|
| Max Concurrent | 1000 clients | 500+ clients |
| Throughput | ~50-60 req/s | ~35 req/s |
| Latency (p50) | <500ms | <2s |
| Memory Usage | 200-500MB | 300-600MB |
Horizontal Scaling Potential
With stateless mode, the server supports true horizontal scaling:
| Workers | Max Concurrent | Total Throughput | Notes |
|---|---|---|---|
| 1 | 1000 | ~50 req/s | Current deployment |
| 2 | 2000 | ~100 req/s | Linear scaling |
| 4 | 4000 | ~200 req/s | Near-linear |
| 8 | 8000 | ~400 req/s | Some overhead |
Key Advantage: No session affinity required - any worker can handle any request!
Load Test Results
Latest Test Results (2025-09-25)
Local Server (macOS, Single Worker)
| Concurrent Clients | Success Rate | Throughput | Mean Response |
|---|---|---|---|
| 10 | 100% | 47 req/s | 89ms |
| 100 | 100% | 47 req/s | 1.2s |
| 500 | 100% | 56 req/s | 4.4s |
| 1000 | 100% | 48 req/s | 9.3s |
| 1500 | 80% | 51 req/s | 15.4s |
| 2000 | 70% | 53 req/s | 20.8s |
Breaking Point: ~1500 concurrent connections
Remote Server (mcp.withwandb.com)
| Concurrent Clients | Success Rate | Throughput | Mean Response |
|---|---|---|---|
| 10 | 100% | 10 req/s | 0.8s |
| 50 | 100% | 29 req/s | 1.2s |
| 100 | 100% | 33 req/s | 1.9s |
| 200 | 100% | 34 req/s | 3.3s |
| 500 | 100% | 35 req/s | 7.5s |
Key Finding: Remote server handles 500+ concurrent connections reliably!
Performance Sweet Spots
- Low Latency (<1s response): Use ≤50 concurrent connections
- Balanced (good throughput & latency): Use 100-200 concurrent connections
- Maximum Throughput: Use 200-300 concurrent connections
- Maximum Capacity: Up to 500 concurrent (remote) or 1000 (local)
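Results in this range can be reproduced with a small asyncio driver. The sketch below illustrates the approach only; the endpoint path, payload, and client count are placeholders, and it is not the harness that produced the numbers above.

# Concurrency probe sketch: fire N independent stateless requests in parallel.
import asyncio
import os
import time
import uuid

import httpx

MCP_URL = "https://mcp.withwandb.com/mcp"  # assumed endpoint path
PAYLOAD = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}

async def one_request(client: httpx.AsyncClient) -> bool:
    headers = {
        "Authorization": f"Bearer {os.environ['WANDB_API_KEY']}",
        "Mcp-Session-Id": uuid.uuid4().hex,  # correlation only
        "Accept": "application/json, text/event-stream",
    }
    try:
        resp = await client.post(MCP_URL, json=PAYLOAD, headers=headers, timeout=60)
        return resp.status_code == 200
    except httpx.HTTPError:
        return False

async def run(concurrency: int) -> None:
    limits = httpx.Limits(max_connections=concurrency)
    async with httpx.AsyncClient(limits=limits) as client:
        start = time.perf_counter()
        results = await asyncio.gather(*(one_request(client) for _ in range(concurrency)))
        elapsed = time.perf_counter() - start
    ok = sum(results)
    print(f"{concurrency} clients: {ok / concurrency:.0%} success, "
          f"{ok / elapsed:.1f} req/s, {elapsed:.1f}s total")

if __name__ == "__main__":
    asyncio.run(run(100))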
Deployment Recommendations
Current Deployment (HuggingFace Spaces)
Configuration:
- Single worker (can be increased)
- Stateless HTTP mode
- 2 vCPU, 16GB RAM
- Port 7860
Performance:
- 500+ concurrent connections
- ~35 req/s throughput
- 100% reliability up to 500 concurrent
Scaling Options
Option 1: Vertical Scaling
- Increase CPU/RAM on HuggingFace Spaces
- Can improve single-worker throughput
Option 2: Horizontal Scaling (Recommended)
# app.py - enable multiple workers (pass the app as an import string when workers > 1)
uvicorn.run("app:app", host="0.0.0.0", port=PORT, workers=4)
Option 3: Multi-Region Deployment
- Deploy to multiple regions
- Use global load balancer
- Reduce latency for users worldwide
Production Checklist
- ✅ Stateless mode enabled (stateless_http=True)
- ✅ Bearer authentication on every request
- ✅ Health check endpoint (/health)
- ✅ Monitoring for response times and errors
- ✅ Rate limiting (recommended: 100 req/s per client)
- ✅ Connection limits (recommended: 500 concurrent)
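The health check endpoint from the checklist can be a minimal unauthenticated route. The sketch below is illustrative only; the production endpoint may report more detail, and it assumes health checks bypass the Bearer-token middleware.

# Hypothetical /health route - liveness probe with no W&B calls and no per-request state.
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}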
Configuration Example
# Production configuration - pure stateless mode
import uvicorn

mcp = FastMCP("wandb-mcp-server", stateless_http=True)

# Uvicorn with multiple workers (if needed)
if __name__ == "__main__":
    uvicorn.run(
        app,                     # pass an import string ("app:app") when workers > 1
        host="0.0.0.0",
        port=7860,
        workers=1,               # increase for horizontal scaling
        limit_concurrency=1000,  # maximum simultaneous connections
        timeout_keep_alive=30,   # keep-alive timeout in seconds
    )
Security Considerations
- API Key Validation: Every request validates the Bearer token (see the sketch below)
- No Session Storage: No risk of session hijacking
- Rate Limiting: Protect against abuse
- HTTPS Only: Always use TLS in production
- Token Rotation: Encourage regular API key rotation
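One way to perform that per-request validation is to resolve the key against the W&B API itself. The sketch below assumes a recent wandb SDK in which wandb.Api accepts an api_key argument and exposes the authenticated viewer; it is illustrative, not the server's actual validation path.

# Illustrative only - check a W&B API key by resolving the authenticated viewer.
import wandb

def is_valid_api_key(api_key: str) -> bool:
    try:
        api = wandb.Api(api_key=api_key, timeout=10)  # assumes api_key kwarg support
        return api.viewer is not None
    except Exception:
        # Any auth or network failure counts as invalid for this request.
        return False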
Summary
The W&B MCP Server's stateless architecture provides:
- Universal Compatibility: Works with all MCP clients
- Excellent Performance: 500+ concurrent connections, ~35 req/s
- Horizontal Scalability: Add workers to increase capacity
- Simple Operations: No session management complexity
- Production Ready: Deployed and tested at scale
The stateless design is not a compromise - it's the optimal architecture for MCP servers in production environments.