Deployment Guide

A comprehensive guide to deploying the MCP Orchestration Platform locally, with Docker and Kubernetes, and on the major cloud platforms.

Table of Contents

  1. Prerequisites
  2. Environment Setup
  3. Local Development
  4. Docker Deployment
  5. Kubernetes Deployment
  6. Cloud Platform Deployment
  7. Production Configuration
  8. Monitoring and Logging
  9. Security Configuration
  10. Troubleshooting

Prerequisites

System Requirements

Minimum Requirements:

  • CPU: 2 cores
  • RAM: 4GB
  • Storage: 20GB SSD
  • Network: 100 Mbps

Recommended Production Requirements:

  • CPU: 4+ cores
  • RAM: 8GB+
  • Storage: 50GB+ NVMe SSD
  • Network: 1 Gbps

Software Dependencies

Required:

  • Python 3.8+
  • pip (Python package manager)
  • git (for cloning repository)

Optional (depending on deployment):

  • Docker 20.10+
  • Docker Compose 2.0+
  • kubectl (for Kubernetes)
  • Terraform (for infrastructure as code)

Infrastructure Dependencies

Database:

  • PostgreSQL 12+ (recommended)
  • Redis 6.0+ (for caching)
  • Optional: MongoDB (for audit logs)

Monitoring:

  • Prometheus (metrics collection)
  • Grafana (dashboard visualization)
  • ELK Stack (log aggregation)

Security:

  • HashiCorp Vault (enterprise secrets management)
  • AWS Secrets Manager (cloud deployment)
  • TLS certificates

Environment Setup

Development Environment

  1. Clone the repository
git clone https://github.com/your-org/mcp-orchestration-platform.git
cd mcp-orchestration-platform/orchestration_platform
  2. Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows
  3. Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt  # For development
  4. Set up environment variables
cp .env.example .env
# Edit .env with your configuration
  5. Initialize database
python -c "from orchestration_platform.mcp_orchestrator import MCPOrchestrator; import asyncio; asyncio.run(MCPOrchestrator().initialize())"
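
The initialization one-liner above is equivalent to this short script, which is easier to read and extend later (a sketch built from the same import and calls):

# init_db.py
import asyncio

from orchestration_platform.mcp_orchestrator import MCPOrchestrator

async def main():
    # Run the orchestrator's async initialization (the database setup step above)
    orchestrator = MCPOrchestrator()
    await orchestrator.initialize()

if __name__ == "__main__":
    asyncio.run(main())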

Testing the Setup

# Run tests
python -m pytest test_orchestrator.py

# Run demo application
python demo.py
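
Beyond the test suite, a quick smoke test against the readiness endpoint confirms the service is actually serving (a stdlib-only sketch; assumes the orchestrator is running locally on port 7860 and exposes the /health/ready route used throughout this guide):

# smoke_test.py
import json
import urllib.request

url = "http://localhost:7860/health/ready"

with urllib.request.urlopen(url, timeout=5) as resp:
    assert resp.status == 200, f"unexpected status {resp.status}"
    print("readiness check:", json.loads(resp.read()))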

Local Development

Quick Start

  1. Start required services
# Start PostgreSQL and Redis
docker-compose up -d postgres redis

# Or use local installations
sudo service postgresql start
sudo service redis-server start
  2. Run the orchestrator
python demo.py
  3. Start sample servers (separate terminals)
# Terminal 1: Weather server
python sample_servers/weather_server.py

# Terminal 2: CRM server  
python sample_servers/crm_server.py

Development Configuration

Create .env file:

# Core Configuration
ORCHESTRATOR_HOST=localhost
ORCHESTRATOR_PORT=7860
LOG_LEVEL=DEBUG

# Database
DATABASE_URL=postgresql://postgres:password@localhost:5432/orchestrator_dev
CACHE_URL=redis://localhost:6379

# Security
JWT_SECRET=your-development-secret-key
ENCRYPTION_KEY=your-development-encryption-key

# Secrets (Development)
SECRETS_BACKEND=local
SECRETS_ENCRYPTION_KEY=dev-encryption-key

# Monitoring
PROMETHEUS_ENABLED=true
METRICS_PORT=9090
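
How the application might consume these variables, as a minimal sketch; it assumes python-dotenv (pip install python-dotenv), though the platform may load configuration differently:

# config.py
import os

from dotenv import load_dotenv

load_dotenv()  # merge .env from the working directory into os.environ

DATABASE_URL = os.environ["DATABASE_URL"]  # required, no default
CACHE_URL = os.getenv("CACHE_URL", "redis://localhost:6379")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
METRICS_PORT = int(os.getenv("METRICS_PORT", "9090"))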

Hot Reloading

For development with auto-reload:

pip install watchdog
watchmedo auto-restart --patterns="*.py" --recursive -- python demo.py

Docker Deployment

Single Container Deployment

  1. Create Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user
RUN useradd -m -u 1000 orchestrator
USER orchestrator

# Expose port
EXPOSE 7860

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:7860/health/ready || exit 1

# Run application
CMD ["python", "demo.py"]
  2. Build and run
docker build -t mcp-orchestrator:latest .
docker run -p 7860:7860 --env-file .env mcp-orchestrator:latest

Docker Compose Deployment

  1. Create docker-compose.yml

services:
  orchestrator:
    build: .
    ports:
      - "7860:7860"
    environment:
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@postgres:5432/orchestrator
      - CACHE_URL=redis://redis:6379
      - SECRETS_BACKEND=vault
      - VAULT_ADDR=http://vault:8200
    depends_on:
      - postgres
      - redis
      - vault
    volumes:
      - ./logs:/app/logs
      - ./config:/app/config
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7860/health/ready"]
      interval: 30s
      timeout: 10s
      retries: 3

  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=orchestrator
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    restart: unless-stopped

  vault:
    image: hashicorp/vault:latest
    cap_add:
      - IPC_LOCK
    environment:
      - VAULT_DEV_ROOT_TOKEN_ID=dev-root-token
      - VAULT_DEV_LISTEN_ADDRESS=0.0.0.0:8200
    ports:
      - "8200:8200"
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./grafana/datasources:/etc/grafana/provisioning/datasources
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:
  prometheus_data:
  grafana_data:

networks:
  default:
    driver: bridge
  2. Create environment file
# .env
POSTGRES_PASSWORD=secure-password-here
GRAFANA_PASSWORD=admin-password-here
VAULT_TOKEN=dev-root-token
  3. Deploy with Docker Compose
docker-compose up -d
  4. Verify deployment
docker-compose ps
curl http://localhost:7860/health/ready
curl http://localhost:3000  # Grafana

Production Docker Configuration

  1. Use multi-stage build for optimization
# Build stage
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Runtime stage
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/* \
    && useradd -m -u 1000 orchestrator

# Copy installed packages into the runtime user's home and put them on PATH;
# copying to /root/.local would be unreadable once we drop to UID 1000
COPY --from=builder --chown=1000:1000 /root/.local /home/orchestrator/.local
COPY --chown=1000:1000 . .
ENV PATH=/home/orchestrator/.local/bin:$PATH

USER 1000
EXPOSE 7860

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:7860/health/ready || exit 1

CMD ["python", "demo.py"]
  2. Security optimizations
# Run as non-root user
USER 1000

# Remove unnecessary packages
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# Declare the writable paths so the rest of the container can run with a
# read-only root filesystem (e.g. docker run --read-only)
VOLUME ["/app/logs", "/app/config"]

Kubernetes Deployment

Basic Deployment

  1. Create namespace
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mcp-orchestrator
  2. Create ConfigMap
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: orchestrator-config
  namespace: mcp-orchestrator
data:
  ORCHESTRATOR_HOST: "0.0.0.0"
  ORCHESTRATOR_PORT: "7860"
  LOG_LEVEL: "INFO"
  PROMETHEUS_ENABLED: "true"
  METRICS_PORT: "9090"
  3. Create Secret
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: orchestrator-secrets
  namespace: mcp-orchestrator
type: Opaque
data:
  DATABASE_URL: cG9zdGdyZXNxbDovL3VzZXI6cGFzc3dvcmRAcG9zdGdyZXM6NTQzMi9vcmNoZXN0cmF0b3I=  # base64 encoded
  JWT_SECRET: eW91ci1qd3Qtc2VjcmV0LWtleQ==  # base64 encoded
  ENCRYPTION_KEY: eW91ci1lbmNyeXB0aW9uLWtleQ==  # base64 encoded
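
The data values must be base64-encoded; the strings above are placeholders. A quick way to produce your own (kubectl create secret generic does the encoding for you as well):

# encode_secrets.py -- base64-encode values for a Kubernetes Secret manifest
import base64

values = {
    "DATABASE_URL": "postgresql://user:password@postgres:5432/orchestrator",
    "JWT_SECRET": "your-jwt-secret-key",
    "ENCRYPTION_KEY": "your-encryption-key",
}
for name, value in values.items():
    print(f"{name}: {base64.b64encode(value.encode()).decode()}")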
  4. Create Deployment
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orchestrator
  namespace: mcp-orchestrator
  labels:
    app: mcp-orchestrator
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: mcp-orchestrator
  template:
    metadata:
      labels:
        app: mcp-orchestrator
    spec:
      containers:
      - name: orchestrator
        image: mcp-orchestrator:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 7860
          name: http
        - containerPort: 9090
          name: metrics
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: orchestrator-secrets
              key: DATABASE_URL
        - name: JWT_SECRET
          valueFrom:
            secretKeyRef:
              name: orchestrator-secrets
              key: JWT_SECRET
        - name: ENCRYPTION_KEY
          valueFrom:
            secretKeyRef:
              name: orchestrator-secrets
              key: ENCRYPTION_KEY
        envFrom:
        - configMapRef:
            name: orchestrator-config
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health/live
            port: 7860
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 7860
          initialDelaySeconds: 5
          periodSeconds: 5
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
        - name: logs-volume
          mountPath: /app/logs
      volumes:
      - name: config-volume
        configMap:
          name: orchestrator-config
      - name: logs-volume
        emptyDir: {}
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
  5. Create Service
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: orchestrator-service
  namespace: mcp-orchestrator
  labels:
    app: mcp-orchestrator
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 7860
    protocol: TCP
    name: http
  - port: 9090
    targetPort: 9090
    protocol: TCP
    name: metrics
  selector:
    app: mcp-orchestrator
  6. Create Ingress
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orchestrator-ingress
  namespace: mcp-orchestrator
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - orchestrator.yourdomain.com
    secretName: orchestrator-tls
  rules:
  - host: orchestrator.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: orchestrator-service
            port:
              number: 80

Deploy to Kubernetes

# Apply all resources
kubectl apply -f namespace.yaml
kubectl apply -f configmap.yaml
kubectl apply -f secret.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml

# Verify deployment
kubectl get pods -n mcp-orchestrator
kubectl get services -n mcp-orchestrator
kubectl get ingress -n mcp-orchestrator
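
For scripted verification (for example in a CI pipeline), the official kubernetes Python client can check the rollout; a sketch, assuming the same credentials kubectl uses are available locally:

# verify_deploy.py
from kubernetes import client, config

config.load_kube_config()  # reuses your kubectl context
apps = client.AppsV1Api()

dep = apps.read_namespaced_deployment("orchestrator", "mcp-orchestrator")
ready = dep.status.ready_replicas or 0
print(f"{ready}/{dep.spec.replicas} replicas ready")
assert ready == dep.spec.replicas, "deployment not fully rolled out"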

Helm Chart Deployment

  1. Create Helm chart structure
helm create mcp-orchestrator
  2. Configure values.yaml
# values.yaml
replicaCount: 3

image:
  repository: mcp-orchestrator
  tag: latest
  pullPolicy: Always

service:
  type: ClusterIP
  port: 80
  targetPort: 7860

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: orchestrator.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: orchestrator-tls
      hosts:
        - orchestrator.yourdomain.com

resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 250m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

nodeSelector: {}
tolerations: []
affinity: {}

config:
  ORCHESTRATOR_HOST: "0.0.0.0"
  ORCHESTRATOR_PORT: "7860"
  LOG_LEVEL: "INFO"
  PROMETHEUS_ENABLED: "true"
  METRICS_PORT: "9090"
  3. Deploy with Helm
# Install
helm install orchestrator ./mcp-orchestrator -n mcp-orchestrator

# Upgrade
helm upgrade orchestrator ./mcp-orchestrator -n mcp-orchestrator

# Uninstall
helm uninstall orchestrator -n mcp-orchestrator

Cloud Platform Deployment

AWS Deployment

ECS with Fargate

  1. Create task definition
{
  "family": "mcp-orchestrator",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "orchestrator",
      "image": "ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest",
      "portMappings": [
        {
          "containerPort": 7860,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "ORCHESTRATOR_HOST",
          "value": "0.0.0.0"
        },
        {
          "name": "ORCHESTRATOR_PORT", 
          "value": "7860"
        }
      ],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:ssm:REGION:ACCOUNT:parameter/orchestrator/database-url"
        },
        {
          "name": "JWT_SECRET",
          "valueFrom": "arn:aws:ssm:REGION:ACCOUNT:parameter/orchestrator/jwt-secret"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/mcp-orchestrator",
          "awslogs-region": "REGION",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
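
If you prefer scripting the registration over the console, boto3 accepts the same JSON as keyword arguments (a sketch; assumes AWS credentials are configured and the JSON above is saved as task-definition.json):

# register_task.py
import json

import boto3

with open("task-definition.json") as f:
    task_def = json.load(f)

ecs = boto3.client("ecs")  # region comes from your AWS config
resp = ecs.register_task_definition(**task_def)
print("registered:", resp["taskDefinition"]["taskDefinitionArn"])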
  2. Deploy with CloudFormation
# cloudformation-template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'MCP Orchestrator Platform'

Parameters:
  DatabasePassword:
    Type: String
    NoEcho: true
    Description: 'Database password'

Resources:
  # ECR Repository
  ECRRepository:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: mcp-orchestrator

  # ECS Cluster
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: mcp-orchestrator-cluster

  # Task Definition
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: mcp-orchestrator
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      Cpu: 512
      Memory: 1024
      ExecutionRoleArn: !Ref ECSExecutionRole
      TaskRoleArn: !Ref ECSTaskRole
      ContainerDefinitions:
        - Name: orchestrator
          Image: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/mcp-orchestrator:latest'
          PortMappings:
            - ContainerPort: 7860
          Environment:
            - Name: ORCHESTRATOR_HOST
              Value: 0.0.0.0
            - Name: ORCHESTRATOR_PORT
              Value: '7860'
          Secrets:
            - Name: DATABASE_URL
              ValueFrom: !Ref DatabaseSecret
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref CloudWatchLogsGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs

  # Service
  ECSService:
    Type: AWS::ECS::Service
    Properties:
      ServiceName: mcp-orchestrator-service
      Cluster: !Ref ECSCluster
      TaskDefinition: !Ref TaskDefinition
      DesiredCount: 2
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          SecurityGroups:
            - !Ref ECSSecurityGroup
          Subnets:
            - !Ref PublicSubnet1
            - !Ref PublicSubnet2
      LoadBalancers:
        - ContainerName: orchestrator
          ContainerPort: 7860
          TargetGroupArn: !Ref TargetGroup

  # Load Balancer
  ApplicationLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: mcp-orchestrator-alb
      Scheme: internet-facing
      Type: application
      SecurityGroups:
        - !Ref ALBSecurityGroup
      Subnets:
        - !Ref PublicSubnet1
        - !Ref PublicSubnet2

  # Target Group
  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: mcp-orchestrator-tg
      Port: 7860
      Protocol: HTTP
      VpcId: !Ref VPC
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 30

  # Listener
  Listener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Port: 80
      Protocol: HTTP

Outputs:
  ServiceURL:
    Value: !GetAtt ApplicationLoadBalancer.DNSName
    Description: URL for the MCP Orchestrator service
  3. Deploy
# Build and push image
aws ecr get-login-password --region REGION | docker login --username AWS --password-stdin ACCOUNT.dkr.ecr.REGION.amazonaws.com
docker build -t mcp-orchestrator .
docker tag mcp-orchestrator:latest ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest
docker push ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest

# Deploy with CloudFormation
aws cloudformation deploy \
  --template-file cloudformation-template.yaml \
  --stack-name mcp-orchestrator \
  --parameter-overrides DatabasePassword=your-secure-password \
  --capabilities CAPABILITY_IAM

AWS EKS Deployment

# eks-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-orchestrator
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-orchestrator
  template:
    metadata:
      labels:
        app: mcp-orchestrator
    spec:
      containers:
      - name: orchestrator
        image: ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest
        ports:
        - containerPort: 7860
        env:
        - name: ORCHESTRATOR_HOST
          value: "0.0.0.0"
        - name: ORCHESTRATOR_PORT
          value: "7860"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: orchestrator-secrets
              key: database-url
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-orchestrator-service
spec:
  selector:
    app: mcp-orchestrator
  ports:
  - port: 80
    targetPort: 7860
  type: LoadBalancer

Azure Container Instances

  1. Create resource group
az group create --name mcp-orchestrator-rg --location eastus
  2. Deploy container
az container create \
  --resource-group mcp-orchestrator-rg \
  --name mcp-orchestrator \
  --image mcp-orchestrator:latest \
  --cpu 2 \
  --memory 4 \
  --ports 7860 \
  --environment-variables \
    ORCHESTRATOR_HOST=0.0.0.0 \
    ORCHESTRATOR_PORT=7860 \
    LOG_LEVEL=INFO \
  --secure-environment-variables \
    DATABASE_URL=postgresql://user:pass@server:5432/db \
    JWT_SECRET=your-jwt-secret \
  --restart-policy Always
  3. Create Azure Database for PostgreSQL
az postgres flexible-server create \
  --resource-group mcp-orchestrator-rg \
  --name mcp-orchestrator-db \
  --location eastus \
  --admin-user orchestrator \
  --admin-password secure-password \
  --sku-name Standard_B1ms \
  --tier Burstable

Google Cloud Run Deployment

  1. Build and push image
gcloud builds submit --tag gcr.io/PROJECT-ID/mcp-orchestrator
  2. Deploy to Cloud Run
gcloud run deploy mcp-orchestrator \
  --image gcr.io/PROJECT-ID/mcp-orchestrator \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --port 7860 \
  --memory 1Gi \
  --cpu 2 \
  --set-env-vars ORCHESTRATOR_HOST=0.0.0.0,ORCHESTRATOR_PORT=7860,LOG_LEVEL=INFO \
  --set-secrets DATABASE_URL=mcp-orchestrator-db-url:latest,JWT_SECRET=mcp-orchestrator-jwt-secret:latest

Production Configuration

Environment Variables

# Core Application
ORCHESTRATOR_HOST=0.0.0.0
ORCHESTRATOR_PORT=7860
LOG_LEVEL=INFO
DEBUG=false

# Database Configuration
DATABASE_URL=postgresql://user:password@host:5432/database
DATABASE_POOL_SIZE=20
DATABASE_MAX_OVERFLOW=30
DATABASE_POOL_TIMEOUT=30

# Cache Configuration  
CACHE_URL=redis://redis:6379/0
CACHE_POOL_SIZE=20
CACHE_TTL=3600

# Security
JWT_SECRET=your-super-secure-jwt-secret-key
ENCRYPTION_KEY=your-32-byte-encryption-key
SECRET_KEY_ROTATION_DAYS=90
SESSION_TTL=3600
MAX_SESSIONS=10000

# Secrets Management
SECRETS_BACKEND=vault  # local, vault, aws, environment
VAULT_ADDR=http://vault:8200
VAULT_TOKEN=your-vault-token
AWS_REGION=us-east-1

# Rate Limiting
RATE_LIMIT_REQUESTS=1000
RATE_LIMIT_WINDOW=3600
RATE_LIMIT_STORAGE=redis

# Monitoring
PROMETHEUS_ENABLED=true
METRICS_PORT=9090
HEALTH_CHECK_INTERVAL=30
METRICS_RETENTION_DAYS=30

# Performance
MAX_CONNECTIONS=200
CONNECTION_TIMEOUT=30
REQUEST_TIMEOUT=60
MAX_RETRIES=3
CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
CIRCUIT_BREAKER_RECOVERY_TIMEOUT=60

# SSL/TLS
SSL_ENABLED=true
SSL_CERT_PATH=/app/certs/orchestrator.crt
SSL_KEY_PATH=/app/certs/orchestrator.key
SSL_VERIFY=true

# CORS
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
CORS_METHODS=GET,POST,PUT,DELETE,OPTIONS
CORS_HEADERS=Content-Type,Authorization,X-Requested-With

# Feature Flags
FEATURE_REAL_TIME_UPDATES=true
FEATURE_ADVANCED_ANALYTICS=true
FEATURE_PLUGIN_SYSTEM=true
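
Values such as JWT_SECRET and ENCRYPTION_KEY should be generated randomly rather than hand-typed. A stdlib-only sketch:

# generate_secrets.py -- print strong random values for the security settings
import secrets

print("JWT_SECRET=" + secrets.token_urlsafe(48))
print("ENCRYPTION_KEY=" + secrets.token_bytes(32).hex())  # 32 random bytes, hex-encoded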

Database Configuration

PostgreSQL Optimization

# postgresql.conf
shared_buffers = 256MB
effective_cache_size = 1GB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200

Redis Configuration

# redis.conf
maxmemory 512mb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes

Nginx Reverse Proxy

# /etc/nginx/sites-available/mcp-orchestrator
upstream orchestrator_backend {
    server orchestrator1:7860 weight=3 max_fails=3 fail_timeout=30s;
    server orchestrator2:7860 weight=3 max_fails=3 fail_timeout=30s;
    server orchestrator3:7860 weight=3 max_fails=3 fail_timeout=30s backup;
}

upstream orchestrator_metrics {
    server orchestrator1:9090;
    server orchestrator2:9090;
    server orchestrator3:9090;
}

server {
    listen 80;
    server_name orchestrator.yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name orchestrator.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/orchestrator.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/orchestrator.yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    client_max_body_size 50M;
    client_body_timeout 60s;
    client_header_timeout 60s;

    location / {
        proxy_pass http://orchestrator_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }

    location /metrics {
        proxy_pass http://orchestrator_metrics;
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        allow 172.16.0.0/12;
        allow 192.168.0.0/16;
        deny all;
    }

    location /health {
        proxy_pass http://orchestrator_backend/health;
        access_log off;
    }
}

Monitoring and Logging

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "orchestrator_alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  - job_name: 'mcp-orchestrator'
    static_configs:
      - targets: ['orchestrator:9090']
    metrics_path: /metrics
    scrape_interval: 10s
    scrape_timeout: 5s

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

Grafana Dashboards

  1. Orchestrator Overview Dashboard
{
  "dashboard": {
    "title": "MCP Orchestrator Overview",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(orchestrator_requests_total[5m])",
            "legendFormat": "{{method}} {{status}}"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph", 
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(orchestrator_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.50, rate(orchestrator_request_duration_seconds_bucket[5m]))",
            "legendFormat": "50th percentile"
          }
        ]
      },
      {
        "title": "Active Connections",
        "type": "singlestat",
        "targets": [
          {
            "expr": "orchestrator_active_connections"
          }
        ]
      }
    ]
  }
}
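
The panels above assume the application exports orchestrator_requests_total, orchestrator_request_duration_seconds, and orchestrator_active_connections. A sketch of how those metrics might be defined with prometheus_client (metric names are taken from the dashboard; the label set is an assumption):

# metrics.py
from prometheus_client import Counter, Gauge, Histogram

REQUESTS = Counter(
    "orchestrator_requests_total",
    "Total requests handled",
    ["method", "status"],
)
LATENCY = Histogram(
    "orchestrator_request_duration_seconds",
    "Request latency in seconds",
)
ACTIVE_CONNECTIONS = Gauge(
    "orchestrator_active_connections",
    "Currently open client connections",
)

# Example instrumentation around a request handler
REQUESTS.labels(method="POST", status="200").inc()
with LATENCY.time():
    ...  # handle the request here
ACTIVE_CONNECTIONS.set(42)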

Structured Logging

import structlog

# Configure structured logging
structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.UnicodeDecoder(),
        structlog.processors.JSONRenderer()
    ],
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)
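
With that configuration in place, each log call emits one JSON object per event (the stdlib root logger still needs a basic handler for output to appear):

import logging

import structlog

logging.basicConfig(format="%(message)s", level=logging.INFO)

logger = structlog.get_logger("orchestrator")
logger.info("server_registered", server="weather", latency_ms=42)
# {"server": "weather", "latency_ms": 42, "logger": "orchestrator",
#  "level": "info", "timestamp": "...", "event": "server_registered"}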

Security Configuration

TLS/SSL Setup

  1. Generate self-signed certificates (development)
openssl req -x509 -newkey rsa:4096 -keyout orchestrator.key -out orchestrator.crt -days 365 -nodes
  2. Let's Encrypt certificates (production)
certbot certonly --standalone -d orchestrator.yourdomain.com

Security Headers

# security_headers.py
from fastapi import FastAPI
from starlette.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["Authorization", "Content-Type"],
)

# Add security headers
@app.middleware("http")
async def add_security_headers(request, call_next):
    response = await call_next(request)
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-XSS-Protection"] = "1; mode=block"
    response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
    return response

Authentication

# auth.py
import os
from datetime import datetime, timedelta, timezone
from typing import Optional

import jwt  # PyJWT

SECRET_KEY = os.environ["JWT_SECRET"]  # same secret as the env configuration above
ALGORITHM = "HS256"

def create_access_token(data: dict, expires_delta: Optional[timedelta] = None):
    to_encode = data.copy()
    # Timezone-aware now(); datetime.utcnow() is deprecated in Python 3.12
    expire = datetime.now(timezone.utc) + (expires_delta or timedelta(minutes=15))
    to_encode.update({"exp": expire})
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)

def verify_token(token: str):
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    except jwt.PyJWTError:
        return None
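
A quick round-trip with the helpers above:

token = create_access_token({"sub": "user-123", "role": "admin"},
                            expires_delta=timedelta(hours=1))
claims = verify_token(token)
assert claims is not None and claims["sub"] == "user-123"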

Troubleshooting

Common Deployment Issues

1. Pod CrashLoopBackOff

# Check pod logs
kubectl logs -f pod-name -n mcp-orchestrator

# Check events
kubectl get events -n mcp-orchestrator --sort-by='.lastTimestamp'

# Debug pod
kubectl debug -it pod-name -n mcp-orchestrator --image=busybox

2. Database Connection Issues

# Test database connectivity
kubectl exec -it pod-name -n mcp-orchestrator -- python -c "
import asyncpg
import asyncio
async def test():
    try:
        conn = await asyncpg.connect('postgresql://user:pass@host:5432/db')
        await conn.execute('SELECT 1')
        print('Database connection successful')
        await conn.close()
    except Exception as e:
        print(f'Database connection failed: {e}')
asyncio.run(test())
"

3. Memory Issues

# Check resource usage
kubectl top pods -n mcp-orchestrator

# Check node resources
kubectl top nodes

# Increase memory limits
kubectl patch deployment orchestrator -n mcp-orchestrator -p '{"spec":{"template":{"spec":{"containers":[{"name":"orchestrator","resources":{"limits":{"memory":"2Gi"}}}]}}}}'

Performance Tuning

1. Connection Pool Optimization

# Tune connection pool settings
DATABASE_POOL_SIZE=20    # Increase for high load
DATABASE_MAX_OVERFLOW=30  # Allow overflow connections
DATABASE_POOL_TIMEOUT=30  # Timeout for acquiring connection
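
How these settings might map onto an asyncpg pool, as a sketch (the platform's actual wiring may differ):

# pool.py
import os

import asyncpg

async def make_pool():
    size = int(os.getenv("DATABASE_POOL_SIZE", "20"))
    overflow = int(os.getenv("DATABASE_MAX_OVERFLOW", "30"))
    return await asyncpg.create_pool(
        os.environ["DATABASE_URL"],
        min_size=size,
        max_size=size + overflow,  # asyncpg has one hard ceiling, so fold overflow in
        timeout=int(os.getenv("DATABASE_POOL_TIMEOUT", "30")),  # connect timeout, seconds
    )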

2. Cache Optimization

# Redis configuration
CACHE_TTL=3600           # Adjust based on use case
CACHE_COMPRESSION=true   # Enable for large responses

3. Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orchestrator-hpa
  namespace: mcp-orchestrator
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orchestrator
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Health Checks

Application Health Check

# health_check.py
from fastapi import FastAPI, Response
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST

app = FastAPI()

async def check_database() -> bool:
    return True  # replace with a real "SELECT 1" against the connection pool

async def check_cache() -> bool:
    return True  # replace with a Redis PING

async def check_external_services() -> bool:
    return True  # replace with checks against registered MCP servers

@app.get("/health/live")
async def liveness_check():
    return {"status": "alive"}

@app.get("/health/ready")
async def readiness_check():
    # Ready only when the dependencies the app needs are reachable
    if await check_database() and await check_cache():
        return {"status": "ready"}
    return Response(status_code=503)

@app.get("/health/detailed")
async def detailed_health():
    return {
        "status": "healthy",
        "checks": {
            "database": await check_database(),
            "cache": await check_cache(),
            "external_services": await check_external_services()
        }
    }

@app.get("/metrics")
async def metrics():
    # Expose Prometheus metrics in the text exposition format
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

This completes the deployment guide. With the configuration, monitoring, and security measures described above in place, the platform can be deployed to any of the environments covered here.