cacodex committed on
Commit c74679b · verified · 1 Parent(s): 85201c3

Upload 14 files
.dockerignore ADDED
@@ -0,0 +1,9 @@
+ __pycache__/
+ .pytest_cache/
+ .pytest_tmp/
+ *.pyc
+ *.pyo
+ *.sqlite3
+ .smoke.sqlite3
+ uvicorn.log
+ uvicorn.err.log
.env.example ADDED
@@ -0,0 +1,9 @@
+ PASSWORD=change-me
+ SESSION_SECRET=change-me-too
+ GATEWAY_API_KEY=
+ NVIDIA_API_BASE=https://integrate.api.nvidia.com/v1
+ NVIDIA_NIM_API_KEY=
+ HEALTHCHECK_INTERVAL_MINUTES=60
+ HEALTHCHECK_PROMPT=Reply with the single word OK.
+ PUBLIC_HISTORY_HOURS=48
+ DATABASE_PATH=./data.sqlite3
Dockerfile ADDED
@@ -0,0 +1,16 @@
+ FROM python:3.13-slim
+
+ ENV PYTHONDONTWRITEBYTECODE=1 \
+     PYTHONUNBUFFERED=1 \
+     PORT=7860
+
+ WORKDIR /app
+
+ COPY requirements.txt ./
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,112 @@
  ---
- title: N2r
- emoji: 🏢
- colorFrom: pink
- colorTo: purple
+ title: NVIDIA NIM Responses Gateway
  sdk: docker
+ app_port: 7860
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # NVIDIA NIM Responses Gateway
+
+ A FastAPI gateway that converts NVIDIA NIM's official chat endpoint:
+
+ `https://integrate.api.nvidia.com/v1/chat/completions`
+
+ into an OpenAI-style `/v1/responses` interface, with:
+
+ - tool calling / function calling passthrough
+ - `previous_response_id` conversation chaining
+ - `/v1/models` model listing
+ - a public model health dashboard
+ - an admin SPA for model management, NVIDIA NIM key management, health checks, and scheduler settings
+ - Docker packaging for Hugging Face Spaces
+
+ ## Included NVIDIA models
+
+ The app seeds these models on first startup:
+
+ - `z-ai/glm5`
+ - `minimaxai/minimax-m2.5`
+ - `moonshotai/kimi-k2.5`
+ - `deepseek-ai/deepseek-v3.2`
+ - `google/gemma-4-31b-it`
+ - `qwen/qwen3.5-397b-a17b`
+
+ You can add or remove models from the admin page.
+
+ ## Routes
+
+ - `GET /` public health dashboard
+ - `GET /admin` admin SPA
+ - `GET /api/health/public` public hourly health data
+ - `GET /v1/models` OpenAI-style model list
+ - `POST /v1/responses` OpenAI-style responses endpoint
+ - `GET /v1/responses/{response_id}` retrieve a stored response
+
+ Admin API:
+
+ - `POST /admin/api/login`
+ - `GET /admin/api/overview`
+ - `GET/POST/DELETE /admin/api/models...`
+ - `GET/POST/DELETE /admin/api/keys...`
+ - `GET /admin/api/healthchecks`
+ - `POST /admin/api/healthchecks/run`
+ - `GET/PUT /admin/api/settings`
+
+ ## Environment variables
+
+ - `PASSWORD` required for admin login
+ - `SESSION_SECRET` optional cookie-signing secret; falls back to `PASSWORD`
+ - `GATEWAY_API_KEY` optional bearer token to protect `/v1/models` and `/v1/responses`
+ - `NVIDIA_API_BASE` defaults to `https://integrate.api.nvidia.com/v1`
+ - `NVIDIA_NIM_API_KEY` optional bootstrap key inserted on first startup
+ - `HEALTHCHECK_INTERVAL_MINUTES` default `60`
+ - `HEALTHCHECK_PROMPT` default `Reply with the single word OK.`
+ - `PUBLIC_HISTORY_HOURS` default `48`
+ - `DATABASE_PATH` default `./data.sqlite3`
+
+ A starter file is available at `.env.example`.
+
+ ## Local run
+
+ Install runtime dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ For local verification with the smoke script:
+
+ ```bash
+ pip install -r requirements-dev.txt
+ python scripts/local_smoke_test.py
+ ```
+
+ Run the app:
+
+ ```bash
+ uvicorn app.main:app --host 0.0.0.0 --port 7860
+ ```
+
+ ## Hugging Face Space deployment
+
+ This repository is prepared as a Docker Space.
+
+ 1. Create a new Hugging Face Space with `SDK: Docker`.
+ 2. Push this repository to the Space.
+ 3. Add Space secrets for at least `PASSWORD` and one NVIDIA NIM key.
+ 4. Open `/admin`, add or verify the stored keys, then run health checks.
+
+ ## Notes on API compatibility
+
+ - The gateway accepts OpenAI-style `input` payloads and converts them to chat-completions `messages`.
+ - Function tools are mapped to NVIDIA NIM's OpenAI-compatible `tools` format.
+ - Returned tool calls are exposed as `function_call` items inside the `output` array.
+ - `stream: true` is supported as SSE, but the current implementation emits buffered response events after the upstream completion finishes.
+
+ ## References
+
+ - OpenAI Responses API guide: https://platform.openai.com/docs/guides/responses-vs-chat-completions
+ - OpenAI function calling guide: https://platform.openai.com/docs/guides/function-calling
+ - NVIDIA Build portal: https://build.nvidia.com/
+ - NVIDIA NIM API reference: https://docs.api.nvidia.com/
app/__init__.py ADDED
@@ -0,0 +1 @@
+ """NVIDIA NIM to OpenAI Responses gateway."""
app/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (199 Bytes).
app/__pycache__/main.cpython-313.pyc ADDED
Binary file (79.4 kB).
app/main.py ADDED
@@ -0,0 +1,1314 @@
+ from __future__ import annotations
+
+ import json
+ import os
+ import sqlite3
+ import time
+ import uuid
+ from contextlib import asynccontextmanager
+ from datetime import UTC, datetime, timedelta
+ from pathlib import Path
+ from typing import Any
+
+ import httpx
+ from apscheduler.schedulers.asyncio import AsyncIOScheduler
+ from fastapi import Depends, FastAPI, Header, HTTPException, Request, Response, status
+ from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
+ from fastapi.staticfiles import StaticFiles
+ from itsdangerous import BadSignature, SignatureExpired, URLSafeTimedSerializer
+
+
+ BASE_DIR = Path(__file__).resolve().parent.parent
+ STATIC_DIR = BASE_DIR / "static"
+ DB_PATH = Path(os.getenv("DATABASE_PATH", BASE_DIR / "data.sqlite3"))
+ RAW_NVIDIA_API_BASE = os.getenv("NVIDIA_API_BASE", os.getenv("NIM_BASE_URL", "https://integrate.api.nvidia.com/v1")).rstrip("/")
+ NVIDIA_API_BASE = RAW_NVIDIA_API_BASE if RAW_NVIDIA_API_BASE.endswith("/v1") else f"{RAW_NVIDIA_API_BASE}/v1"
+ CHAT_COMPLETIONS_URL = f"{NVIDIA_API_BASE}/chat/completions"
+ MODELS_URL = f"{NVIDIA_API_BASE}/models"
+ ADMIN_PASSWORD = os.getenv("PASSWORD")
+ SESSION_SECRET = os.getenv("SESSION_SECRET") or ADMIN_PASSWORD or "nim-responses-dev-secret"
+ COOKIE_NAME = os.getenv("COOKIE_NAME", "nim_admin_session")
+ GATEWAY_API_KEY = os.getenv("GATEWAY_API_KEY")
+ DEFAULT_ENV_KEY = os.getenv("NVIDIA_NIM_API_KEY") or os.getenv("NVIDIA_API_KEY")
+ REQUEST_TIMEOUT_SECONDS = float(os.getenv("REQUEST_TIMEOUT_SECONDS", "90"))
+ DEFAULT_HEALTH_INTERVAL_MINUTES = int(os.getenv("HEALTHCHECK_INTERVAL_MINUTES", "60"))
+ DEFAULT_HEALTH_PROMPT = os.getenv("HEALTHCHECK_PROMPT", "Reply with the single word OK.")
+ PUBLIC_HISTORY_HOURS = int(os.getenv("PUBLIC_HISTORY_HOURS", "48"))
+
+ DEFAULT_MODELS = [
+     ("z-ai/glm5", "GLM-5", "Reasoning and general assistant model from Z.ai", 10, 1),
+     ("minimaxai/minimax-m2.5", "MiniMax M2.5", "Long-context assistant model from MiniMax", 20, 1),
+     ("moonshotai/kimi-k2.5", "Kimi K2.5", "Kimi family model tuned for tool use and code", 30, 1),
+     ("deepseek-ai/deepseek-v3.2", "DeepSeek V3.2", "DeepSeek production general-purpose model", 40, 1),
+     ("google/gemma-4-31b-it", "Gemma 4 31B IT", "Instruction-tuned Gemma model", 50, 0),
+     ("qwen/qwen3.5-397b-a17b", "Qwen 3.5 397B A17B", "Large-scale Qwen model with broad capabilities", 60, 0),
+ ]
+
+ scheduler = AsyncIOScheduler(timezone="UTC")
+
+
+ def utcnow() -> datetime:
+     return datetime.now(UTC)
+
+
+ def utcnow_iso() -> str:
+     return utcnow().isoformat()
+
+
+ def parse_datetime(value: str | None) -> datetime | None:
+     if not value:
+         return None
+     try:
+         return datetime.fromisoformat(value)
+     except ValueError:
+         return None
+
+
+ def bool_value(value: Any) -> bool:
+     if isinstance(value, bool):
+         return value
+     if isinstance(value, (int, float)):
+         return bool(value)
+     if value is None:
+         return False
+     return str(value).strip().lower() in {"1", "true", "yes", "on", "enabled"}
+
+
+ def json_dumps(value: Any) -> str:
+     return json.dumps(value, ensure_ascii=False)
+
+
+ def get_db_connection() -> sqlite3.Connection:
+     conn = sqlite3.connect(DB_PATH, check_same_thread=False)
+     conn.row_factory = sqlite3.Row
+     return conn
+
+
+ def init_db() -> None:
+     DB_PATH.parent.mkdir(parents=True, exist_ok=True)
+     conn = get_db_connection()
+     try:
+         conn.executescript(
+             """
+             CREATE TABLE IF NOT EXISTS proxy_models (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 model_id TEXT UNIQUE NOT NULL,
+                 display_name TEXT NOT NULL,
+                 provider TEXT NOT NULL DEFAULT 'nvidia-nim',
+                 description TEXT,
+                 enabled INTEGER NOT NULL DEFAULT 1,
+                 featured INTEGER NOT NULL DEFAULT 0,
+                 sort_order INTEGER NOT NULL DEFAULT 0,
+                 request_count INTEGER NOT NULL DEFAULT 0,
+                 success_count INTEGER NOT NULL DEFAULT 0,
+                 failure_count INTEGER NOT NULL DEFAULT 0,
+                 healthcheck_count INTEGER NOT NULL DEFAULT 0,
+                 healthcheck_success_count INTEGER NOT NULL DEFAULT 0,
+                 last_used_at TEXT,
+                 last_healthcheck_at TEXT,
+                 last_health_status INTEGER,
+                 last_latency_ms REAL,
+                 created_at TEXT NOT NULL,
+                 updated_at TEXT NOT NULL
+             );
+
+             CREATE TABLE IF NOT EXISTS api_keys (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 name TEXT UNIQUE NOT NULL,
+                 api_key TEXT NOT NULL,
+                 enabled INTEGER NOT NULL DEFAULT 1,
+                 request_count INTEGER NOT NULL DEFAULT 0,
+                 success_count INTEGER NOT NULL DEFAULT 0,
+                 failure_count INTEGER NOT NULL DEFAULT 0,
+                 healthcheck_count INTEGER NOT NULL DEFAULT 0,
+                 healthcheck_success_count INTEGER NOT NULL DEFAULT 0,
+                 last_used_at TEXT,
+                 last_tested_at TEXT,
+                 last_latency_ms REAL,
+                 created_at TEXT NOT NULL,
+                 updated_at TEXT NOT NULL
+             );
+
+             CREATE TABLE IF NOT EXISTS response_records (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 response_id TEXT UNIQUE NOT NULL,
+                 parent_response_id TEXT,
+                 model_id INTEGER,
+                 api_key_id INTEGER,
+                 request_json TEXT NOT NULL,
+                 input_items_json TEXT NOT NULL,
+                 output_json TEXT NOT NULL,
+                 output_items_json TEXT NOT NULL,
+                 status TEXT NOT NULL,
+                 created_at TEXT NOT NULL
+             );
+
+             CREATE TABLE IF NOT EXISTS health_check_records (
+                 id INTEGER PRIMARY KEY AUTOINCREMENT,
+                 model_id INTEGER NOT NULL,
+                 api_key_id INTEGER,
+                 ok INTEGER NOT NULL,
+                 status_code INTEGER,
+                 latency_ms REAL,
+                 error_message TEXT,
+                 response_excerpt TEXT,
+                 checked_at TEXT NOT NULL
+             );
+
+             CREATE TABLE IF NOT EXISTS settings (
+                 key TEXT PRIMARY KEY,
+                 value TEXT NOT NULL
+             );
+             """
+         )
+
+         now = utcnow_iso()
+         for model_id, display_name, description, sort_order, featured in DEFAULT_MODELS:
+             conn.execute(
+                 """
+                 INSERT OR IGNORE INTO proxy_models (
+                     model_id, display_name, provider, description, enabled, featured, sort_order, created_at, updated_at
+                 ) VALUES (?, ?, 'nvidia-nim', ?, 1, ?, ?, ?, ?)
+                 """,
+                 (model_id, display_name, description, featured, sort_order, now, now),
+             )
+
+         defaults = {
+             "healthcheck_enabled": "true",
+             "healthcheck_interval_minutes": str(DEFAULT_HEALTH_INTERVAL_MINUTES),
+             "healthcheck_prompt": DEFAULT_HEALTH_PROMPT,
+             "public_history_hours": str(PUBLIC_HISTORY_HOURS),
+         }
+         for key, value in defaults.items():
+             conn.execute("INSERT OR IGNORE INTO settings (key, value) VALUES (?, ?)", (key, value))
+
+         if DEFAULT_ENV_KEY:
+             conn.execute(
+                 """
+                 INSERT OR IGNORE INTO api_keys (name, api_key, enabled, created_at, updated_at)
+                 VALUES ('env-default', ?, 1, ?, ?)
+                 """,
+                 (DEFAULT_ENV_KEY, now, now),
+             )
+
+         conn.commit()
+     finally:
+         conn.close()
+
+
+ def get_setting(conn: sqlite3.Connection, key: str, default: str) -> str:
+     row = conn.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
+     return row["value"] if row else default
+
+
+ def set_setting(conn: sqlite3.Connection, key: str, value: str) -> None:
+     conn.execute(
+         """
+         INSERT INTO settings (key, value) VALUES (?, ?)
+         ON CONFLICT(key) DO UPDATE SET value = excluded.value
+         """,
+         (key, value),
+     )
+
+
+ def get_settings_payload(conn: sqlite3.Connection) -> dict[str, Any]:
+     return {
+         "healthcheck_enabled": bool_value(get_setting(conn, "healthcheck_enabled", "true")),
+         "healthcheck_interval_minutes": int(get_setting(conn, "healthcheck_interval_minutes", str(DEFAULT_HEALTH_INTERVAL_MINUTES))),
+         "healthcheck_prompt": get_setting(conn, "healthcheck_prompt", DEFAULT_HEALTH_PROMPT),
+         "public_history_hours": int(get_setting(conn, "public_history_hours", str(PUBLIC_HISTORY_HOURS))),
+     }
+
+
+ def mask_secret(secret: str) -> str:
+     if len(secret) <= 8:
+         return f"{secret[:2]}***"
+     return f"{secret[:4]}...{secret[-4:]}"
+
+
+ def create_admin_token() -> str:
+     serializer = URLSafeTimedSerializer(SESSION_SECRET, salt="nim-admin-auth")
+     return serializer.dumps({"role": "admin"})
+
+
+ def verify_admin_token(token: str) -> bool:
+     serializer = URLSafeTimedSerializer(SESSION_SECRET, salt="nim-admin-auth")
+     try:
+         payload = serializer.loads(token, max_age=60 * 60 * 24 * 7)
+     except (BadSignature, SignatureExpired):
+         return False
+     return payload.get("role") == "admin"
+
+
+ def require_admin(request: Request, authorization: str | None = Header(default=None)) -> bool:
+     token: str | None = None
+     if authorization and authorization.startswith("Bearer "):
+         token = authorization.removeprefix("Bearer ").strip()
+     if not token:
+         token = request.cookies.get(COOKIE_NAME)
+     if not token or not verify_admin_token(token):
+         raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Admin authentication required.")
+     return True
+
+
+ def require_proxy_token_if_configured(authorization: str | None = Header(default=None)) -> bool:
+     if not GATEWAY_API_KEY:
+         return True
+     if not authorization or not authorization.startswith("Bearer "):
+         raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Missing bearer token.")
+     token = authorization.removeprefix("Bearer ").strip()
+     if token != GATEWAY_API_KEY:
+         raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid bearer token.")
+     return True
+
+
+ def fetch_model_by_identifier(conn: sqlite3.Connection, identifier: str | int, enabled_only: bool = False) -> sqlite3.Row | None:
+     clause = "AND enabled = 1" if enabled_only else ""
+     if isinstance(identifier, int) or (isinstance(identifier, str) and identifier.isdigit()):
+         row = conn.execute(f"SELECT * FROM proxy_models WHERE id = ? {clause}", (int(identifier),)).fetchone()
+         if row:
+             return row
+     return conn.execute(f"SELECT * FROM proxy_models WHERE model_id = ? {clause}", (str(identifier),)).fetchone()
+
+
+ def fetch_key_by_identifier(conn: sqlite3.Connection, identifier: str | int, enabled_only: bool = False) -> sqlite3.Row | None:
+     clause = "AND enabled = 1" if enabled_only else ""
+     if isinstance(identifier, int) or (isinstance(identifier, str) and str(identifier).isdigit()):
+         row = conn.execute(f"SELECT * FROM api_keys WHERE id = ? {clause}", (int(identifier),)).fetchone()
+         if row:
+             return row
+     return conn.execute(f"SELECT * FROM api_keys WHERE name = ? {clause}", (str(identifier),)).fetchone()
+
+
+ def select_api_key(conn: sqlite3.Connection, explicit_id: int | None = None) -> sqlite3.Row:
+     if explicit_id is not None:
+         row = fetch_key_by_identifier(conn, explicit_id, enabled_only=True)
+         if row:
+             return row
+     row = conn.execute(
+         """
+         SELECT * FROM api_keys
+         WHERE enabled = 1
+         ORDER BY CASE WHEN last_used_at IS NULL THEN 0 ELSE 1 END, last_used_at ASC, id ASC
+         LIMIT 1
+         """
+     ).fetchone()
+     if not row:
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="No enabled NVIDIA NIM API key is configured.")
+     return row
+
+
+ def row_to_model_item(row: sqlite3.Row) -> dict[str, Any]:
+     status_name = "unknown"
+     if row["last_health_status"] is not None:
+         status_name = "healthy" if bool(row["last_health_status"]) else "down"
+     return {
+         "id": row["id"],
+         "model_id": row["model_id"],
+         "name": row["model_id"],
+         "display_name": row["display_name"],
+         "endpoint": "/v1/responses",
+         "provider": row["provider"],
+         "description": row["description"],
+         "enabled": bool(row["enabled"]),
+         "featured": bool(row["featured"]),
+         "sort_order": row["sort_order"],
+         "status": status_name,
+         "request_count": row["request_count"],
+         "success_count": row["success_count"],
+         "failure_count": row["failure_count"],
+         "healthcheck_count": row["healthcheck_count"],
+         "healthcheck_success_count": row["healthcheck_success_count"],
+         "last_used_at": row["last_used_at"],
+         "last_healthcheck_at": row["last_healthcheck_at"],
+         "last_health_status": None if row["last_health_status"] is None else bool(row["last_health_status"]),
+         "last_latency_ms": row["last_latency_ms"],
+         "created_at": row["created_at"],
+         "updated_at": row["updated_at"],
+     }
+
+
+ def row_to_key_item(row: sqlite3.Row) -> dict[str, Any]:
+     total_checks = row["healthcheck_count"] or 0
+     ok_checks = row["healthcheck_success_count"] or 0
+     success_ratio = (ok_checks / total_checks) if total_checks else None
+     status_name = "healthy" if success_ratio and success_ratio >= 0.8 else "unknown"
+     return {
+         "id": row["id"],
+         "name": row["name"],
+         "label": row["name"],
+         "masked_key": mask_secret(row["api_key"]),
+         "enabled": bool(row["enabled"]),
+         "status": status_name,
+         "request_count": row["request_count"],
+         "success_count": row["success_count"],
+         "failure_count": row["failure_count"],
+         "healthcheck_count": row["healthcheck_count"],
+         "healthcheck_success_count": row["healthcheck_success_count"],
+         "last_used_at": row["last_used_at"],
+         "last_tested": row["last_tested_at"],
+         "last_tested_at": row["last_tested_at"],
+         "last_latency_ms": row["last_latency_ms"],
+         "created_at": row["created_at"],
+         "updated_at": row["updated_at"],
+     }
+
+
+ def make_error(status_code: int, message: str, error_type: str = "invalid_request_error") -> JSONResponse:
+     return JSONResponse(
+         status_code=status_code,
+         content={"error": {"message": message, "type": error_type, "code": status_code}},
+     )
+
+
+ def normalize_content(content: Any, role: str) -> list[dict[str, Any]]:
+     if content is None:
+         return []
+     if isinstance(content, str):
+         return [{"type": "output_text" if role == "assistant" else "input_text", "text": content}]
+     if isinstance(content, list):
+         normalized: list[dict[str, Any]] = []
+         for part in content:
+             if isinstance(part, str):
+                 normalized.append({"type": "output_text" if role == "assistant" else "input_text", "text": part})
+                 continue
+             if not isinstance(part, dict):
+                 normalized.append({"type": "input_text", "text": str(part)})
+                 continue
+             if part.get("type") in {"input_text", "output_text", "text", "tool_call", "function_call"}:
+                 normalized.append(part)
+                 continue
+             if "text" in part:
+                 normalized.append({"type": part.get("type", "input_text"), "text": part.get("text", "")})
+         return normalized
+     if isinstance(content, dict):
+         if "text" in content:
+             return [{"type": content.get("type", "input_text"), "text": content.get("text", "")}]
+         return [{"type": "input_text", "text": json_dumps(content)}]
+     return [{"type": "input_text", "text": str(content)}]
+
+
+ def normalize_input_items(value: Any) -> list[dict[str, Any]]:
+     if value is None:
+         return []
+     if isinstance(value, str):
+         return [{"type": "message", "role": "user", "content": [{"type": "input_text", "text": value}]}]
+     if isinstance(value, dict):
+         value = [value]
+
+     items: list[dict[str, Any]] = []
+     for item in value:
+         if isinstance(item, str):
+             items.append({"type": "message", "role": "user", "content": [{"type": "input_text", "text": item}]})
+             continue
+         if not isinstance(item, dict):
+             items.append({"type": "message", "role": "user", "content": [{"type": "input_text", "text": str(item)}]})
+             continue
+
+         item_type = item.get("type")
+         if item_type == "message" or item.get("role"):
+             role = item.get("role", "user")
+             items.append({"type": "message", "role": role, "content": normalize_content(item.get("content"), role)})
+             continue
+         if item_type == "function_call_output":
+             output = item.get("output")
+             if not isinstance(output, str):
+                 output = json_dumps(output) if output is not None else ""
+             items.append({"type": "function_call_output", "call_id": item.get("call_id"), "output": output})
+             continue
+         if item_type == "function_call":
+             arguments = item.get("arguments", "{}")
+             if not isinstance(arguments, str):
+                 arguments = json_dumps(arguments)
+             items.append(
+                 {
+                     "type": "function_call",
+                     "call_id": item.get("call_id") or f"call_{uuid.uuid4().hex[:12]}",
+                     "name": item.get("name"),
+                     "arguments": arguments,
+                 }
+             )
+             continue
+         if item_type in {"input_text", "output_text", "text"}:
+             items.append({"type": "message", "role": "user", "content": [{"type": "input_text", "text": item.get("text", "")}]})
+             continue
+         items.append({"type": "message", "role": "user", "content": [{"type": "input_text", "text": json_dumps(item)}]})
+     return items
+
+
+ def extract_text_from_content(content: Any) -> str:
+     if content is None:
+         return ""
+     if isinstance(content, str):
+         return content
+     if isinstance(content, dict):
+         if "text" in content:
+             return str(content.get("text", ""))
+         return json_dumps(content)
+     if isinstance(content, list):
+         chunks: list[str] = []
+         for part in content:
+             if isinstance(part, str):
+                 chunks.append(part)
+             elif isinstance(part, dict) and part.get("type") in {"input_text", "output_text", "text"}:
+                 chunks.append(str(part.get("text", "")))
+         return "\n".join(filter(None, chunks))
+     return str(content)
+
+
+ def load_previous_conversation_items(conn: sqlite3.Connection, previous_response_id: str | None) -> list[dict[str, Any]]:
+     if not previous_response_id:
+         return []
+     records: list[sqlite3.Row] = []
+     current = previous_response_id
+     while current:
+         row = conn.execute("SELECT * FROM response_records WHERE response_id = ?", (current,)).fetchone()
+         if not row:
+             raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=f"previous_response_id '{current}' was not found.")
+         records.append(row)
+         current = row["parent_response_id"]
+
+     items: list[dict[str, Any]] = []
+     for row in reversed(records):
+         items.extend(json.loads(row["input_items_json"]))
+         items.extend(json.loads(row["output_items_json"]))
+     return items
+
+
+ def items_to_chat_messages(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
+     messages: list[dict[str, Any]] = []
+     pending_tool_calls: list[dict[str, Any]] = []
+
+     def flush_pending_tool_calls() -> None:
+         nonlocal pending_tool_calls
+         if pending_tool_calls:
+             messages.append({"role": "assistant", "content": "", "tool_calls": pending_tool_calls})
+             pending_tool_calls = []
+
+     for item in items:
+         item_type = item.get("type")
+         if item_type == "function_call":
+             pending_tool_calls.append(
+                 {
+                     "id": item.get("call_id") or f"call_{uuid.uuid4().hex[:12]}",
+                     "type": "function",
+                     "function": {"name": item.get("name"), "arguments": item.get("arguments", "{}")},
+                 }
+             )
+             continue
+         if item_type == "function_call_output":
+             flush_pending_tool_calls()
+             messages.append({"role": "tool", "tool_call_id": item.get("call_id"), "content": item.get("output", "")})
+             continue
+         if item_type != "message":
+             continue
+         flush_pending_tool_calls()
+         role = item.get("role", "user")
+         text_value = extract_text_from_content(item.get("content"))
+         if role in {"system", "developer"}:
+             messages.append({"role": "system", "content": text_value})
+         elif role == "assistant":
+             messages.append({"role": "assistant", "content": text_value})
+         else:
+             messages.append({"role": role, "content": text_value})
+
+     flush_pending_tool_calls()
+     return [message for message in messages if message.get("content") is not None or message.get("tool_calls")]
+
517
+
518
+ def response_tools_to_chat_tools(tools: Any) -> list[dict[str, Any]]:
519
+ normalized: list[dict[str, Any]] = []
520
+ for tool in tools or []:
521
+ if not isinstance(tool, dict) or tool.get("type") != "function":
522
+ continue
523
+ function_payload = tool.get("function") if isinstance(tool.get("function"), dict) else tool
524
+ name = function_payload.get("name")
525
+ if not name:
526
+ continue
527
+ normalized.append(
528
+ {
529
+ "type": "function",
530
+ "function": {
531
+ "name": name,
532
+ "description": function_payload.get("description"),
533
+ "parameters": function_payload.get("parameters") or {"type": "object", "properties": {}},
534
+ },
535
+ }
536
+ )
537
+ return normalized
538
+
539
+
540
+ def normalize_tool_choice(tool_choice: Any, tools: list[dict[str, Any]]) -> tuple[Any, list[dict[str, Any]]]:
541
+ if tool_choice is None:
542
+ return None, tools
543
+ if isinstance(tool_choice, str):
544
+ return tool_choice, tools
545
+ if not isinstance(tool_choice, dict):
546
+ return None, tools
547
+ if tool_choice.get("type") == "function":
548
+ function_name = tool_choice.get("name") or (tool_choice.get("function") or {}).get("name")
549
+ if function_name:
550
+ return {"type": "function", "function": {"name": function_name}}, tools
551
+ if tool_choice.get("type") == "allowed_tools":
552
+ allowed = tool_choice.get("tools") or []
553
+ allowed_names = {
554
+ entry if isinstance(entry, str) else entry.get("name")
555
+ for entry in allowed
556
+ if entry is not None
557
+ }
558
+ filtered_tools = [tool for tool in tools if tool["function"]["name"] in allowed_names]
559
+ mode = tool_choice.get("mode", "auto")
560
+ return mode if isinstance(mode, str) else "auto", filtered_tools
561
+ return None, tools
562
+
563
+
564
+ def build_chat_payload(body: dict[str, Any], items: list[dict[str, Any]]) -> dict[str, Any]:
+     tools = response_tools_to_chat_tools(body.get("tools"))
+     tool_choice, tools = normalize_tool_choice(body.get("tool_choice"), tools)
+     payload: dict[str, Any] = {"model": body.get("model"), "messages": items_to_chat_messages(items)}
+     if tools:
+         payload["tools"] = tools
+     if tool_choice is not None:
+         payload["tool_choice"] = tool_choice
+     if body.get("temperature") is not None:
+         payload["temperature"] = body.get("temperature")
+     if body.get("top_p") is not None:
+         payload["top_p"] = body.get("top_p")
+     if body.get("parallel_tool_calls") is not None:
+         payload["parallel_tool_calls"] = body.get("parallel_tool_calls")
+     if body.get("max_output_tokens") is not None:
+         payload["max_tokens"] = body.get("max_output_tokens")
+     if body.get("instructions"):
+         payload["messages"] = [{"role": "system", "content": body["instructions"]}] + payload["messages"]
+     text_config = body.get("text") or {}
+     text_format = text_config.get("format") if isinstance(text_config, dict) else None
+     if isinstance(text_format, dict):
+         if text_format.get("type") == "json_object":
+             payload["response_format"] = {"type": "json_object"}
+         elif text_format.get("type") == "json_schema":
+             payload["response_format"] = {"type": "json_schema", "json_schema": text_format.get("json_schema") or {}}
+     return payload
+
+
+ def extract_upstream_message(upstream_json: dict[str, Any]) -> tuple[dict[str, Any], str | None]:
+     choices = upstream_json.get("choices") or []
+     if not choices:
+         return {}, None
+     choice = choices[0] or {}
+     return choice.get("message") or {}, choice.get("finish_reason")
+
+
+ def extract_text_and_tool_calls(message: dict[str, Any]) -> tuple[str, list[dict[str, Any]]]:
+     content = message.get("content")
+     text_chunks: list[str] = []
+     tool_calls: list[dict[str, Any]] = []
+
+     if isinstance(content, str):
+         text_chunks.append(content)
+     elif isinstance(content, list):
+         for part in content:
+             if isinstance(part, str):
+                 text_chunks.append(part)
+                 continue
+             if not isinstance(part, dict):
+                 text_chunks.append(str(part))
+                 continue
+             if part.get("type") in {"input_text", "output_text", "text"}:
+                 text_chunks.append(str(part.get("text", "")))
+                 continue
+             if part.get("type") in {"tool_call", "function_call"}:
+                 arguments = part.get("arguments") or "{}"
+                 if not isinstance(arguments, str):
+                     arguments = json_dumps(arguments)
+                 tool_calls.append({
+                     "id": part.get("id") or part.get("call_id") or f"call_{uuid.uuid4().hex[:12]}",
+                     "name": part.get("name"),
+                     "arguments": arguments,
+                 })
+
+     for tool_call in message.get("tool_calls") or []:
+         if not isinstance(tool_call, dict):
+             continue
+         function_data = tool_call.get("function") or {}
+         arguments = function_data.get("arguments") or tool_call.get("arguments") or "{}"
+         if not isinstance(arguments, str):
+             arguments = json_dumps(arguments)
+         tool_calls.append({
+             "id": tool_call.get("id") or f"call_{uuid.uuid4().hex[:12]}",
+             "name": function_data.get("name") or tool_call.get("name"),
+             "arguments": arguments,
+         })
+
+     deduped: list[dict[str, Any]] = []
+     seen_ids: set[str] = set()
+     for tool_call in tool_calls:
+         if tool_call["id"] in seen_ids:
+             continue
+         seen_ids.add(tool_call["id"])
+         deduped.append(tool_call)
+     return "\n".join(filter(None, text_chunks)).strip(), deduped
+
+
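Because tool calls can surface both in `content` parts and in the `tool_calls` field, the dedup-by-id step above is what prevents double execution. A standalone rehearsal of just the `tool_calls`-field path (plain `json`/`uuid` stand in for the gateway's `json_dumps` helper; names are illustrative):

```python
import json
import uuid


def extract_tool_calls_sketch(message):
    # Mirrors the dedup-by-id logic above for the `tool_calls` field only.
    calls, seen = [], set()
    for tc in message.get("tool_calls") or []:
        fn = tc.get("function") or {}
        args = fn.get("arguments") or "{}"
        if not isinstance(args, str):
            args = json.dumps(args)  # arguments are normalized to a JSON string
        call_id = tc.get("id") or f"call_{uuid.uuid4().hex[:12]}"
        if call_id in seen:
            continue
        seen.add(call_id)
        calls.append({"id": call_id, "name": fn.get("name"), "arguments": args})
    return calls


message = {
    "content": "Looking that up.",
    "tool_calls": [
        {"id": "call_1", "function": {"name": "get_weather", "arguments": {"city": "Oslo"}}},
        {"id": "call_1", "function": {"name": "get_weather", "arguments": {"city": "Oslo"}}},
    ],
}
calls = extract_tool_calls_sketch(message)
```

The duplicated upstream entry collapses to a single call with a JSON-string `arguments` payload.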
+ def build_choice_alias(output_items: list[dict[str, Any]], finish_reason: str | None) -> list[dict[str, Any]]:
+     content_parts: list[dict[str, Any]] = []
+     for item in output_items:
+         if item.get("type") == "message":
+             for part in item.get("content", []):
+                 content_parts.append({"type": part.get("type", "output_text"), "text": part.get("text", "")})
+         elif item.get("type") == "function_call":
+             arguments = item.get("arguments") or "{}"
+             try:
+                 parsed_arguments = json.loads(arguments)
+             except Exception:
+                 parsed_arguments = arguments
+             content_parts.append({
+                 "type": "tool_call",
+                 "id": item.get("call_id"),
+                 "name": item.get("name"),
+                 "arguments": parsed_arguments,
+             })
+     return [{"index": 0, "message": {"role": "assistant", "content": content_parts}, "finish_reason": finish_reason or "stop"}]
+
+
+ def chat_completion_to_response(body: dict[str, Any], upstream_json: dict[str, Any], previous_response_id: str | None) -> dict[str, Any]:
+     upstream_message, finish_reason = extract_upstream_message(upstream_json)
+     assistant_text, tool_calls = extract_text_and_tool_calls(upstream_message)
+     response_id = upstream_json.get("id") or f"resp_{uuid.uuid4().hex}"
+     output_items: list[dict[str, Any]] = []
+     if assistant_text:
+         output_items.append({
+             "id": f"msg_{uuid.uuid4().hex[:24]}",
+             "type": "message",
+             "status": "completed",
+             "role": "assistant",
+             "content": [{"type": "output_text", "text": assistant_text, "annotations": []}],
+         })
+     for tool_call in tool_calls:
+         output_items.append({
+             "id": f"fc_{uuid.uuid4().hex[:24]}",
+             "type": "function_call",
+             "status": "completed",
+             "call_id": tool_call["id"],
+             "name": tool_call.get("name"),
+             "arguments": tool_call.get("arguments", "{}"),
+         })
+     usage = upstream_json.get("usage") or {}
+     return {
+         "id": response_id,
+         "object": "response",
+         "created_at": int(time.time()),
+         "status": "completed",
+         "model": body.get("model"),
+         "output": output_items,
+         "output_text": assistant_text,
+         "parallel_tool_calls": bool(body.get("parallel_tool_calls", True)),
+         "previous_response_id": previous_response_id,
+         "store": True,
+         "text": body.get("text") or {"format": {"type": "text"}},
+         "usage": {"input_tokens": usage.get("prompt_tokens"), "output_tokens": usage.get("completion_tokens"), "total_tokens": usage.get("total_tokens")},
+         "choices": build_choice_alias(output_items, finish_reason),
+         "upstream": {"id": upstream_json.get("id"), "object": upstream_json.get("object", "chat.completion"), "finish_reason": finish_reason or "stop"},
+     }
+
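The conversion above turns one chat `choices[0].message` into a list of Responses-style output items: an optional `message` item followed by one `function_call` item per tool call. A stripped-down rehearsal of those item shapes (fixed ids for readability; the real code generates `msg_`/`fc_` ids with `uuid`):

```python
def to_output_items_sketch(text, tool_calls):
    # Same item shapes as chat_completion_to_response, with fixed demo ids.
    items = []
    if text:
        items.append({
            "id": "msg_demo",
            "type": "message",
            "status": "completed",
            "role": "assistant",
            "content": [{"type": "output_text", "text": text, "annotations": []}],
        })
    for call in tool_calls:
        items.append({
            "id": "fc_demo",
            "type": "function_call",
            "status": "completed",
            "call_id": call["id"],
            "name": call["name"],
            "arguments": call.get("arguments", "{}"),
        })
    return items


items = to_output_items_sketch("Done.", [{"id": "call_1", "name": "get_weather", "arguments": "{}"}])
```

The resulting list holds a `message` item first, then the `function_call` item carrying the original `call_id`.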
+ def store_response_record(conn: sqlite3.Connection, response_payload: dict[str, Any], request_body: dict[str, Any], input_items: list[dict[str, Any]], model_row: sqlite3.Row, api_key_row: sqlite3.Row) -> None:
+     conn.execute(
+         """
+         INSERT OR REPLACE INTO response_records (
+             response_id, parent_response_id, model_id, api_key_id, request_json,
+             input_items_json, output_json, output_items_json, status, created_at
+         ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+         """,
+         (
+             response_payload["id"],
+             request_body.get("previous_response_id"),
+             model_row["id"],
+             api_key_row["id"],
+             json_dumps(request_body),
+             json_dumps(input_items),
+             json_dumps(response_payload),
+             json_dumps(response_payload.get("output") or []),
+             response_payload.get("status", "completed"),
+             utcnow_iso(),
+         ),
+     )
+
+
+ def update_usage_stats(conn: sqlite3.Connection, model_row: sqlite3.Row, api_key_row: sqlite3.Row, *, ok: bool, latency_ms: float | None, is_healthcheck: bool) -> None:
+     now = utcnow_iso()
+     if is_healthcheck:
+         conn.execute(
+             """
+             UPDATE proxy_models
+             SET healthcheck_count = healthcheck_count + 1,
+                 healthcheck_success_count = healthcheck_success_count + ?,
+                 last_healthcheck_at = ?,
+                 last_health_status = ?,
+                 last_latency_ms = ?,
+                 updated_at = ?
+             WHERE id = ?
+             """,
+             (1 if ok else 0, now, 1 if ok else 0, latency_ms, now, model_row["id"]),
+         )
+         conn.execute(
+             """
+             UPDATE api_keys
+             SET healthcheck_count = healthcheck_count + 1,
+                 healthcheck_success_count = healthcheck_success_count + ?,
+                 last_tested_at = ?,
+                 last_latency_ms = ?,
+                 updated_at = ?
+             WHERE id = ?
+             """,
+             (1 if ok else 0, now, latency_ms, now, api_key_row["id"]),
+         )
+         return
+     conn.execute(
+         """
+         UPDATE proxy_models
+         SET request_count = request_count + 1,
+             success_count = success_count + ?,
+             failure_count = failure_count + ?,
+             last_used_at = ?,
+             last_latency_ms = ?,
+             updated_at = ?
+         WHERE id = ?
+         """,
+         (1 if ok else 0, 0 if ok else 1, now, latency_ms, now, model_row["id"]),
+     )
+     conn.execute(
+         """
+         UPDATE api_keys
+         SET request_count = request_count + 1,
+             success_count = success_count + ?,
+             failure_count = failure_count + ?,
+             last_used_at = ?,
+             last_latency_ms = ?,
+             updated_at = ?
+         WHERE id = ?
+         """,
+         (1 if ok else 0, 0 if ok else 1, now, latency_ms, now, api_key_row["id"]),
+     )
+
+
+ def insert_health_record(conn: sqlite3.Connection, model_row: sqlite3.Row, api_key_row: sqlite3.Row, *, ok: bool, status_code: int | None, latency_ms: float | None, error_message: str | None, response_excerpt: str | None) -> None:
+     conn.execute(
+         """
+         INSERT INTO health_check_records (
+             model_id, api_key_id, ok, status_code, latency_ms, error_message, response_excerpt, checked_at
+         ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+         """,
+         (model_row["id"], api_key_row["id"], 1 if ok else 0, status_code, latency_ms, error_message, response_excerpt, utcnow_iso()),
+     )
+
+
+ async def post_nvidia_chat_completion(api_key: str, payload: dict[str, Any]) -> tuple[dict[str, Any], float]:
+     started = time.perf_counter()
+     async with httpx.AsyncClient(timeout=REQUEST_TIMEOUT_SECONDS) as client:
+         response = await client.post(
+             CHAT_COMPLETIONS_URL,
+             headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
+             json=payload,
+         )
+         latency_ms = round((time.perf_counter() - started) * 1000, 2)
+         if response.status_code >= 400:
+             try:
+                 error_payload = response.json()
+                 detail = error_payload.get("error", {}).get("message") or json_dumps(error_payload)
+             except Exception:
+                 detail = response.text
+             raise HTTPException(status_code=response.status_code, detail=f"NVIDIA NIM request failed: {detail}")
+         return response.json(), latency_ms
+
+
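`post_nvidia_chat_completion` opens a fresh `httpx.AsyncClient` per call and fails on the first transport error. One possible hardening is a retry-with-backoff wrapper around the upstream call; the sketch below is generic over any awaitable factory (names and the retry policy are illustrative, not part of the gateway), demonstrated with a stub instead of a real network call:

```python
import asyncio


async def with_retries(call, attempts=3, base_delay=0.0):
    # Retry an async callable on exceptions, with linear backoff between tries.
    last_exc = None
    for attempt in range(attempts):
        try:
            return await call()
        except Exception as exc:  # real code would catch httpx.TransportError only
            last_exc = exc
            await asyncio.sleep(base_delay * (attempt + 1))
    raise last_exc


# Stubbed demonstration: fail twice, then succeed on the third attempt.
state = {"calls": 0}


async def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient upstream failure")
    return {"ok": True}


result = asyncio.run(with_retries(flaky))
```

Retrying only transport-level errors (and never 4xx responses) keeps client mistakes from being replayed against the upstream quota.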
+ async def perform_healthcheck(conn: sqlite3.Connection, model_row: sqlite3.Row, api_key_row: sqlite3.Row, prompt: str) -> dict[str, Any]:
+     payload = {"model": model_row["model_id"], "messages": [{"role": "user", "content": prompt}], "max_tokens": 32, "temperature": 0}
+     try:
+         upstream_json, latency_ms = await post_nvidia_chat_completion(api_key_row["api_key"], payload)
+         message, _finish_reason = extract_upstream_message(upstream_json)
+         assistant_text, _tool_calls = extract_text_and_tool_calls(message)
+         ok = True
+         detail = assistant_text or "Model responded successfully."
+         status_code = 200
+         error_message = None
+         response_excerpt = detail[:200]
+     except HTTPException as exc:
+         ok = False
+         latency_ms = None
+         detail = exc.detail
+         status_code = exc.status_code
+         error_message = exc.detail
+         response_excerpt = None
+     update_usage_stats(conn, model_row, api_key_row, ok=ok, latency_ms=latency_ms, is_healthcheck=True)
+     insert_health_record(conn, model_row, api_key_row, ok=ok, status_code=status_code, latency_ms=latency_ms, error_message=error_message, response_excerpt=response_excerpt)
+     conn.commit()
+     return {
+         "model": model_row["model_id"],
+         "display_name": model_row["display_name"],
+         "api_key": api_key_row["name"],
+         "status": "healthy" if ok else "down",
+         "ok": ok,
+         "latency": latency_ms,
+         "status_code": status_code,
+         "detail": detail,
+         "checked_at": utcnow_iso(),
+     }
+
+
+ async def run_healthchecks(model_identifier: str | int | None = None, api_key_identifier: str | int | None = None, prompt: str | None = None) -> list[dict[str, Any]]:
+     conn = get_db_connection()
+     try:
+         settings_payload = get_settings_payload(conn)
+         effective_prompt = prompt or settings_payload["healthcheck_prompt"]
+         if api_key_identifier is not None:
+             api_key_row = fetch_key_by_identifier(conn, api_key_identifier, enabled_only=True)
+             if not api_key_row:
+                 raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="API key not found.")
+             key_rows = [api_key_row]
+         else:
+             key_rows = conn.execute("SELECT * FROM api_keys WHERE enabled = 1 ORDER BY id ASC").fetchall()
+         if not key_rows:
+             raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="No enabled NVIDIA NIM API keys are configured.")
+         if model_identifier is not None:
+             model_row = fetch_model_by_identifier(conn, model_identifier, enabled_only=True)
+             if not model_row:
+                 raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Model not found.")
+             model_rows = [model_row]
+         else:
+             model_rows = conn.execute("SELECT * FROM proxy_models WHERE enabled = 1 ORDER BY sort_order ASC, model_id ASC").fetchall()
+         results: list[dict[str, Any]] = []
+         for index, model_row in enumerate(model_rows):
+             api_key_row = key_rows[index % len(key_rows)]
+             results.append(await perform_healthcheck(conn, model_row, api_key_row, effective_prompt))
+         return results
+     finally:
+         conn.close()
+
+
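Keys are assigned to models round-robin via `key_rows[index % len(key_rows)]`, so with fewer keys than models each key is reused in order:

```python
# Round-robin pairing, as in run_healthchecks (toy values).
models = ["model-a", "model-b", "model-c"]
keys = ["key-1", "key-2"]
pairs = [(model, keys[i % len(keys)]) for i, model in enumerate(models)]
# Three models over two keys wrap back to key-1 for the third model.
```

This spreads healthcheck load across keys without any per-key bookkeeping.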
+ def build_public_health_payload(hours: int | None = None) -> dict[str, Any]:
+     conn = get_db_connection()
+     try:
+         settings_payload = get_settings_payload(conn)
+         effective_hours = hours or settings_payload["public_history_hours"]
+         since = utcnow() - timedelta(hours=effective_hours)
+         models = conn.execute("SELECT * FROM proxy_models WHERE enabled = 1 ORDER BY sort_order ASC, model_id ASC").fetchall()
+         result_models: list[dict[str, Any]] = []
+         last_updated: str | None = None
+         for model in models:
+             rows = conn.execute(
+                 "SELECT * FROM health_check_records WHERE model_id = ? AND checked_at >= ? ORDER BY checked_at ASC",
+                 (model["id"], since.isoformat()),
+             ).fetchall()
+             hourly = []
+             ok_count = 0
+             for row in rows:
+                 status_name = "healthy" if row["ok"] else "down"
+                 hourly.append({"time": row["checked_at"], "status": status_name, "latency": row["latency_ms"]})
+                 ok_count += 1 if row["ok"] else 0
+                 last_updated = row["checked_at"]
+             total = len(rows)
+             success_rate = round((ok_count / total) * 100, 1) if total else 0.0
+             model_status = "unknown" if model["last_health_status"] is None else ("healthy" if model["last_health_status"] else "down")
+             result_models.append({
+                 "id": model["id"],
+                 "model_id": model["model_id"],
+                 "name": model["display_name"],
+                 "display_name": model["display_name"],
+                 "endpoint": "/v1/responses",
+                 "status": model_status,
+                 "beat": f"{success_rate}%",
+                 "hourly": hourly,
+                 "last_health_status": None if model["last_health_status"] is None else bool(model["last_health_status"]),
+                 "last_healthcheck_at": model["last_healthcheck_at"],
+                 "success_rate": success_rate,
+                 "points": [
+                     {
+                         "hour": entry["time"],
+                         "label": parse_datetime(entry["time"]).strftime("%H:%M") if parse_datetime(entry["time"]) else entry["time"],
+                         "ok": entry["status"] == "healthy",
+                         "latency_ms": entry["latency"],
+                     }
+                     for entry in hourly
+                 ],
+             })
+         return {"generated_at": utcnow_iso(), "last_updated": last_updated, "hours": effective_hours, "models": result_models}
+     finally:
+         conn.close()
+
+
+ def schedule_healthchecks() -> None:
+     conn = get_db_connection()
+     try:
+         settings_payload = get_settings_payload(conn)
+     finally:
+         conn.close()
+     interval = max(5, int(settings_payload["healthcheck_interval_minutes"]))
+     enabled = bool(settings_payload["healthcheck_enabled"])
+     if scheduler.get_job("nim-hourly-healthcheck"):
+         scheduler.remove_job("nim-hourly-healthcheck")
+     if enabled:
+         scheduler.add_job(
+             run_healthchecks,
+             "interval",
+             minutes=interval,
+             id="nim-hourly-healthcheck",
+             replace_existing=True,
+             next_run_time=utcnow() + timedelta(seconds=10),
+         )
+
+
+ init_db()
+
+
+ @asynccontextmanager
+ async def lifespan(_app: FastAPI):
+     init_db()
+     if not scheduler.running:
+         scheduler.start()
+     schedule_healthchecks()
+     try:
+         yield
+     finally:
+         if scheduler.running:
+             scheduler.shutdown(wait=False)
+
+
+ app = FastAPI(title="NIM Responses Gateway", lifespan=lifespan)
+ app.mount("/static", StaticFiles(directory=str(STATIC_DIR)), name="static")
+
+
+ @app.get("/")
+ async def public_dashboard() -> FileResponse:
+     return FileResponse(STATIC_DIR / "index.html")
+
+
+ @app.get("/admin")
+ async def admin_dashboard() -> FileResponse:
+     return FileResponse(STATIC_DIR / "admin.html")
+
+
+ @app.get("/api/health/public")
+ async def public_health(hours: int | None = None) -> dict[str, Any]:
+     return build_public_health_payload(hours)
+
+ @app.get("/v1/models")
+ async def list_models(_: bool = Depends(require_proxy_token_if_configured)) -> dict[str, Any]:
+     conn = get_db_connection()
+     try:
+         rows = conn.execute("SELECT * FROM proxy_models WHERE enabled = 1 ORDER BY sort_order ASC, model_id ASC").fetchall()
+         data = [
+             {
+                 "id": row["model_id"],
+                 "object": "model",
+                 "created": 0,
+                 "owned_by": "nvidia-nim",
+                 "display_name": row["display_name"],
+                 "status": "unknown" if row["last_health_status"] is None else ("healthy" if row["last_health_status"] else "down"),
+             }
+             for row in rows
+         ]
+         return {"object": "list", "data": data, "models": data}
+     finally:
+         conn.close()
+
+
+ @app.get("/v1/responses/{response_id}")
+ async def get_response(response_id: str, _: bool = Depends(require_proxy_token_if_configured)):
+     conn = get_db_connection()
+     try:
+         row = conn.execute("SELECT output_json FROM response_records WHERE response_id = ?", (response_id,)).fetchone()
+         if not row:
+             raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Response not found.")
+         return json.loads(row["output_json"])
+     finally:
+         conn.close()
+
+
+ @app.post("/v1/responses")
+ async def create_response(request: Request, _: bool = Depends(require_proxy_token_if_configured)):
+     body = await request.json()
+     if not isinstance(body, dict):
+         return make_error(status.HTTP_400_BAD_REQUEST, "Request body must be a JSON object.")
+     if not body.get("model"):
+         return make_error(status.HTTP_400_BAD_REQUEST, "The 'model' field is required.")
+     if body.get("input") is None:
+         return make_error(status.HTTP_400_BAD_REQUEST, "The 'input' field is required.")
+
+     conn = get_db_connection()
+     try:
+         model_row = fetch_model_by_identifier(conn, body["model"], enabled_only=True)
+         if not model_row:
+             return make_error(status.HTTP_404_NOT_FOUND, f"Model '{body['model']}' is not configured or is disabled.")
+         api_key_row = select_api_key(conn)
+         previous_items = load_previous_conversation_items(conn, body.get("previous_response_id"))
+         input_items = normalize_input_items(body.get("input"))
+         merged_items = previous_items + input_items
+         chat_payload = build_chat_payload(body, merged_items)
+         try:
+             upstream_json, latency_ms = await post_nvidia_chat_completion(api_key_row["api_key"], chat_payload)
+         except HTTPException:
+             update_usage_stats(conn, model_row, api_key_row, ok=False, latency_ms=None, is_healthcheck=False)
+             conn.commit()
+             raise
+         response_payload = chat_completion_to_response(body, upstream_json, body.get("previous_response_id"))
+         update_usage_stats(conn, model_row, api_key_row, ok=True, latency_ms=latency_ms, is_healthcheck=False)
+         store_response_record(conn, response_payload, body, input_items, model_row, api_key_row)
+         conn.commit()
+
+         if body.get("stream"):
+             # The upstream call has already completed; replay the stored result
+             # as SSE events rather than proxying a live token stream.
+             async def event_stream() -> Any:
+                 yield f"event: response.created\ndata: {json_dumps({'type': 'response.created', 'response': {'id': response_payload['id'], 'model': response_payload['model'], 'status': 'in_progress'}})}\n\n"
+                 for index, item in enumerate(response_payload.get("output") or []):
+                     yield f"event: response.output_item.added\ndata: {json_dumps({'type': 'response.output_item.added', 'output_index': index, 'item': item})}\n\n"
+                     if item.get("type") == "message":
+                         text_value = extract_text_from_content(item.get("content"))
+                         if text_value:
+                             yield f"event: response.output_text.delta\ndata: {json_dumps({'type': 'response.output_text.delta', 'output_index': index, 'delta': text_value})}\n\n"
+                             yield f"event: response.output_text.done\ndata: {json_dumps({'type': 'response.output_text.done', 'output_index': index, 'text': text_value})}\n\n"
+                     if item.get("type") == "function_call":
+                         yield f"event: response.function_call_arguments.done\ndata: {json_dumps({'type': 'response.function_call_arguments.done', 'output_index': index, 'arguments': item.get('arguments', '{}'), 'call_id': item.get('call_id')})}\n\n"
+                     yield f"event: response.output_item.done\ndata: {json_dumps({'type': 'response.output_item.done', 'output_index': index, 'item': item})}\n\n"
+                 yield f"event: response.completed\ndata: {json_dumps({'type': 'response.completed', 'response': response_payload})}\n\n"
+             return StreamingResponse(event_stream(), media_type="text/event-stream")
+         return response_payload
+     finally:
+         conn.close()
+
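The stream branch emits `event:`/`data:` SSE frames separated by blank lines. A minimal client-side parser for that framing (illustrative only; a real client would read frames incrementally from an httpx response stream rather than from a string):

```python
import json


def parse_sse(raw):
    # Split a text/event-stream body into (event_name, decoded_data) pairs.
    events = []
    for frame in raw.strip().split("\n\n"):
        event, data_lines = None, []
        for line in frame.splitlines():
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data_lines.append(line[len("data: "):])
        events.append((event, json.loads("\n".join(data_lines))))
    return events


raw = (
    'event: response.created\ndata: {"type": "response.created"}\n\n'
    'event: response.completed\ndata: {"type": "response.completed"}\n\n'
)
events = parse_sse(raw)
```

Clients watching for `response.completed` get the full response payload in that frame's `data` field.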
+ @app.post("/admin/api/login")
+ async def admin_login(request: Request, response: Response):
+     if not ADMIN_PASSWORD:
+         raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail="PASSWORD is not configured.")
+     body = await request.json()
+     password = body.get("password") if isinstance(body, dict) else None
+     if password != ADMIN_PASSWORD:
+         raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid password.")
+     token = create_admin_token()
+     response.set_cookie(COOKIE_NAME, token, httponly=True, samesite="lax", secure=False, max_age=60 * 60 * 24 * 7)
+     return {"token": token, "access_token": token, "token_type": "bearer"}
+
+
+ @app.post("/admin/api/logout")
+ async def admin_logout(response: Response, _: bool = Depends(require_admin)):
+     response.delete_cookie(COOKIE_NAME)
+     return {"message": "Logged out."}
+
+
+ @app.get("/admin/api/session")
+ async def admin_session(_: bool = Depends(require_admin)):
+     return {"ok": True}
+
+
+ @app.get("/admin/api/overview")
+ async def admin_overview(_: bool = Depends(require_admin)):
+     conn = get_db_connection()
+     try:
+         total_models = conn.execute("SELECT COUNT(*) AS count FROM proxy_models").fetchone()["count"]
+         enabled_models = conn.execute("SELECT COUNT(*) AS count FROM proxy_models WHERE enabled = 1").fetchone()["count"]
+         total_keys = conn.execute("SELECT COUNT(*) AS count FROM api_keys").fetchone()["count"]
+         enabled_keys = conn.execute("SELECT COUNT(*) AS count FROM api_keys WHERE enabled = 1").fetchone()["count"]
+         usage = conn.execute("SELECT COALESCE(SUM(request_count), 0) AS total_requests, COALESCE(SUM(success_count), 0) AS total_success, COALESCE(SUM(failure_count), 0) AS total_failures FROM proxy_models").fetchone()
+         recent_rows = conn.execute("SELECT h.checked_at, h.ok, h.latency_ms, m.model_id FROM health_check_records h JOIN proxy_models m ON m.id = h.model_id ORDER BY h.checked_at DESC LIMIT 8").fetchall()
+         return {
+             "metrics": [
+                 {"label": "Enabled Models", "value": enabled_models},
+                 {"label": "Enabled Keys", "value": enabled_keys},
+                 {"label": "Proxy Requests", "value": usage["total_requests"]},
+                 {"label": "Failures", "value": usage["total_failures"]},
+             ],
+             "recent_checks": [
+                 {"time": row["checked_at"], "model": row["model_id"], "status": "healthy" if row["ok"] else "down", "latency": row["latency_ms"]}
+                 for row in recent_rows
+             ],
+             "totals": {
+                 "total_models": total_models,
+                 "enabled_models": enabled_models,
+                 "total_keys": total_keys,
+                 "enabled_keys": enabled_keys,
+                 "total_requests": usage["total_requests"],
+                 "total_success": usage["total_success"],
+                 "total_failures": usage["total_failures"],
+             },
+         }
+     finally:
+         conn.close()
+
+
+ @app.get("/admin/api/models")
+ async def admin_models(_: bool = Depends(require_admin)):
+     conn = get_db_connection()
+     try:
+         rows = conn.execute("SELECT * FROM proxy_models ORDER BY sort_order ASC, model_id ASC").fetchall()
+         return {"items": [row_to_model_item(row) for row in rows]}
+     finally:
+         conn.close()
+
+
+ @app.get("/admin/api/models/usage")
+ async def admin_models_usage(_: bool = Depends(require_admin)):
+     conn = get_db_connection()
+     try:
+         rows = conn.execute("SELECT * FROM proxy_models ORDER BY request_count DESC, model_id ASC").fetchall()
+         return {"items": [row_to_model_item(row) for row in rows]}
+     finally:
+         conn.close()
+
+
+ @app.post("/admin/api/models")
+ async def admin_add_model(request: Request, _: bool = Depends(require_admin)):
+     body = await request.json()
+     model_id = (body.get("model_id") or body.get("name") or "").strip()
+     display_name = (body.get("display_name") or model_id).strip()
+     if not model_id:
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="model_id is required.")
+     conn = get_db_connection()
+     try:
+         now = utcnow_iso()
+         conn.execute(
+             """
+             INSERT INTO proxy_models (model_id, display_name, provider, description, enabled, featured, sort_order, created_at, updated_at)
+             VALUES (?, ?, 'nvidia-nim', ?, ?, ?, ?, ?, ?)
+             ON CONFLICT(model_id) DO UPDATE SET
+                 display_name = excluded.display_name,
+                 description = excluded.description,
+                 enabled = excluded.enabled,
+                 featured = excluded.featured,
+                 sort_order = excluded.sort_order,
+                 updated_at = excluded.updated_at
+             """,
+             (model_id, display_name, body.get("description"), 1 if body.get("enabled", True) else 0, 1 if body.get("featured", False) else 0, int(body.get("sort_order", 0)), now, now),
+         )
+         conn.commit()
+         row = fetch_model_by_identifier(conn, model_id)
+         return {"item": row_to_model_item(row)}
+     finally:
+         conn.close()
+
+
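The `ON CONFLICT(model_id) DO UPDATE` clause above makes the endpoint an idempotent upsert; it relies on a UNIQUE constraint on `model_id`. A minimal in-memory demonstration of that SQLite pattern (toy schema and values, not the gateway's tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (model_id TEXT UNIQUE, display_name TEXT)")
sql = """
    INSERT INTO models (model_id, display_name) VALUES (?, ?)
    ON CONFLICT(model_id) DO UPDATE SET display_name = excluded.display_name
"""
# Inserting the same model_id twice updates in place instead of raising.
conn.execute(sql, ("example/toy-model", "Toy Model"))
conn.execute(sql, ("example/toy-model", "Toy Model v2"))
rows = conn.execute("SELECT model_id, display_name FROM models").fetchall()
```

Note that `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer, which ships with all currently supported Python versions.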
+ def delete_model_internal(model_identifier: str) -> dict[str, Any]:
+     conn = get_db_connection()
+     try:
+         row = fetch_model_by_identifier(conn, model_identifier)
+         if not row:
+             raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Model not found.")
+         conn.execute("DELETE FROM proxy_models WHERE id = ?", (row["id"],))
+         conn.commit()
+         return {"message": "Model deleted."}
+     finally:
+         conn.close()
+
+
+ @app.delete("/admin/api/models/{model_identifier}")
+ async def admin_delete_model(model_identifier: str, _: bool = Depends(require_admin)):
+     return delete_model_internal(model_identifier)
+
+
+ @app.post("/admin/api/models/remove")
+ async def admin_remove_model_alias(request: Request, _: bool = Depends(require_admin)):
+     body = await request.json()
+     value = body.get("value") if isinstance(body, dict) else None
+     if not value:
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="value is required.")
+     return delete_model_internal(str(value))
+
+
+ async def test_model_internal(model_identifier: str, payload: dict[str, Any] | None = None) -> dict[str, Any]:
+     conn = get_db_connection()
+     try:
+         row = fetch_model_by_identifier(conn, model_identifier, enabled_only=True)
+         if not row:
+             raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Model not found.")
+         api_key_row = select_api_key(conn, payload.get("api_key_id") if payload else None)
+         return await perform_healthcheck(conn, row, api_key_row, (payload or {}).get("prompt") or DEFAULT_HEALTH_PROMPT)
+     finally:
+         conn.close()
+
+
+ @app.post("/admin/api/models/test")
+ async def admin_test_model_alias(request: Request, _: bool = Depends(require_admin)):
+     body = await request.json()
+     identifier = body.get("value") or body.get("model_id")
+     if not identifier:
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="value is required.")
+     return await test_model_internal(str(identifier), body)
+
+
+ @app.post("/admin/api/models/{model_identifier}/test")
+ async def admin_test_model(model_identifier: str, request: Request, _: bool = Depends(require_admin)):
+     # This route only accepts POST, so checking request.method is redundant;
+     # tolerate an empty body instead of failing on JSON parsing.
+     try:
+         body = await request.json()
+     except Exception:
+         body = {}
+     return await test_model_internal(model_identifier, body)
+
+ @app.get("/admin/api/keys")
+ async def admin_keys(_: bool = Depends(require_admin)):
+     conn = get_db_connection()
+     try:
+         rows = conn.execute("SELECT * FROM api_keys ORDER BY id ASC").fetchall()
+         return {"items": [row_to_key_item(row) for row in rows]}
+     finally:
+         conn.close()
+
+
+ @app.get("/admin/api/keys/usage")
+ async def admin_keys_usage(_: bool = Depends(require_admin)):
+     conn = get_db_connection()
+     try:
+         rows = conn.execute("SELECT * FROM api_keys ORDER BY request_count DESC, id ASC").fetchall()
+         return {"items": [row_to_key_item(row) for row in rows]}
+     finally:
+         conn.close()
+
+
+ @app.post("/admin/api/keys")
+ async def admin_add_key(request: Request, _: bool = Depends(require_admin)):
+     body = await request.json()
+     name = (body.get("name") or body.get("label") or "").strip()
+     api_key = (body.get("api_key") or body.get("key") or "").strip()
+     if not name or not api_key:
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Both name and api_key are required.")
+     conn = get_db_connection()
+     try:
+         now = utcnow_iso()
+         conn.execute(
+             """
+             INSERT INTO api_keys (name, api_key, enabled, created_at, updated_at)
+             VALUES (?, ?, ?, ?, ?)
+             ON CONFLICT(name) DO UPDATE SET api_key = excluded.api_key, enabled = excluded.enabled, updated_at = excluded.updated_at
+             """,
+             (name, api_key, 1 if body.get("enabled", True) else 0, now, now),
+         )
+         conn.commit()
+         row = fetch_key_by_identifier(conn, name)
+         return {"item": row_to_key_item(row)}
+     finally:
+         conn.close()
+
+
+ def delete_key_internal(key_identifier: str) -> dict[str, Any]:
+     conn = get_db_connection()
+     try:
+         row = fetch_key_by_identifier(conn, key_identifier)
+         if not row:
+             raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="API key not found.")
+         conn.execute("DELETE FROM api_keys WHERE id = ?", (row["id"],))
+         conn.commit()
+         return {"message": "API key deleted."}
+     finally:
+         conn.close()
+
+
+ @app.delete("/admin/api/keys/{key_identifier}")
+ async def admin_delete_key(key_identifier: str, _: bool = Depends(require_admin)):
+     return delete_key_internal(key_identifier)
+
+
+ @app.post("/admin/api/keys/remove")
+ async def admin_remove_key_alias(request: Request, _: bool = Depends(require_admin)):
+     body = await request.json()
+     value = body.get("value") if isinstance(body, dict) else None
+     if not value:
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="value is required.")
+     return delete_key_internal(str(value))
+
+
+ async def test_key_internal(key_identifier: str, payload: dict[str, Any] | None = None) -> dict[str, Any]:
+     conn = get_db_connection()
+     try:
+         key_row = fetch_key_by_identifier(conn, key_identifier, enabled_only=True)
+         if not key_row:
+             raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="API key not found.")
+         model_identifier = (payload or {}).get("model_id") or DEFAULT_MODELS[0][0]
+         model_row = fetch_model_by_identifier(conn, model_identifier, enabled_only=True)
+         if not model_row:
+             raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Model not found.")
+         return await perform_healthcheck(conn, model_row, key_row, (payload or {}).get("prompt") or DEFAULT_HEALTH_PROMPT)
+     finally:
+         conn.close()
+
+
+ @app.post("/admin/api/keys/test")
+ async def admin_test_key_alias(request: Request, _: bool = Depends(require_admin)):
+     body = await request.json()
+     identifier = body.get("value") or body.get("name") or body.get("label")
+     if not identifier:
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="value is required.")
+     return await test_key_internal(str(identifier), body)
+
+
+ @app.post("/admin/api/keys/{key_identifier}/test")
+ async def admin_test_key(key_identifier: str, request: Request, _: bool = Depends(require_admin)):
+     # POST-only route; the request.method check was redundant, and an empty
+     # body should not abort the test.
+     try:
+         body = await request.json()
+     except Exception:
+         body = {}
+     return await test_key_internal(key_identifier, body)
+
+
1256
+ @app.get("/admin/api/healthchecks")
1257
+ async def admin_healthchecks(hours: int = 48, _: bool = Depends(require_admin)):
1258
+ conn = get_db_connection()
1259
+ try:
1260
+ since = utcnow() - timedelta(hours=hours)
1261
+ rows = conn.execute(
1262
+ """
1263
+ SELECT h.*, m.model_id, m.display_name, k.name AS key_name
1264
+ FROM health_check_records h
1265
+ JOIN proxy_models m ON m.id = h.model_id
1266
+ LEFT JOIN api_keys k ON k.id = h.api_key_id
1267
+ WHERE h.checked_at >= ?
1268
+ ORDER BY h.checked_at DESC
1269
+ LIMIT 200
1270
+ """,
1271
+ (since.isoformat(),),
1272
+ ).fetchall()
1273
+ items = [{"id": row["id"], "model": row["display_name"], "model_id": row["model_id"], "api_key": row["key_name"], "status": "healthy" if row["ok"] else "down", "detail": row["response_excerpt"] or row["error_message"] or "No details available.", "latency": row["latency_ms"], "status_code": row["status_code"], "checked_at": row["checked_at"]} for row in rows]
1274
+ return {"items": items}
1275
+ finally:
1276
+ conn.close()
1277
+
1278
+
1279
+ @app.post("/admin/api/healthchecks/run")
1280
+ async def admin_run_healthchecks(request: Request, _: bool = Depends(require_admin)):
1281
+ body = await request.json() if request.method == "POST" else {}
1282
+ results = await run_healthchecks(model_identifier=body.get("model_id") or body.get("model"), api_key_identifier=body.get("api_key_id") or body.get("key_id"), prompt=body.get("prompt"))
1283
+ return {"items": results, "results": results}
1284
+
1285
+
1286
+ @app.get("/admin/api/settings")
1287
+ async def admin_settings(_: bool = Depends(require_admin)):
1288
+ conn = get_db_connection()
1289
+ try:
1290
+ return get_settings_payload(conn)
1291
+ finally:
1292
+ conn.close()
1293
+
1294
+
1295
+ @app.put("/admin/api/settings")
1296
+ async def admin_update_settings(request: Request, _: bool = Depends(require_admin)):
1297
+ body = await request.json()
1298
+ conn = get_db_connection()
1299
+ try:
1300
+ set_setting(conn, "healthcheck_enabled", "true" if body.get("healthcheck_enabled", True) else "false")
1301
+ set_setting(conn, "healthcheck_interval_minutes", str(max(5, int(body.get("healthcheck_interval_minutes", DEFAULT_HEALTH_INTERVAL_MINUTES)))))
1302
+ set_setting(conn, "healthcheck_prompt", body.get("healthcheck_prompt") or DEFAULT_HEALTH_PROMPT)
1303
+ if body.get("public_history_hours"):
1304
+ set_setting(conn, "public_history_hours", str(max(1, int(body.get("public_history_hours")))))
1305
+ conn.commit()
1306
+ finally:
1307
+ conn.close()
1308
+ schedule_healthchecks()
1309
+ conn = get_db_connection()
1310
+ try:
1311
+ return get_settings_payload(conn)
1312
+ finally:
1313
+ conn.close()
1314
+
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ fastapi>=0.116.0,<1.0.0
+ uvicorn[standard]>=0.35.0,<1.0.0
+ httpx>=0.28.1,<1.0.0
+ apscheduler>=3.10.4,<4.0.0
+ python-multipart>=0.0.20,<1.0.0
+ itsdangerous>=2.2.0,<3.0.0
static/admin.html ADDED
@@ -0,0 +1,141 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8" />
+   <meta name="viewport" content="width=device-width, initial-scale=1" />
+   <title>Admin - NVIDIA NIM Operations</title>
+   <link rel="preconnect" href="https://fonts.googleapis.com" />
+   <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
+   <link
+     href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&display=swap"
+     rel="stylesheet"
+   />
+   <link rel="stylesheet" href="/static/style.css" />
+ </head>
+ <body>
+   <div class="admin-shell">
+     <aside class="admin-sidebar">
+       <h3>Sections</h3>
+       <button class="sidebar-btn active" data-panel="overview">Overview</button>
+       <button class="sidebar-btn" data-panel="models">Models</button>
+       <button class="sidebar-btn" data-panel="keys">API Keys</button>
+       <button class="sidebar-btn" data-panel="health">Health Checks</button>
+       <button class="sidebar-btn" data-panel="settings">Settings</button>
+     </aside>
+     <section class="admin-content">
+       <div class="glass-panel" data-panel="overview">
+         <h2>Command center</h2>
+         <div class="section-grid" id="overview-metrics"></div>
+         <div class="glass-panel" style="margin-top: 1rem;">
+           <h3>Recent checks</h3>
+           <table class="table">
+             <thead>
+               <tr>
+                 <th>Time</th>
+                 <th>Model</th>
+                 <th>Status</th>
+                 <th>Latency</th>
+               </tr>
+             </thead>
+             <tbody id="recent-checks"></tbody>
+           </table>
+         </div>
+       </div>
+ 
+       <div class="glass-panel hidden" data-panel="models">
+         <div class="section-grid compact-grid">
+           <div class="metric-card">
+             <h3>Total models</h3>
+             <strong id="model-count">-</strong>
+           </div>
+           <div class="metric-card">
+             <h3>Healthy</h3>
+             <strong id="model-healthy">-</strong>
+           </div>
+         </div>
+         <div class="form-grid" style="margin-top: 1rem;">
+           <input id="model-id" placeholder="Model ID (e.g. z-ai/glm5)" />
+           <input id="model-display-name" placeholder="Display name" />
+           <textarea id="model-description" placeholder="Description for the admin catalog"></textarea>
+           <button id="model-add" type="button">Add or update model</button>
+         </div>
+         <table class="table" style="margin-top: 1rem;">
+           <thead>
+             <tr>
+               <th>Model</th>
+               <th>Status</th>
+               <th>Requests</th>
+               <th>Health</th>
+               <th>Actions</th>
+             </tr>
+           </thead>
+           <tbody id="model-table"></tbody>
+         </table>
+       </div>
+ 
+       <div class="glass-panel hidden" data-panel="keys">
+         <h3>API Keys</h3>
+         <div class="form-grid compact-grid">
+           <input id="key-label" placeholder="Key label" />
+           <input id="key-value" placeholder="NVIDIA NIM key" />
+           <button id="key-add" type="button">Store key</button>
+         </div>
+         <table class="table" style="margin-top: 1rem;">
+           <thead>
+             <tr>
+               <th>Label</th>
+               <th>Masked</th>
+               <th>Requests</th>
+               <th>Last tested</th>
+               <th>Status</th>
+               <th>Actions</th>
+             </tr>
+           </thead>
+           <tbody id="key-table"></tbody>
+         </table>
+       </div>
+ 
+       <div class="glass-panel hidden" data-panel="health">
+         <div class="toolbar-row">
+           <div>
+             <h3>Health checks</h3>
+             <p class="status-text">Manual runs are stored and surfaced on the public board hour by hour.</p>
+           </div>
+           <button id="run-healthcheck" type="button">Run checks now</button>
+         </div>
+         <div class="section-grid" id="health-grid"></div>
+       </div>
+ 
+       <div class="glass-panel hidden" data-panel="settings">
+         <h3>Scheduler settings</h3>
+         <div class="form-grid">
+           <label class="checkbox-row">
+             <input id="healthcheck-enabled" type="checkbox" />
+             <span>Enable scheduled health checks</span>
+           </label>
+           <input id="healthcheck-interval" type="number" min="5" step="5" placeholder="Interval in minutes" />
+           <input id="public-history-hours" type="number" min="1" step="1" placeholder="Public history hours" />
+           <textarea id="healthcheck-prompt" placeholder="Prompt used for hourly health checks"></textarea>
+           <div class="inline-actions">
+             <button id="settings-save" type="button">Save settings</button>
+             <button class="secondary-btn" id="refresh-now" type="button">Reload dashboard</button>
+           </div>
+         </div>
+         <p class="status-text" id="settings-status"></p>
+       </div>
+     </section>
+   </div>
+ 
+   <div class="login-overlay" id="login-overlay">
+     <div class="login-card">
+       <h2>Admin login</h2>
+       <p class="status-text">Enter the PASSWORD environment variable to continue.</p>
+       <label for="admin-password">Password</label>
+       <input type="password" id="admin-password" autocomplete="current-password" />
+       <button id="login-btn">Unlock dashboard</button>
+       <p class="status-text" id="login-status"></p>
+     </div>
+   </div>
+   <script src="/static/admin.js" defer></script>
+ </body>
+ </html>
@@ -0,0 +1,273 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ const PANEL_ATTR = "data-panel";
2
+ const sidebarButtons = document.querySelectorAll(".sidebar-btn");
3
+ const panels = document.querySelectorAll(`.glass-panel[${PANEL_ATTR}]`);
4
+ const loginOverlay = document.getElementById("login-overlay");
5
+ const loginBtn = document.getElementById("login-btn");
6
+ const loginStatus = document.getElementById("login-status");
7
+ const overviewMetrics = document.getElementById("overview-metrics");
8
+ const recentChecks = document.getElementById("recent-checks");
9
+ const modelTable = document.getElementById("model-table");
10
+ const keyTable = document.getElementById("key-table");
11
+ const healthGrid = document.getElementById("health-grid");
12
+ const modelCount = document.getElementById("model-count");
13
+ const modelHealthy = document.getElementById("model-healthy");
14
+ const settingsStatus = document.getElementById("settings-status");
15
+
16
+ const state = {
17
+ token: sessionStorage.getItem("nim_token"),
18
+ panel: "overview",
19
+ };
20
+
21
+ const showPanel = (name) => {
22
+ panels.forEach((panel) => panel.classList.toggle("hidden", panel.getAttribute(PANEL_ATTR) !== name));
23
+ sidebarButtons.forEach((button) => button.classList.toggle("active", button.dataset.panel === name));
24
+ state.panel = name;
25
+ };
26
+
27
+ sidebarButtons.forEach((button) => button.addEventListener("click", () => showPanel(button.dataset.panel)));
28
+
29
+ const apiRequest = async (endpoint, opts = {}) => {
30
+ const headers = { "Content-Type": "application/json" };
31
+ if (state.token) headers.Authorization = `Bearer ${state.token}`;
32
+ const response = await fetch(`/admin/api/${endpoint}`, { ...opts, headers: { ...headers, ...(opts.headers || {}) } });
33
+ if (!response.ok) {
34
+ const payload = await response.json().catch(() => ({}));
35
+ throw new Error(payload.message || payload.detail || payload.error?.message || "Request failed");
36
+ }
37
+ return response.json();
38
+ };
39
+
40
+ const metricCard = ({ label, value }) => {
41
+ const div = document.createElement("div");
42
+ div.className = "metric-card";
43
+ div.innerHTML = `<h3>${label}</h3><strong>${value}</strong>`;
44
+ return div;
45
+ };
46
+
47
+ const pill = (status) => `<span class="pill">${status || "unknown"}</span>`;
48
+
49
+ async function renderOverview() {
50
+ const payload = await apiRequest("overview");
51
+ overviewMetrics.innerHTML = "";
52
+ (payload.metrics || []).forEach((metric) => overviewMetrics.appendChild(metricCard(metric)));
53
+
54
+ recentChecks.innerHTML = "";
55
+ (payload.recent_checks || []).forEach((check) => {
56
+ const row = document.createElement("tr");
57
+ row.innerHTML = `
58
+ <td>${new Date(check.time).toLocaleString()}</td>
59
+ <td>${check.model}</td>
60
+ <td>${pill(check.status)}</td>
61
+ <td>${check.latency ? `${check.latency} ms` : "-"}</td>
62
+ `;
63
+ recentChecks.appendChild(row);
64
+ });
65
+ }
66
+
67
+ async function renderModels() {
68
+ const payload = await apiRequest("models");
69
+ const items = payload.items || [];
70
+ modelCount.textContent = items.length;
71
+ modelHealthy.textContent = items.filter((item) => item.status === "healthy").length;
72
+ modelTable.innerHTML = "";
73
+ items.forEach((item) => {
74
+ const row = document.createElement("tr");
75
+ row.innerHTML = `
76
+ <td>
77
+ <strong>${item.display_name || item.model_id}</strong><br />
78
+ <span class="status-text">${item.model_id}</span>
79
+ </td>
80
+ <td>${pill(item.status)}</td>
81
+ <td>${item.request_count}</td>
82
+ <td>${item.healthcheck_success_count}/${item.healthcheck_count}</td>
83
+ <td>
84
+ <div class="inline-actions">
85
+ <button class="secondary-btn" data-action="test-model" data-id="${item.model_id}">Test</button>
86
+ <button class="secondary-btn" data-action="remove-model" data-id="${item.model_id}">Remove</button>
87
+ </div>
88
+ </td>
89
+ `;
90
+ modelTable.appendChild(row);
91
+ });
92
+ }
93
+
94
+ async function renderKeys() {
95
+ const payload = await apiRequest("keys");
96
+ const items = payload.items || [];
97
+ keyTable.innerHTML = "";
98
+ items.forEach((item) => {
99
+ const row = document.createElement("tr");
100
+ row.innerHTML = `
101
+ <td>${item.label}</td>
102
+ <td>${item.masked_key}</td>
103
+ <td>${item.request_count}</td>
104
+ <td>${item.last_tested ? new Date(item.last_tested).toLocaleString() : "-"}</td>
105
+ <td>${pill(item.status)}</td>
106
+ <td>
107
+ <div class="inline-actions">
108
+ <button class="secondary-btn" data-action="test-key" data-id="${item.name}">Test</button>
109
+ <button class="secondary-btn" data-action="remove-key" data-id="${item.name}">Delete</button>
110
+ </div>
111
+ </td>
112
+ `;
113
+ keyTable.appendChild(row);
114
+ });
115
+ }
116
+
117
+ async function renderHealth() {
118
+ const payload = await apiRequest("healthchecks");
119
+ healthGrid.innerHTML = "";
120
+ (payload.items || []).slice(0, 12).forEach((item) => {
121
+ const card = document.createElement("div");
122
+ card.className = "glass-panel";
123
+ card.innerHTML = `
124
+ <div class="toolbar-row">
125
+ <h4>${item.model}</h4>
126
+ ${pill(item.status)}
127
+ </div>
128
+ <p class="status-text">${item.detail || "No detail"}</p>
129
+ <div class="health-meta">
130
+ <span>${item.api_key || "No key recorded"}</span>
131
+ <span>${item.latency ? `${item.latency} ms` : "-"}</span>
132
+ <span>${item.checked_at ? new Date(item.checked_at).toLocaleString() : "-"}</span>
133
+ </div>
134
+ `;
135
+ healthGrid.appendChild(card);
136
+ });
137
+ }
138
+
139
+ async function renderSettings() {
140
+ const payload = await apiRequest("settings");
141
+ document.getElementById("healthcheck-enabled").checked = Boolean(payload.healthcheck_enabled);
142
+ document.getElementById("healthcheck-interval").value = payload.healthcheck_interval_minutes || 60;
143
+ document.getElementById("public-history-hours").value = payload.public_history_hours || 48;
144
+ document.getElementById("healthcheck-prompt").value = payload.healthcheck_prompt || "Reply with the single word OK.";
145
+ }
146
+
147
+ async function loadAll() {
148
+ await Promise.all([renderOverview(), renderModels(), renderKeys(), renderHealth(), renderSettings()]);
149
+ }
150
+
151
+ async function testModel(modelId) {
152
+ const payload = await apiRequest(`models/${encodeURIComponent(modelId)}/test`, { method: "POST", body: JSON.stringify({}) });
153
+ alert(`${payload.display_name || payload.model} -> ${payload.status}`);
154
+ await loadAll();
155
+ }
156
+
157
+ async function removeModel(modelId) {
158
+ await apiRequest("models/remove", { method: "POST", body: JSON.stringify({ value: modelId }) });
159
+ await loadAll();
160
+ }
161
+
162
+ async function testKey(keyName) {
163
+ const payload = await apiRequest("keys/test", { method: "POST", body: JSON.stringify({ value: keyName }) });
164
+ alert(`${payload.api_key} -> ${payload.status}`);
165
+ await loadAll();
166
+ }
167
+
168
+ async function removeKey(keyName) {
169
+ await apiRequest("keys/remove", { method: "POST", body: JSON.stringify({ value: keyName }) });
170
+ await loadAll();
171
+ }
172
+
173
+ modelTable.addEventListener("click", (event) => {
174
+ const button = event.target.closest("button[data-action]");
175
+ if (!button) return;
176
+ if (button.dataset.action === "test-model") testModel(button.dataset.id);
177
+ if (button.dataset.action === "remove-model") removeModel(button.dataset.id);
178
+ });
179
+
180
+ keyTable.addEventListener("click", (event) => {
181
+ const button = event.target.closest("button[data-action]");
182
+ if (!button) return;
183
+ if (button.dataset.action === "test-key") testKey(button.dataset.id);
184
+ if (button.dataset.action === "remove-key") removeKey(button.dataset.id);
185
+ });
186
+
187
+ document.getElementById("model-add")?.addEventListener("click", async () => {
188
+ const modelId = document.getElementById("model-id").value.trim();
189
+ const displayName = document.getElementById("model-display-name").value.trim();
190
+ const description = document.getElementById("model-description").value.trim();
191
+ if (!modelId) {
192
+ alert("Model ID is required.");
193
+ return;
194
+ }
195
+ await apiRequest("models", { method: "POST", body: JSON.stringify({ model_id: modelId, display_name: displayName || modelId, description }) });
196
+ document.getElementById("model-id").value = "";
197
+ document.getElementById("model-display-name").value = "";
198
+ document.getElementById("model-description").value = "";
199
+ await renderModels();
200
+ });
201
+
202
+ document.getElementById("key-add")?.addEventListener("click", async () => {
203
+ const name = document.getElementById("key-label").value.trim();
204
+ const apiKey = document.getElementById("key-value").value.trim();
205
+ if (!name || !apiKey) {
206
+ alert("Label and key are required.");
207
+ return;
208
+ }
209
+ await apiRequest("keys", { method: "POST", body: JSON.stringify({ name, api_key: apiKey }) });
210
+ document.getElementById("key-label").value = "";
211
+ document.getElementById("key-value").value = "";
212
+ await renderKeys();
213
+ });
214
+
215
+ document.getElementById("run-healthcheck")?.addEventListener("click", async () => {
216
+ await apiRequest("healthchecks/run", { method: "POST", body: JSON.stringify({}) });
217
+ await loadAll();
218
+ });
219
+
220
+ document.getElementById("settings-save")?.addEventListener("click", async () => {
221
+ try {
222
+ const payload = {
223
+ healthcheck_enabled: document.getElementById("healthcheck-enabled").checked,
224
+ healthcheck_interval_minutes: Number(document.getElementById("healthcheck-interval").value || 60),
225
+ public_history_hours: Number(document.getElementById("public-history-hours").value || 48),
226
+ healthcheck_prompt: document.getElementById("healthcheck-prompt").value.trim(),
227
+ };
228
+ await apiRequest("settings", { method: "PUT", body: JSON.stringify(payload) });
229
+ settingsStatus.textContent = "Settings saved.";
230
+ await loadAll();
231
+ } catch (error) {
232
+ settingsStatus.textContent = error.message;
233
+ }
234
+ });
235
+
236
+ document.getElementById("refresh-now")?.addEventListener("click", loadAll);
237
+
238
+ loginBtn.addEventListener("click", async () => {
239
+ const password = document.getElementById("admin-password").value.trim();
240
+ if (!password) {
241
+ loginStatus.textContent = "Enter a password to continue.";
242
+ return;
243
+ }
244
+ try {
245
+ loginStatus.textContent = "Authenticating...";
246
+ const response = await fetch("/admin/api/login", {
247
+ method: "POST",
248
+ headers: { "Content-Type": "application/json" },
249
+ body: JSON.stringify({ password }),
250
+ });
251
+ const payload = await response.json().catch(() => ({}));
252
+ if (!response.ok) throw new Error(payload.detail || payload.message || "Invalid password");
253
+ state.token = payload.access_token || payload.token;
254
+ sessionStorage.setItem("nim_token", state.token);
255
+ loginOverlay.classList.add("hidden");
256
+ await loadAll();
257
+ } catch (error) {
258
+ loginStatus.textContent = error.message;
259
+ }
260
+ });
261
+
262
+ window.addEventListener("DOMContentLoaded", async () => {
263
+ showPanel(state.panel);
264
+ if (!state.token) return;
265
+ loginOverlay.classList.add("hidden");
266
+ try {
267
+ await loadAll();
268
+ setInterval(loadAll, 90 * 1000);
269
+ } catch (error) {
270
+ sessionStorage.removeItem("nim_token");
271
+ loginOverlay.classList.remove("hidden");
272
+ }
273
+ });
static/index.html ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8" />
5
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
6
+ <title>Model Health �� NVIDIA NIM</title>
7
+ <link rel="preconnect" href="https://fonts.googleapis.com" />
8
+ <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
9
+ <link
10
+ href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&display=swap"
11
+ rel="stylesheet"
12
+ />
13
+ <link rel="stylesheet" href="/static/style.css" />
14
+ </head>
15
+ <body>
16
+ <main class="app-shell">
17
+ <section class="glass-panel">
18
+ <div class="hero">
19
+ <div>
20
+ <p class="chip chip--healthy">Live metrics</p>
21
+ <h1>Model health, hour by hour</h1>
22
+ <p>
23
+ Each hour block shows whether the model responded with a healthy,
24
+ intermittent, or degraded signal. We poll NVIDIA NIM to keep the
25
+ grid in sync.
26
+ </p>
27
+ </div>
28
+ <div class="chip-list" id="summary-chips"></div>
29
+ </div>
30
+ </section>
31
+ <section class="glass-panel">
32
+ <div class="status-line">
33
+ <strong>Heat map</strong>
34
+ <span id="last-updated">��</span>
35
+ </div>
36
+ <div class="hour-grid" id="model-grid"></div>
37
+ <p class="status-text" id="error-text"></p>
38
+ </section>
39
+ </main>
40
+ <script src="/static/public.js" defer></script>
41
+ </body>
42
+ </html>
static/public.js ADDED
@@ -0,0 +1,93 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ const summaryChips = document.getElementById("summary-chips");
2
+ const modelGrid = document.getElementById("model-grid");
3
+ const lastUpdated = document.getElementById("last-updated");
4
+ const errorText = document.getElementById("error-text");
5
+
6
+ const statusStyles = {
7
+ healthy: "ok",
8
+ degraded: "warn",
9
+ down: "down",
10
+ unknown: "warn",
11
+ };
12
+
13
+ function formatHourSegment(segment) {
14
+ const span = document.createElement("span");
15
+ span.textContent = new Date(segment.time).getHours();
16
+ span.classList.add(statusStyles[segment.status] || "warn");
17
+ span.title = `${segment.status} �� ${new Date(segment.time).toLocaleTimeString()} `;
18
+ return span;
19
+ }
20
+
21
+ function renderModel(model) {
22
+ const card = document.createElement("article");
23
+ card.className = "model-card";
24
+
25
+ card.innerHTML = `
26
+ <div class="health-meta">
27
+ <span class="pill">${model.status || "unknown"}</span>
28
+ <span>Beat: ${model.beat || "��"}</span>
29
+ </div>
30
+ <h2>${model.name}</h2>
31
+ <small>${model.endpoint || "NIM chat"}</small>
32
+ `.trim();
33
+
34
+ const timeline = document.createElement("div");
35
+ timeline.className = "timeline";
36
+
37
+ (model.hourly || [])
38
+ .slice(-12)
39
+ .forEach((segment) => timeline.appendChild(formatHourSegment(segment)));
40
+
41
+ card.appendChild(timeline);
42
+ return card;
43
+ }
44
+
45
+ function renderSummary(models) {
46
+ summaryChips.innerHTML = "";
47
+ const total = models.length;
48
+ const healthy = models.filter((m) => m.status === "healthy").length;
49
+ const open = models.filter((m) => m.status === "down").length;
50
+
51
+ [
52
+ { label: `Monitored models`, value: total },
53
+ { label: `Healthy`, value: healthy },
54
+ { label: `Issues`, value: open },
55
+ ].forEach((metric) => {
56
+ const chip = document.createElement("span");
57
+ chip.className = "chip";
58
+ chip.textContent = `${metric.label}: ${metric.value}`;
59
+ if (metric.label === "Issues" && metric.value > 0) {
60
+ chip.style.borderColor = "#ff5f6d";
61
+ chip.style.color = "#ffb3a6";
62
+ }
63
+ summaryChips.appendChild(chip);
64
+ });
65
+ }
66
+
67
+ async function loadHealth() {
68
+ try {
69
+ errorText.textContent = "";
70
+ const response = await fetch("/api/health/public");
71
+ if (!response.ok) {
72
+ throw new Error("Health endpoint unavailable");
73
+ }
74
+ const payload = await response.json();
75
+ const models = payload.models || [];
76
+
77
+ renderSummary(models);
78
+ modelGrid.innerHTML = "";
79
+ models.forEach((model) => modelGrid.appendChild(renderModel(model)));
80
+
81
+ lastUpdated.textContent = payload.last_updated
82
+ ? new Date(payload.last_updated).toLocaleString()
83
+ : new Date().toLocaleString();
84
+ } catch (err) {
85
+ errorText.textContent = "Unable to reach NVIDIA NIM. Please check your keys.";
86
+ lastUpdated.textContent = "��";
87
+ }
88
+ }
89
+
90
+ window.addEventListener("DOMContentLoaded", () => {
91
+ loadHealth();
92
+ setInterval(loadHealth, 60 * 1000);
93
+ });
static/style.css ADDED
@@ -0,0 +1,455 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ :root {
2
+ --base-bg: #030711;
3
+ --panel-bg: rgba(7, 18, 34, 0.87);
4
+ --accent: #00f18d;
5
+ --accent-strong: #32ffd3;
6
+ --muted: #8ca3c5;
7
+ --border: rgba(255, 255, 255, 0.12);
8
+ --glow: 0 10px 40px rgba(0, 241, 141, 0.25);
9
+ --font-sans: "Space Grotesk", "Titillium Web", "Segoe UI", sans-serif;
10
+ color-scheme: dark;
11
+ }
12
+
13
+ * {
14
+ box-sizing: border-box;
15
+ }
16
+
17
+ body {
18
+ margin: 0;
19
+ font-family: var(--font-sans);
20
+ background: radial-gradient(circle at top right, rgba(0, 241, 141, 0.18), transparent 40%),
21
+ linear-gradient(180deg, #050a15 0%, #020408 50%, #030711 100%);
22
+ color: #f1f6ff;
23
+ min-height: 100vh;
24
+ }
25
+
26
+ .app-shell {
27
+ padding: 2rem;
28
+ max-width: 1200px;
29
+ margin: 0 auto;
30
+ }
31
+
32
+ .glass-panel {
33
+ background: var(--panel-bg);
34
+ border: 1px solid var(--border);
35
+ padding: 1.5rem;
36
+ border-radius: 18px;
37
+ box-shadow: var(--glow);
38
+ backdrop-filter: blur(16px);
39
+ margin-bottom: 1.75rem;
40
+ }
41
+
42
+ .hero {
43
+ display: flex;
44
+ flex-wrap: wrap;
45
+ gap: 1rem;
46
+ align-items: center;
47
+ justify-content: space-between;
48
+ }
49
+
50
+ .hero h1 {
51
+ font-size: clamp(2rem, 1.8vw + 2rem, 3rem);
52
+ margin: 0;
53
+ line-height: 1.2;
54
+ }
55
+
56
+ .hero p {
57
+ color: var(--muted);
58
+ max-width: 540px;
59
+ margin: 0.5rem 0 0;
60
+ font-size: 1rem;
61
+ }
62
+
63
+ .chip-list {
64
+ display: flex;
65
+ flex-wrap: wrap;
66
+ gap: 0.5rem;
67
+ margin-top: 1rem;
68
+ }
69
+
70
+ .chip {
71
+ padding: 0.3rem 0.9rem;
72
+ border-radius: 999px;
73
+ border: 1px solid rgba(255, 255, 255, 0.15);
74
+ font-size: 0.9rem;
75
+ color: #b1c2dd;
76
+ }
77
+
78
+ .chip--healthy {
79
+ border-color: rgba(0, 241, 141, 0.5);
80
+ color: var(--accent-strong);
81
+ }
82
+
83
+ .hour-grid {
84
+ display: grid;
85
+ grid-template-columns: repeat(auto-fit, minmax(240px, 1fr));
86
+ gap: 1rem;
87
+ }
88
+
89
+ .model-card {
90
+ padding: 1.25rem;
91
+ border-radius: 16px;
92
+ background: linear-gradient(135deg, rgba(255, 255, 255, 0.02), rgba(255, 255, 255, 0.04));
93
+ border: 1px solid transparent;
94
+ transition: border 0.3s ease, transform 0.3s ease;
95
+ }
96
+
97
+ .model-card:hover {
98
+ transform: translateY(-6px);
99
+ border-color: rgba(0, 241, 141, 0.6);
100
+ }
101
+
102
+ .model-card h2 {
103
+ margin: 0;
104
+ font-size: 1.25rem;
105
+ }
106
+
107
+ .model-card small {
108
+ color: var(--muted);
109
+ }
110
+
111
+ .timeline {
112
+ display: flex;
113
+ align-items: center;
114
+ gap: 0.3rem;
115
+ margin-top: 0.9rem;
116
+ flex-wrap: wrap;
117
+ }
118
+
119
+ .timeline span {
120
+ width: 28px;
121
+ height: 28px;
122
+ border-radius: 8px;
123
+ background: rgba(255, 255, 255, 0.05);
124
+ display: inline-flex;
125
+ align-items: center;
126
+ justify-content: center;
127
+ font-size: 0.7rem;
128
+ font-weight: 600;
129
+ }
130
+
131
+ .timeline span.ok {
132
+ background: linear-gradient(120deg, #00d97a, #00f18d);
133
+ box-shadow: 0 6px 12px rgba(0, 241, 141, 0.4);
134
+ }
135
+
136
+ .timeline span.warn {
137
+ background: linear-gradient(120deg, #ff9a56, #ffa63a);
138
+ }
139
+
140
+ .timeline span.down {
141
+ background: linear-gradient(120deg, #ff5f6d, #ffc371);
142
+ }
143
+
144
+ .status-line {
145
+ margin-top: 1rem;
146
+ display: flex;
147
+ justify-content: space-between;
148
+ font-size: 0.9rem;
149
+ color: var(--muted);
150
+ align-items: center;
151
+ }
152
+
153
+ .status-line strong {
154
+ color: #fff;
155
+ }
156
+
157
+ .health-meta {
158
+ display: flex;
159
+ gap: 0.75rem;
160
+ flex-wrap: wrap;
161
+ align-items: center;
162
+ margin-top: 0.5rem;
163
+ color: var(--muted);
164
+ }
165
+
166
+ .pulse {
167
+ width: 8px;
168
+ height: 8px;
169
+ border-radius: 50%;
170
+ background: var(--accent);
171
+ animation: pulse 1.6s infinite;
172
+ }
173
+
174
+ @keyframes pulse {
175
+ 0% {
176
+ box-shadow: 0 0 0 0 rgba(0, 241, 141, 0.6);
177
+ }
178
+ 70% {
179
+ box-shadow: 0 0 0 12px rgba(0, 241, 141, 0);
180
+ }
181
+ 100% {
182
+ box-shadow: 0 0 0 0 rgba(0, 241, 141, 0);
183
+ }
184
+ }
185
+
186
+ button {
187
+ font-family: var(--font-sans);
188
+ border: none;
189
+ cursor: pointer;
190
+ border-radius: 999px;
191
+ padding: 0.65rem 1.2rem;
192
+ background: linear-gradient(120deg, #16a085, #00f18d);
193
+ color: #020408;
194
+ font-weight: 600;
195
+ transition: transform 0.2s ease;
196
+ }
197
+
198
+ button:hover {
199
+ transform: translateY(-2px);
200
+ }
201
+
202
+ .admin-shell {
203
+ display: grid;
204
+ grid-template-columns: 260px 1fr;
205
+ min-height: 100vh;
206
+ }
207
+
208
+ .admin-sidebar {
209
+ background: rgba(3, 7, 17, 0.9);
210
+ border-right: 1px solid rgba(255, 255, 255, 0.06);
211
+ padding: 2rem 1.5rem;
212
+ display: flex;
213
+ flex-direction: column;
214
+ gap: 0.75rem;
215
+ }
216
+
217
+ .admin-sidebar h3 {
218
+ margin: 0 0 1rem;
219
+ font-size: 1rem;
220
+ letter-spacing: 0.2em;
221
+ text-transform: uppercase;
222
+ color: var(--muted);
223
+ }
224
+
225
+ .admin-sidebar button {
226
+ width: 100%;
227
+ justify-content: flex-start;
228
+ background: transparent;
229
+ border-radius: 12px;
230
+ border: 1px solid rgba(255, 255, 255, 0.1);
231
+ color: #fff;
232
+ padding-left: 0.9rem;
233
+ text-align: left;
234
+ letter-spacing: 0.05em;
235
+ }
236
+
237
+ .admin-sidebar button.active {
238
+ border-color: var(--accent);
239
+ color: var(--accent);
240
+ box-shadow: var(--glow);
241
+ }
242
+
243
+ .admin-content {
244
+ padding: 2rem;
245
+ background: linear-gradient(180deg, rgba(4, 6, 15, 0.9), rgba(2, 3, 6, 0.95));
246
+ }
247
+
248
+ .login-overlay {
249
+ position: fixed;
250
+ inset: 0;
251
+ background: rgba(2, 3, 6, 0.8);
252
+ display: flex;
253
+ align-items: center;
254
+ justify-content: center;
255
+ z-index: 10;
256
+ }
257
+
258
+ .login-card {
259
+ width: min(400px, 90vw);
260
+ padding: 2rem;
261
+ background: var(--panel-bg);
262
+ border-radius: 22px;
263
+ border: 1px solid var(--border);
264
+ box-shadow: var(--glow);
265
+ }
266
+
267
+ .login-card h2 {
268
+ margin-top: 0;
269
+ letter-spacing: 0.08em;
270
+ }
271
+
272
+ .login-card label {
273
+ display: block;
274
+ font-size: 0.85rem;
275
+ text-transform: uppercase;
276
+ margin-bottom: 0.25rem;
277
+ color: var(--muted);
278
+ letter-spacing: 0.2em;
279
+ }
280
+
281
+ .login-card input {
282
+ width: 100%;
283
+ padding: 0.9rem;
284
+ border-radius: 12px;
285
+ border: 1px solid rgba(255, 255, 255, 0.15);
286
+ background: rgba(255, 255, 255, 0.03);
287
+ color: #fff;
288
+ margin-bottom: 1rem;
289
+ font-size: 1rem;
290
+ }
291
+
292
+ .section-grid {
293
+ display: grid;
294
+ grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
295
+ gap: 1.25rem;
296
+ }
297
+
298
+ .metric-card {
299
+ background: rgba(255, 255, 255, 0.03);
300
+ border-radius: 16px;
301
+ padding: 1rem;
302
+ border: 1px solid rgba(255, 255, 255, 0.06);
303
+ }
304
+
305
+ .metric-card h3 {
306
+ margin: 0;
307
+ font-size: 1.1rem;
308
+ }
309
+
310
+ .metric-card strong {
311
+ font-size: 2rem;
312
+ display: block;
313
+ margin-top: 0.5rem;
314
+ }
315
+
316
+ .table {
317
+ width: 100%;
318
+ border-collapse: separate;
319
+ border-spacing: 0;
320
+ }
321
+
322
+ .table thead th {
323
+ text-align: left;
324
+ font-size: 0.85rem;
325
+ text-transform: uppercase;
326
+ color: var(--muted);
327
+ padding-bottom: 0.5rem;
328
+ border-bottom: 1px solid rgba(255, 255, 255, 0.1);
329
+ }
330
+
331
+ .table tbody tr {
332
+ border-bottom: 1px solid rgba(255, 255, 255, 0.05);
333
+ }
334
+
335
+ .table td {
336
+ padding: 0.75rem 0;
337
+ }
338
+
339
+ .inline-actions {
340
+ display: flex;
341
+ gap: 0.5rem;
342
+ }
343
+
344
+ .pill {
345
+ padding: 0.25rem 0.8rem;
346
+ border-radius: 999px;
347
+ border: 1px solid transparent;
348
+ font-size: 0.75rem;
349
+ letter-spacing: 0.1em;
350
+ text-transform: uppercase;
351
+ background: rgba(0, 241, 141, 0.1);
352
+ color: var(--accent);
353
+ }
354
+
355
+ .form-inline {
356
+ display: flex;
357
+ gap: 0.6rem;
358
+ flex-wrap: wrap;
359
+ margin-top: 0.5rem;
360
+ }
361
+
362
+ .form-inline input {
363
+ flex: 1;
364
+ min-width: 120px;
365
+ background: rgba(255, 255, 255, 0.03);
366
+ border: 1px solid rgba(255, 255, 255, 0.1);
367
+ border-radius: 12px;
368
+ padding: 0.75rem;
369
+ color: #fff;
370
+ }
371
+
372
+ .status-text {
373
+ font-size: 0.85rem;
374
+ color: var(--muted);
375
+ }
376
+
377
+ .secondary-btn {
378
+ border-radius: 12px;
379
+ padding: 0.55rem 1rem;
380
+ background: transparent;
381
+ border: 1px solid rgba(255, 255, 255, 0.25);
382
+ color: #fff;
383
+ }
384
+
385
+ .secondary-btn:hover {
386
+ border-color: var(--accent);
387
+ color: var(--accent);
388
+ }
389
+
390
+ @media (max-width: 768px) {
391
+ .admin-shell {
392
+ grid-template-columns: 1fr;
393
+ }
394
+
395
+ .admin-sidebar {
396
+ flex-direction: row;
397
+ overflow-x: auto;
398
+ }
399
+ }
400
+
401
+ .hidden { display: none !important; }
402
+
403
+ .form-grid {
404
+ display: grid;
405
+ gap: 0.75rem;
406
+ grid-template-columns: repeat(2, minmax(0, 1fr));
407
+ }
408
+
409
+ .compact-grid {
410
+ grid-template-columns: repeat(auto-fit, minmax(220px, 1fr));
411
+ }
412
+
413
+ .form-grid textarea,
414
+ .form-grid input {
415
+ width: 100%;
416
+ min-width: 0;
417
+ background: rgba(255, 255, 255, 0.03);
418
+ border: 1px solid rgba(255, 255, 255, 0.1);
419
+ border-radius: 12px;
420
+ padding: 0.85rem;
421
+ color: #fff;
422
+ font: inherit;
423
+ }
424
+
425
+ .form-grid textarea {
426
+ min-height: 110px;
427
+ grid-column: 1 / -1;
428
+ resize: vertical;
429
+ }
430
+
431
+ .toolbar-row {
432
+ display: flex;
433
+ justify-content: space-between;
434
+ gap: 1rem;
435
+ align-items: center;
436
+ flex-wrap: wrap;
437
+ }
438
+
439
+ .checkbox-row {
440
+ display: flex;
441
+ align-items: center;
442
+ gap: 0.75rem;
443
+ color: #fff;
444
+ }
445
+
446
+ .checkbox-row input {
447
+ width: 18px;
448
+ height: 18px;
449
+ }
450
+
451
+ @media (max-width: 768px) {
452
+ .form-grid {
453
+ grid-template-columns: 1fr;
454
+ }
455
+ }