Polish copy and UI labels for demo

Graham Paasch committed a29b2b1 (1 parent: 6161d19)

Files changed:
- docs/DEMO_SCRIPT.md +45 -13
- docs/README.md +13 -0
- docs/VIDEO_SCRIPT.md +13 -0
- space/app.py +14 -12
docs/DEMO_SCRIPT.md
CHANGED
@@ -1,13 +1,45 @@
[Lines 1–13 removed; original content not shown.]
+## Persona & setup (10–15s)
+
+“You’re an infra engineer bringing up a new AI data-center: hundreds of racks, ToR switches, and a spine–leaf fabric. You can’t afford to guess how risky each change is. This MCP server simulates those changes in our MAESTRO lab and gives you a risk score before you touch production.”
+
+## Demo flow (60–90s)
+
+1. **Open the Space / name the pieces (5–10s)**
+- Show the HF Space.
+- Say: “This Hugging Face Space is just the front-end. Underneath is an MCP server talking to our MAESTRO lab—a GNS3 fabric plus Ansible, Nornir, and pyATS for checks.”
+
+2. **Scenario 1 – Low-risk VLAN staging (20–25s)**
+- In the UI:
+- Change type: VLAN
+- Scenario: Stage VLAN on leaf pair (preset `leaf_tor_vlan_stage`)
+- Click **Run Lightning simulation**.
+- Narrate: “First scenario: staging a VLAN on a pair of leaf switches—no traffic swing yet. Lightning mode looks at MAESTRO’s health, the size of the change, and our expected blast radius, and gives us a risk score and explanation.”
+- Point to:
+- Low risk score (~14).
+- Pre-checks: fabric healthy, no existing alarms.
+- Post-checks: staged VLAN present, no new issues.
+- One line: “On a healthy fabric, small, localized changes show up as low-risk with blast radius limited to a couple of leafs.”
+
+3. **Scenario 2 – Riskier TOR uplink shutdown (25–30s)**
+- Change type: Interface
+- Scenario: Shutdown TOR uplink in redundant pair (preset `tor_uplink_shutdown`)
+- Click **Run Lightning simulation**.
+- Narrate: “Now a riskier scenario: shutting down one TOR uplink in a redundant pair. Same MCP call, same lab, but the risk model knows this can break redundancy.”
+- Point to:
+- Medium risk score (~50–60).
+- Pre-checks: MAESTRO health OK.
+- Post-checks: one adjacency lost, new alarm on the TOR; blast radius confined to one rack.
+- One line: “Even though traffic stays up, the risk score jumps, and the explanation tells us exactly why: lost redundancy and new alarms.”
+
+4. **Scenario 3 – BGP fabric neighbor add (20–25s)**
+- Change type: BGP neighbor
+- Scenario: Add fabric neighbor on leaf (preset `leaf_bgp_fabric_neighbor_add`)
+- Click **Run Lightning simulation**.
+- Narrate: “Finally, a control-plane change: adding a BGP fabric neighbor on a leaf. We treat BGP changes as inherently more sensitive, even when they succeed.”
+- Point to:
+- Medium-ish risk (~37).
+- Checks: MAESTRO health OK, new neighbor established, no lost adjacencies.
+- One line: “Control-plane changes start at a higher risk baseline, but you still get a quick pass/fail signal from the lab in a single MCP call.”
+
+5. **Closing (10–15s)**
+- Narrate: “From an agent’s perspective, this is just one MCP tool—`simulate_network_change`—that returns risk, blast radius, and explanation in seconds. From a human’s perspective, it’s a way to de-risk common changes in a 1.3-GW data-center rollout without ever hitting production.”
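The demo script quotes risk scores of roughly 14, 50–60, and 37 and describes them as low or medium. As a minimal sketch of how a 0–100 score could map to a label in the "Risk level" field, the function below uses cut-offs of 30 and 70. Both thresholds and the name `risk_level_for` are assumptions chosen only to agree with the numbers quoted in the script; they are not taken from the NCS code.

```python
def risk_level_for(score: float) -> str:
    """Map a 0-100 risk score to a coarse label.

    The 30/70 thresholds are hypothetical; the real NCS cut-offs
    are not shown in this commit.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"risk score must be within 0-100, got {score}")
    if score < 30:
        return "low"
    if score < 70:
        return "medium"
    return "high"


if __name__ == "__main__":
    # Scores quoted in the demo script for the three scenarios.
    for label, score in [("VLAN stage", 14), ("TOR uplink shutdown", 55), ("BGP neighbor add", 37)]:
        print(f"{label}: {score} -> {risk_level_for(score)}")
```

Under these assumed thresholds the VLAN staging scenario lands in "low" and the other two in "medium", which matches how the script narrates them.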
docs/README.md
CHANGED
@@ -1,5 +1,18 @@
 # Network Change Simulator (NCS)
 
+**Network Change Simulator (NCS)** is an MCP server and Hugging Face Space that lets an infra engineer safely trial network changes *before* touching production. It simulates VLAN / interface / BGP neighbor changes against a real MAESTRO lab (GNS3 + Ansible/Nornir/pyATS), runs targeted pre- and post-change health checks, and returns a 0–100 risk score with an explanation in a few seconds. Lightning mode gives a fast, MCP-driven risk assessment for common change presets; the full mode is reserved for deeper, slower validation later.
+
+## Why this exists
+
+**Who it’s for**
+Infra and SRE engineers standing up or expanding AI/data-center campuses—hundreds of racks, ToR switches, and spine–leaf fabrics—who need to know “how risky is this change?” *before* they push configs.
+
+**What Lightning does**
+Lightning mode takes a change preset (VLAN, TOR uplink, BGP neighbor), queries MAESTRO’s health, combines that with the change magnitude, and returns a risk score (0–100) plus pre-/post-check summaries and blast radius text—all via a single MCP tool and a one-page Space UI.
+
+**Why MCP + MAESTRO + HF Space**
+MCP gives agents and tools a standardized way to request simulations and read risk, MAESTRO provides a realistic multi-vendor lab underneath, and the Hugging Face Space gives judges and operators a one-click way to see risk in seconds without touching production gear.
+
 Lightning mode is built for the infra engineer standing up 1.3-GW-class AI campuses. With a single preset, it simulates a
 network change in our MAESTRO lab, runs targeted pre- and post-change checks, and returns a risk score and explanation in
 a few seconds via MCP and the Hugging Face Space UI. MCP provides the standardized interface for tools and agents,
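The README says Lightning mode combines MAESTRO's health with the change magnitude to produce a 0–100 score, and the demo script adds that control-plane changes start from a higher baseline. The sketch below shows one way such a score could be composed. Every name and weight in it (`BASELINE`, `lightning_risk_score`, the per-device and per-alarm increments) is hypothetical and was picked so that the three demo scenarios come out near their quoted values; the actual risk model is not part of this commit.

```python
# Hypothetical starting risk per change type; BGP (control plane) starts highest.
BASELINE = {"VLAN": 10, "Interface": 25, "BGP neighbor": 35}


def lightning_risk_score(
    change_type: str,
    lab_healthy: bool,
    devices_touched: int,
    lost_adjacencies: int = 0,
    new_alarms: int = 0,
) -> int:
    """Combine baseline, lab health, change size, and post-check findings.

    All weights are illustrative assumptions, not the NCS model.
    """
    score = BASELINE[change_type]
    if not lab_healthy:
        score += 20  # an already unhealthy lab makes any change riskier
    score += 2 * min(devices_touched, 10)  # change magnitude, capped
    score += 15 * lost_adjacencies + 10 * new_alarms  # post-check findings
    return max(0, min(100, score))  # clamp to the documented 0-100 range


if __name__ == "__main__":
    print(lightning_risk_score("VLAN", True, devices_touched=2))
    print(lightning_risk_score("Interface", True, 1, lost_adjacencies=1, new_alarms=1))
    print(lightning_risk_score("BGP neighbor", True, devices_touched=1))
```

With these assumed weights the three calls yield 14, 52, and 37, in line with the approximate scores the demo script mentions. The clamp keeps the result inside the 0–100 range however many findings accumulate.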
docs/VIDEO_SCRIPT.md
ADDED
@@ -0,0 +1,13 @@
+# NCS Video Script (Draft)
+
+## Hook (10–15s)
+“Modern AI data-centers are huge: hundreds of racks, thousands of ToR ports, and tight maintenance windows. One bad config push can take down an entire fabric. This project turns network changes into an MCP call that simulates the change in a lab and gives you a risk score in seconds—before you ever touch production.”
+
+## Problem (30–40s)
+“Imagine you’re the engineer responsible for standing up a 1.3-GW AI campus. Every maintenance window juggles VLAN adds, TOR uplink toggles, and BGP fabric updates. You need to know, fast, whether a change threatens core BGP, affects two racks or twenty, and whether MAESTRO (our lab) is already unhappy. Today you guess or build custom playbooks; tomorrow an MCP tool does it for you.”
+
+## Solution / Demo (30–40s)
+“Network Change Simulator is an MCP server plus this Hugging Face Space. Lightning mode takes a preset, runs MAESTRO health, simulates the change, and returns risk, pre/post checks, and blast radius. VLAN staging stays low risk, TOR shutdowns spike into medium, BGP neighbor adds sit midrange even when all checks pass. It’s a single MCP call whether you’re a human or an agent.”
+
+## Closing (10–15s)
+“Lightning mode is live today: one MCP tool, a handful of presets, and a real MAESTRO lab under the hood. Full mode will tackle slower, deeper validation, but the shape remains the same—MAESTRO for realism, MCP as the glue, Spaces for a one-click risk readout before you touch production.”
space/app.py
CHANGED
@@ -11,11 +11,11 @@ PRESETS = {
     },
     "Interface": {
         "Enable TOR uplink": "tor_uplink_enable",
-        "
+        "Shutdown TOR uplink": "tor_uplink_shutdown",
     },
-    "BGP
-        "Add fabric neighbor": "leaf_bgp_fabric_neighbor_add",
-        "Remove fabric neighbor": "leaf_bgp_fabric_neighbor_remove",
+    "BGP neighbor": {
+        "Add fabric neighbor on leaf": "leaf_bgp_fabric_neighbor_add",
+        "Remove fabric neighbor on leaf": "leaf_bgp_fabric_neighbor_remove",
     },
 }
 API_URL = os.getenv("NCS_SERVER_URL", "http://localhost:8000/mcp/simulate")
@@ -48,9 +48,9 @@ def run_sim(change_type_label: str, scenario_label: str):
 
 
 with gr.Blocks(title="Network Change Simulator") as demo:
-    gr.Markdown("# Network Change Simulator (Lightning
+    gr.Markdown("# Network Change Simulator (Lightning mode)")
     with gr.Row():
-        change_type = gr.Dropdown(choices=list(PRESETS.keys()), label="Change
+        change_type = gr.Dropdown(choices=list(PRESETS.keys()), label="Change type", value="VLAN")
         scenario = gr.Dropdown(choices=list(PRESETS["VLAN"].keys()), label="Scenario")
 
     def update_scenarios(ct):
@@ -58,14 +58,16 @@ with gr.Blocks(title="Network Change Simulator") as demo:
 
     change_type.change(update_scenarios, inputs=change_type, outputs=scenario)
 
-    run_btn = gr.Button("Run Lightning
+    run_btn = gr.Button("Run Lightning simulation")
+    gr.Markdown("*Simulated in MAESTRO lab – no production impact.*")
 
-    risk_score = gr.Number(label="Risk
-    risk_level = gr.Textbox(label="Risk
-    pre_checks = gr.Markdown(label="Pre-
-    post_checks = gr.Markdown(label="Post-
-    blast_radius = gr.Textbox(label="Blast
+    risk_score = gr.Number(label="Risk score (0–100)")
+    risk_level = gr.Textbox(label="Risk level")
+    pre_checks = gr.Markdown(label="Pre-change checks")
+    post_checks = gr.Markdown(label="Post-change checks")
+    blast_radius = gr.Textbox(label="Blast radius")
     explanation = gr.Textbox(label="Explanation")
+    gr.Markdown("**MCP Lightning: risk in seconds before you touch production.**")
 
     run_btn.click(
         run_sim,
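The bodies of `update_scenarios` and `run_sim` fall outside the hunks shown, so the following is only a sketch of how the two-level `PRESETS` mapping could drive the dependent dropdown and the request to `API_URL`. The `"VLAN"` entry is inferred from the demo script's preset name rather than from this diff, and the helper names `scenarios_for` and `build_payload`, along with the `preset_id` payload key, are assumptions for illustration.

```python
# Label -> preset ID mapping. The "Interface" and "BGP neighbor" entries come
# from the diff; the "VLAN" entry is assumed from the demo script.
PRESETS = {
    "VLAN": {
        "Stage VLAN on leaf pair": "leaf_tor_vlan_stage",
    },
    "Interface": {
        "Enable TOR uplink": "tor_uplink_enable",
        "Shutdown TOR uplink": "tor_uplink_shutdown",
    },
    "BGP neighbor": {
        "Add fabric neighbor on leaf": "leaf_bgp_fabric_neighbor_add",
        "Remove fabric neighbor on leaf": "leaf_bgp_fabric_neighbor_remove",
    },
}


def scenarios_for(change_type_label: str) -> list[str]:
    """Return the scenario labels to offer once a change type is selected."""
    return list(PRESETS.get(change_type_label, {}).keys())


def build_payload(change_type_label: str, scenario_label: str) -> dict:
    """Resolve the two UI labels to a request body (hypothetical shape)."""
    try:
        preset_id = PRESETS[change_type_label][scenario_label]
    except KeyError as exc:
        raise ValueError(
            f"unknown preset: {change_type_label!r} / {scenario_label!r}"
        ) from exc
    return {"preset_id": preset_id}


if __name__ == "__main__":
    print(scenarios_for("Interface"))
    print(build_payload("BGP neighbor", "Add fabric neighbor on leaf"))
```

Keeping the lookup in plain functions like these would let the label-to-preset mapping be checked without starting Gradio or the MCP server, which matters for a commit that renames labels such as "Add fabric neighbor" to "Add fabric neighbor on leaf".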