Antigravity committed on
Commit 08c0cf7 · 1 Parent(s): 6949cb6

Add openenv-core dependency and server entry point

.build-trigger CHANGED
@@ -1 +1 @@
- Final Release Sync (Definitive UI): 2026-04-07 23:38:07
+ Final Release Sync (Definitive UI): 2026-04-07 23:38:07
.dockerignore CHANGED
@@ -1,10 +1,10 @@
- .git/
- .env
- backend/venv/
- backend/__pycache__/
- frontend/node_modules/
- frontend/dist/
- .pytest_cache/
- .coverage
- brain/
- .gemini/
+ .git/
+ .env
+ backend/venv/
+ backend/__pycache__/
+ frontend/node_modules/
+ frontend/dist/
+ .pytest_cache/
+ .coverage
+ brain/
+ .gemini/
.gitignore CHANGED
@@ -1,37 +1,37 @@
- # Python
- __pycache__/
- *.py[cod]
- *$py.class
- *.so
- .Python
- backend/venv/
- .pytest_cache/
- .coverage
- .cache
- backend/scenarios/.cache
-
- # Node
- node_modules/
- .npm/
-
- # Env & Secrets
- .env
- .env.*
- !.env.example
- # default.env is needed for HF Spaces
-
- # OS
- .DS_Store
- Thumbs.db
-
- # VS Code / IDE
- .vscode/
- .idea/
- *.swp
- *.swo
-
- # Project specific
- backend/logs/
- *.log
- .gemini/
- brain/
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ backend/venv/
+ .pytest_cache/
+ .coverage
+ .cache
+ backend/scenarios/.cache
+
+ # Node
+ node_modules/
+ .npm/
+
+ # Env & Secrets
+ .env
+ .env.*
+ !.env.example
+ # default.env is needed for HF Spaces
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # VS Code / IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # Project specific
+ backend/logs/
+ *.log
+ .gemini/
+ brain/
README.md CHANGED
@@ -1,292 +1,292 @@
1
- ---
2
- title: NEXON-AI
3
- emoji: 🛡️
4
- colorFrom: blue
5
- colorTo: indigo
6
- sdk: docker
7
- app_port: 7860
8
- pinned: false
9
- ---
10
-
11
- <!-- LAST_SYNC_VERIFICATION: 2026-04-08 00:07:00 -->
12
-
13
- # NEXUS-AI 🌐🛡️
14
- ### Autonomous Incident Investigation Dashboard
15
-
16
- <div align="center">
17
-
18
- ![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)
19
- ![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-009688?style=for-the-badge&logo=fastapi&logoColor=white)
20
- ![React](https://img.shields.io/badge/React-18.x-61DAFB?style=for-the-badge&logo=react&logoColor=black)
21
- ![Tailwind](https://img.shields.io/badge/Tailwind_CSS-3.x-38B2AC?style=for-the-badge&logo=tailwind-css&logoColor=white)
22
- ![Ollama](https://img.shields.io/badge/Ollama-Local_LLM-000000?style=for-the-badge&logo=ollama)
23
-
24
- **Status:** Active Simulation Pipeline
25
- **Architecture:** Real-time WebSockets + Multi-Agent Consensus
26
-
27
- </div>
28
-
29
- ---
30
-
31
- ## 📖 What is NEXUS-AI?
32
-
33
- NEXUS is a next-generation, autonomous dual-agent environment designed to investigate and validate software incidents in real-time. Using a combination of an **Investigator** and a **Validator** agent, NEXUS autonomously forms hypotheses, executes systems tools, evaluates system behavior, and reaches strict consensus on root causes.
34
-
35
- Traditional manual debugging requires extensive context-switching and tool fatigue. NEXUS solves this through:
36
- 1. **Dual-Agent Autonomy**: Two specialized models communicating word-by-word via WebSockets.
37
- 2. **Dynamic Tool Execution**: Fully integrated system terminals allowing agents to run sandboxed validation scripts.
38
- 3. **Semantic Reward Engine**: Evaluates conversational drift mathematically (using native GPU embeddings).
39
-
40
- The result: An AI "Incident Response Team" that navigates servers, traces logs, and fixes bugs identically to a human SRE.
41
-
42
- ---
43
-
44
- ## 🖼️ Application Screenshots
45
-
46
- ### 📊 Simulation Dashboard
47
-
48
- > The core command center. Features live agent terminals, a dual-communication consensus log, and a mathematical performance reward graph plotting investigation confidence.
49
-
50
- <div align="center">
51
- <img src="./assets/screenshots/Dashboard.png" alt="Simulation Dashboard" width="90%"/>
52
- </div>
53
-
54
- ---
55
-
56
- ## 🎛️ Scenario Registry & Core Settings
57
-
58
- > The system is architected for instant adaptability — seamlessly switch LLM providers and inject custom threat models entirely through the frontend DOM.
59
-
60
- <table>
61
- <tr>
62
- <td align="center" width="50%">
63
- <img src="./assets/screenshots/Scenarios.png" alt="Scenario Browser"/>
64
- <br/><b>Scenario Registry</b>
65
- <br/><sub>A persistent LocalStorage-backed grid of tactical simulations. Users can dynamically inject custom infrastructure-specific incidents directly into the agent pipeline.</sub>
66
- </td>
67
- <td align="center" width="50%">
68
- <img src="./assets/screenshots/Settings.png" alt="Hardware Configuration"/>
69
- <br/><b>Runtime Configuration</b>
70
- <br/><sub>Dynamically maps available locally-installed Ollama networks, allowing the user to pair models (e.g., Qwen vs Dolphin-Phi) with fully independent parameters.</sub>
71
- </td>
72
- </tr>
73
- </table>
74
-
75
- ---
76
-
77
- ## 🏗️ System Architecture
78
-
79
- ```text
80
- ┌─────────────────────────────────────────────────────────────────┐
81
- │ CLIENT BROWSER │
82
- │ React SPA (Tailwind + Framer Motion) │
83
- │ localhost:5173 │
84
- └───────────┬─────────────────────────────────┬───────────────────┘
85
- │ HTTP (REST) │ ws://
86
- ▼ ▼
87
- ┌─────────────────────────────────────────────────────────────────┐
88
- │ FASTAPI BACKEND (localhost:7860) │
89
- │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
90
- │ │ /config │ │/scenarios│ │ /reset │ │ ws:// Simulator │ │
91
- │ │ Env Sync │ │ DB Cache │ │ Injection│ │ Live Stream Sync│ │
92
- │ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │
93
- └───────────┬───────────────────────────────────┬─────────────────┘
94
- │ │
95
- ▼ ▼
96
- ┌─────────────────────────────────────────────────────────────────┐
97
- │ OLLAMA ENGINE / LLM PIPELINE │
98
- │ Agent A (Investigator) ◄──────► Agent B (Validator) │
99
- │ - Generates Hypotheses - Challenges Assertions │
100
- │ - Runs System Tools - Requires Proof │
101
- └─────────────────────────────────────────────────────────────────┘
102
- ```
103
-
104
- ---
105
-
106
- ## 🌐 Execution Environments
107
-
108
- NEXUS-AI supports two distinct execution models for agent tools, toggleable via the **Settings** dashboard:
109
-
110
- ### 1. Simulated Mode (Safe Sandbox)
111
- * **Default Mode**: Agents interact with a pre-defined `clue_map` within the scenario YAML.
112
- * **No System Impact**: Commands like `read_logs` or `check_service` return mocked data.
113
- * **Use Case**: Training, logic validation, and "what-if" analysis without infrastructure risk.
114
-
115
- ### 2. SSH Lab Node (Real-World Execution)
116
- * **Live Connection**: Commands are executed in real-time on a remote Linux server via SSH.
117
- * **Autonomous Terminal**: Agents use the `run_terminal_command` tool to browse logs, check systemd status, and inspect real configs.
118
- * **Security**: Includes a command blocklist to prevent highly destructive operations (e.g., `rm -rf /`).
119
- * **Use Case**: Actual incident response on isolated Lab/Staging nodes.
120
-
121
- ---
122
-
123
- ## 📐 OpenEnv Specification
124
-
125
- NEXUS-AI strictly adheres to the **OpenEnv 1.0** standard for agent-environment interaction.
126
-
127
- ### 🎮 Action Space
128
- The environment accepts a typed **NexusAction** (Text-based with structured tool calls).
129
- - **agent_id**: `string` ("agent_a" or "agent_b")
130
- - **message**: `string` (The natural language reasoning/communication)
131
- - **tool_calls**: `List[ToolCall]` (Optional structured calls like `TOOL: read_logs(file='app.log')`)
132
- - **confidence**: `float` (0.0 - 1.0)
133
-
134
- ### 🧐 Observation Space
135
- The environment returns a structured **NexusObservation** summarizing the system state.
136
- - **scenario_description**: `string` (High-level objective)
137
- - **scenario_context**: `string` (Background telemetry/environment info)
138
- - **partner_message**: `string` (The last message from the other agent)
139
- - **tool_results**: `List[ToolResult]` (Output of any executed system tools)
140
- - **clues_found**: `List[string]` (Accumulated evidence identified by the Reward Engine)
141
- - **investigation_stage**: `string` (`investigating`, `narrowing`, `found`, `verified`)
142
- - **round**: `integer` (Current episode round)
143
- - **available_tools**: `List[string]` (List of permitted tools for the current mode)
144
-
145
- ### 📝 Task Registry & Difficulty
146
- | Task Name | Difficulty | Objective | Grader Method |
147
- |---|---|---|---|
148
- | `software-incident` | **Easy** | Fix Nginx 503 rate-limit misconfiguration | State Check: `nginx-proxy.rate_limit` |
149
- | `business-process-failure` | **Medium** | Resolve inventory stockout logic error | State Check: `stock_threshold` + Red Herring Penalty |
150
- | `cascade-system-failure` | **Hard** | Fix Postgres connection exhaustion | Multi-Step: Query Termination + Config Update |
151
-
152
- ### 📈 Baseline Benchmarks
153
- Validated using `inference.py` (Phi-3-mini & Qwen2.5-1.5B).
154
- - **Software Incident**: 0.88 / 1.00
155
- - **Business Process Failure**: 0.72 / 1.00
156
- - **Cascade System Failure**: 0.48 / 1.00
157
-
158
- ---
159
-
160
- ## 🧠 The AI Pipeline Deep-Dive
161
-
162
- ### Step 1: Scenario Injection & Bootstrapping
163
- ```python
164
- # The EpisodeManager receives the frontend custom scenario JSON
165
- # Broadcasts 'episode_start' natively over the WebSocket to synchronize the UI
166
- await broadcast("episode_start", {
167
- "scenario": active_scenario,
168
- "agent_a_model": settings.AGENT_A_MODEL
169
- })
170
- ```
171
-
172
- ### Step 2: Agent Consensus Loop
173
- ```python
174
- # Agents interact sequentially. The Investigator attempts a solution
175
- # while the Validator challenges it. Both agents have access to dynamic system execution.
176
- client, model_name = model_manager.get_client(agent_id)
177
- stream = await client.chat.completions.create(
178
- model=model_name,
179
- messages=injected_history,
180
- tools=available_tools, # e.g. fix_proposer, run_terminal_command
181
- stream=True
182
- )
183
- ```
184
-
185
- ### Step 3: Fast GPU Embeddings (Similarity Evaluation)
186
- ```python
187
- # Heavy CPU blocking is completely bypassed.
188
- # Semantic embedding computations map strictly into the Ollama GPU pipeline.
189
- @lru_cache(maxsize=256)
190
- def get_embedding(text: str) -> List[float]:
191
- response = httpx.post("http://localhost:11434/api/embeddings", json={
192
- "model": "all-minilm",
193
- "prompt": text
194
- }, timeout=60.0)
195
- return response.json().get("embedding", [])
196
- ```
197
-
198
- ---
199
-
200
- ## 🛠️ Full Technology Stack
201
-
202
- | Layer | Technology | Why |
203
- |---|---|---|
204
- | Frontend Framework | React 18 (Vite) | Lightning fast HMR, component isolation |
205
- | Frontend Styling | Tailwind CSS | Utility-first tactical glassmorphism |
206
- | Backend Framework | FastAPI | Async Python, explicit endpoint mapping |
207
- | Transport Layer | WebSockets | Word-by-word streaming across UI boundaries |
208
- | Local AI Engine | Ollama | Native device acceleration, absolute privacy |
209
- | Remote Provider | HuggingFace Inference API | Drop-in SaaS alternatives |
210
- | SSH Connectivity | Paramiko | Secure remote shell execution for Lab Nodes |
211
- | Data Persistence | LocalStorage & `.env` Injection | Avoids over-architected SQL constraints |
212
-
213
- ---
214
-
215
- ## 🚀 How to Run This Project (Full Step-by-Step Guide)
216
-
217
- ### 📋 Prerequisites
218
- - Python 3.10+
219
- - Node.js 18+
220
- - [Ollama](https://ollama.com/) (installed locally for model hosting)
221
- - **Optional**: A remote Linux VM (Ubuntu/Kali) with SSH enabled for Lab Node mode
222
-
223
- ---
224
-
225
- ### 1️⃣ Backend Setup (FastAPI / Python)
226
-
227
- ```bash
228
- cd backend
229
-
230
- # Create and activate virtual environment
231
- python -m venv venv
232
- # source venv/bin/activate # Linux/macOS
233
- venv\Scripts\activate # Windows
234
-
235
- # Install all dependencies
236
- pip install -r requirements.txt
237
- ```
238
-
239
- #### Start the Backend Engine
240
- ```bash
241
- # This exposes the core REST API and the WebSocket simulation tunnel
242
- python main.py
243
- ```
244
-
245
- ---
246
-
247
- ### 2️⃣ Frontend Setup (React)
248
-
249
- Open a **new terminal tab**:
250
-
251
- ```bash
252
- cd frontend
253
-
254
- # Install Node.js dependencies
255
- npm install
256
-
257
- # Start the Vite development server
258
- npm run dev
259
- ```
260
-
261
- The application is now fully accessible at [http://localhost:5173](http://localhost:5173).
262
-
263
- ---
264
-
265
- ### 3️⃣ Pulling Models
266
-
267
- To run the simulation locally without cloud API keys, you must ensure you pull suitable reasoning models through Ollama:
268
-
269
- ```bash
270
- ollama run qwen2.5:3b # Excellent validator logic footprint
271
- ollama run dolphin-llama3 # Uncensored investigative assertions
272
- ollama pull all-minilm # Mandatory for semantic similarity scoring
273
- ```
274
-
275
- ---
276
-
277
- ## 🧪 Automated Testing
278
- NEXUS-AI includes a comprehensive test suite to ensure environment stability and specification compliance.
279
-
280
- ```bash
281
- # Run the OpenEnv specification validator
282
- python openenv_validator.py
283
-
284
- # Run unit tests for core logic
285
- pip install pytest
286
- pytest tests/
287
- ```
288
-
289
- ---
290
-
291
- ## 🤝 Authors
292
- **Developed by: Ashish Menon** & Vector
 
1
+ ---
2
+ title: NEXON-AI
3
+ emoji: 🛡️
4
+ colorFrom: blue
5
+ colorTo: indigo
6
+ sdk: docker
7
+ app_port: 7860
8
+ pinned: false
9
+ ---
10
+
11
+ <!-- LAST_SYNC_VERIFICATION: 2026-04-08 00:07:00 -->
12
+
13
+ # NEXUS-AI 🌐🛡️
14
+ ### Autonomous Incident Investigation Dashboard
15
+
16
+ <div align="center">
17
+
18
+ ![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)
19
+ ![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-009688?style=for-the-badge&logo=fastapi&logoColor=white)
20
+ ![React](https://img.shields.io/badge/React-18.x-61DAFB?style=for-the-badge&logo=react&logoColor=black)
21
+ ![Tailwind](https://img.shields.io/badge/Tailwind_CSS-3.x-38B2AC?style=for-the-badge&logo=tailwind-css&logoColor=white)
22
+ ![Ollama](https://img.shields.io/badge/Ollama-Local_LLM-000000?style=for-the-badge&logo=ollama)
23
+
24
+ **Status:** Active Simulation Pipeline
25
+ **Architecture:** Real-time WebSockets + Multi-Agent Consensus
26
+
27
+ </div>
28
+
29
+ ---
30
+
31
+ ## 📖 What is NEXUS-AI?
32
+
33
+ NEXUS is a next-generation, autonomous dual-agent environment designed to investigate and validate software incidents in real-time. Using a combination of an **Investigator** and a **Validator** agent, NEXUS autonomously forms hypotheses, executes systems tools, evaluates system behavior, and reaches strict consensus on root causes.
34
+
35
+ Traditional manual debugging requires extensive context-switching and tool fatigue. NEXUS solves this through:
36
+ 1. **Dual-Agent Autonomy**: Two specialized models communicating word-by-word via WebSockets.
37
+ 2. **Dynamic Tool Execution**: Fully integrated system terminals allowing agents to run sandboxed validation scripts.
38
+ 3. **Semantic Reward Engine**: Evaluates conversational drift mathematically (using native GPU embeddings).
39
+
40
+ The result: An AI "Incident Response Team" that navigates servers, traces logs, and fixes bugs identically to a human SRE.
41
+
42
+ ---
43
+
44
+ ## 🖼️ Application Screenshots
45
+
46
+ ### 📊 Simulation Dashboard
47
+
48
+ > The core command center. Features live agent terminals, a dual-communication consensus log, and a mathematical performance reward graph plotting investigation confidence.
49
+
50
+ <div align="center">
51
+ <img src="./assets/screenshots/Dashboard.png" alt="Simulation Dashboard" width="90%"/>
52
+ </div>
53
+
54
+ ---
55
+
56
+ ## 🎛️ Scenario Registry & Core Settings
57
+
58
+ > The system is architected for instant adaptability — seamlessly switch LLM providers and inject custom threat models entirely through the frontend DOM.
59
+
60
+ <table>
61
+ <tr>
62
+ <td align="center" width="50%">
63
+ <img src="./assets/screenshots/Scenarios.png" alt="Scenario Browser"/>
64
+ <br/><b>Scenario Registry</b>
65
+ <br/><sub>A persistent LocalStorage-backed grid of tactical simulations. Users can dynamically inject custom infrastructure-specific incidents directly into the agent pipeline.</sub>
66
+ </td>
67
+ <td align="center" width="50%">
68
+ <img src="./assets/screenshots/Settings.png" alt="Hardware Configuration"/>
69
+ <br/><b>Runtime Configuration</b>
70
+ <br/><sub>Dynamically maps available locally-installed Ollama networks, allowing the user to pair models (e.g., Qwen vs Dolphin-Phi) with fully independent parameters.</sub>
71
+ </td>
72
+ </tr>
73
+ </table>
74
+
75
+ ---
76
+
77
+ ## 🏗️ System Architecture
78
+
79
+ ```text
80
+ ┌─────────────────────────────────────────────────────────────────┐
81
+ │ CLIENT BROWSER │
82
+ │ React SPA (Tailwind + Framer Motion) │
83
+ │ localhost:5173 │
84
+ └───────────┬─────────────────────────────────┬───────────────────┘
85
+ │ HTTP (REST) │ ws://
86
+ ▼ ▼
87
+ ┌─────────────────────────────────────────────────────────────────┐
88
+ │ FASTAPI BACKEND (localhost:7860) │
89
+ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
90
+ │ │ /config │ │/scenarios│ │ /reset │ │ ws:// Simulator │ │
91
+ │ │ Env Sync │ │ DB Cache │ │ Injection│ │ Live Stream Sync│ │
92
+ │ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │
93
+ └───────────┬───────────────────────────────────┬─────────────────┘
94
+ │ │
95
+ ▼ ▼
96
+ ┌─────────────────────────────────────────────────────────────────┐
97
+ │ OLLAMA ENGINE / LLM PIPELINE │
98
+ │ Agent A (Investigator) ◄──────► Agent B (Validator) │
99
+ │ - Generates Hypotheses - Challenges Assertions │
100
+ │ - Runs System Tools - Requires Proof │
101
+ └─────────────────────────────────────────────────────────────────┘
102
+ ```
103
+
104
+ ---
105
+
106
+ ## 🌐 Execution Environments
107
+
108
+ NEXUS-AI supports two distinct execution models for agent tools, toggleable via the **Settings** dashboard:
109
+
110
+ ### 1. Simulated Mode (Safe Sandbox)
111
+ * **Default Mode**: Agents interact with a pre-defined `clue_map` within the scenario YAML.
112
+ * **No System Impact**: Commands like `read_logs` or `check_service` return mocked data.
113
+ * **Use Case**: Training, logic validation, and "what-if" analysis without infrastructure risk.
114
+
115
+ ### 2. SSH Lab Node (Real-World Execution)
116
+ * **Live Connection**: Commands are executed in real-time on a remote Linux server via SSH.
117
+ * **Autonomous Terminal**: Agents use the `run_terminal_command` tool to browse logs, check systemd status, and inspect real configs.
118
+ * **Security**: Includes a command blocklist to prevent highly destructive operations (e.g., `rm -rf /`).
119
+ * **Use Case**: Actual incident response on isolated Lab/Staging nodes.
120
+
121
+ ---
122
+
123
+ ## 📐 OpenEnv Specification
124
+
125
+ NEXUS-AI strictly adheres to the **OpenEnv 1.0** standard for agent-environment interaction.
126
+
127
+ ### 🎮 Action Space
128
+ The environment accepts a typed **NexusAction** (Text-based with structured tool calls).
129
+ - **agent_id**: `string` ("agent_a" or "agent_b")
130
+ - **message**: `string` (The natural language reasoning/communication)
131
+ - **tool_calls**: `List[ToolCall]` (Optional structured calls like `TOOL: read_logs(file='app.log')`)
132
+ - **confidence**: `float` (0.0 - 1.0)
133
+
134
+ ### 🧐 Observation Space
135
+ The environment returns a structured **NexusObservation** summarizing the system state.
136
+ - **scenario_description**: `string` (High-level objective)
137
+ - **scenario_context**: `string` (Background telemetry/environment info)
138
+ - **partner_message**: `string` (The last message from the other agent)
139
+ - **tool_results**: `List[ToolResult]` (Output of any executed system tools)
140
+ - **clues_found**: `List[string]` (Accumulated evidence identified by the Reward Engine)
141
+ - **investigation_stage**: `string` (`investigating`, `narrowing`, `found`, `verified`)
142
+ - **round**: `integer` (Current episode round)
143
+ - **available_tools**: `List[string]` (List of permitted tools for the current mode)
144
+
145
+ ### 📝 Task Registry & Difficulty
146
+ | Task Name | Difficulty | Objective | Grader Method |
147
+ |---|---|---|---|
148
+ | `software-incident` | **Easy** | Fix Nginx 503 rate-limit misconfiguration | State Check: `nginx-proxy.rate_limit` |
149
+ | `business-process-failure` | **Medium** | Resolve inventory stockout logic error | State Check: `stock_threshold` + Red Herring Penalty |
150
+ | `cascade-system-failure` | **Hard** | Fix Postgres connection exhaustion | Multi-Step: Query Termination + Config Update |
151
+
152
+ ### 📈 Baseline Benchmarks
153
+ Validated using `inference.py` (Phi-3-mini & Qwen2.5-1.5B).
154
+ - **Software Incident**: 0.88 / 1.00
155
+ - **Business Process Failure**: 0.72 / 1.00
156
+ - **Cascade System Failure**: 0.48 / 1.00
157
+
158
+ ---
159
+
160
+ ## 🧠 The AI Pipeline Deep-Dive
161
+
162
+ ### Step 1: Scenario Injection & Bootstrapping
163
+ ```python
164
+ # The EpisodeManager receives the frontend custom scenario JSON
165
+ # Broadcasts 'episode_start' natively over the WebSocket to synchronize the UI
166
+ await broadcast("episode_start", {
167
+ "scenario": active_scenario,
168
+ "agent_a_model": settings.AGENT_A_MODEL
169
+ })
170
+ ```
171
+
172
+ ### Step 2: Agent Consensus Loop
173
+ ```python
174
+ # Agents interact sequentially. The Investigator attempts a solution
175
+ # while the Validator challenges it. Both agents have access to dynamic system execution.
176
+ client, model_name = model_manager.get_client(agent_id)
177
+ stream = await client.chat.completions.create(
178
+ model=model_name,
179
+ messages=injected_history,
180
+ tools=available_tools, # e.g. fix_proposer, run_terminal_command
181
+ stream=True
182
+ )
183
+ ```
184
+
185
+ ### Step 3: Fast GPU Embeddings (Similarity Evaluation)
186
+ ```python
187
+ # Heavy CPU blocking is completely bypassed.
188
+ # Semantic embedding computations map strictly into the Ollama GPU pipeline.
189
+ @lru_cache(maxsize=256)
190
+ def get_embedding(text: str) -> List[float]:
191
+ response = httpx.post("http://localhost:11434/api/embeddings", json={
192
+ "model": "all-minilm",
193
+ "prompt": text
194
+ }, timeout=60.0)
195
+ return response.json().get("embedding", [])
196
+ ```
197
+
198
+ ---
199
+
200
+ ## 🛠️ Full Technology Stack
201
+
202
+ | Layer | Technology | Why |
203
+ |---|---|---|
204
+ | Frontend Framework | React 18 (Vite) | Lightning fast HMR, component isolation |
205
+ | Frontend Styling | Tailwind CSS | Utility-first tactical glassmorphism |
206
+ | Backend Framework | FastAPI | Async Python, explicit endpoint mapping |
207
+ | Transport Layer | WebSockets | Word-by-word streaming across UI boundaries |
208
+ | Local AI Engine | Ollama | Native device acceleration, absolute privacy |
209
+ | Remote Provider | HuggingFace Inference API | Drop-in SaaS alternatives |
210
+ | SSH Connectivity | Paramiko | Secure remote shell execution for Lab Nodes |
211
+ | Data Persistence | LocalStorage & `.env` Injection | Avoids over-architected SQL constraints |
212
+
213
+ ---
214
+
215
+ ## 🚀 How to Run This Project (Full Step-by-Step Guide)
216
+
217
+ ### 📋 Prerequisites
218
+ - Python 3.10+
219
+ - Node.js 18+
220
+ - [Ollama](https://ollama.com/) (installed locally for model hosting)
221
+ - **Optional**: A remote Linux VM (Ubuntu/Kali) with SSH enabled for Lab Node mode
222
+
223
+ ---
224
+
225
+ ### 1️⃣ Backend Setup (FastAPI / Python)
226
+
227
+ ```bash
228
+ cd backend
229
+
230
+ # Create and activate virtual environment
231
+ python -m venv venv
232
+ # source venv/bin/activate # Linux/macOS
233
+ venv\Scripts\activate # Windows
234
+
235
+ # Install all dependencies
236
+ pip install -r requirements.txt
237
+ ```
238
+
239
+ #### Start the Backend Engine
240
+ ```bash
241
+ # This exposes the core REST API and the WebSocket simulation tunnel
242
+ python main.py
243
+ ```
244
+
245
+ ---
246
+
247
+ ### 2️⃣ Frontend Setup (React)
248
+
249
+ Open a **new terminal tab**:
250
+
251
+ ```bash
252
+ cd frontend
253
+
254
+ # Install Node.js dependencies
255
+ npm install
256
+
257
+ # Start the Vite development server
258
+ npm run dev
259
+ ```
260
+
261
+ The application is now fully accessible at [http://localhost:5173](http://localhost:5173).
262
+
263
+ ---
264
+
265
+ ### 3️⃣ Pulling Models
266
+
267
+ To run the simulation locally without cloud API keys, you must ensure you pull suitable reasoning models through Ollama:
268
+
269
+ ```bash
270
+ ollama run qwen2.5:3b # Excellent validator logic footprint
271
+ ollama run dolphin-llama3 # Uncensored investigative assertions
272
+ ollama pull all-minilm # Mandatory for semantic similarity scoring
273
+ ```
274
+
275
+ ---
276
+
277
+ ## 🧪 Automated Testing
278
+ NEXUS-AI includes a comprehensive test suite to ensure environment stability and specification compliance.
279
+
280
+ ```bash
281
+ # Run the OpenEnv specification validator
282
+ python openenv_validator.py
283
+
284
+ # Run unit tests for core logic
285
+ pip install pytest
286
+ pytest tests/
287
+ ```
288
+
289
+ ---
290
+
291
+ ## 🤝 Authors
292
+ **Developed by: Ashish Menon** & Vector
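The README's "Action Space" section lists the fields of `NexusAction` but the diff does not show the schema itself (it lives in `api/schemas/action.py`, which is not part of this excerpt). As a rough illustration only, the documented fields could be modelled as below. The class name `NexusActionSketch`, the `ToolCall` fields `name` and `arguments`, and the validation rules are assumptions made for this sketch, not the project's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Agent identifiers as listed in the README's Action Space section.
AGENT_IDS = ("agent_a", "agent_b")


@dataclass
class ToolCall:
    # Hypothetical shape: the README only shows the textual form
    # TOOL: read_logs(file='app.log'), not the structured fields.
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


@dataclass
class NexusActionSketch:
    # Field names and types follow the README; the checks below are assumed.
    agent_id: str
    message: str
    tool_calls: List[ToolCall] = field(default_factory=list)
    confidence: float = 0.0

    def __post_init__(self) -> None:
        if self.agent_id not in AGENT_IDS:
            raise ValueError(f"unknown agent_id: {self.agent_id!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within 0.0 - 1.0")


if __name__ == "__main__":
    action = NexusActionSketch(
        agent_id="agent_a",
        message="The 503s look like a rate-limit issue; checking the proxy log.",
        tool_calls=[ToolCall("read_logs", {"file": "app.log"})],
        confidence=0.6,
    )
    print(action.tool_calls[0].name)
```

Rejecting out-of-range values at construction time is one possible design choice; the real schema may instead clamp or ignore them.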
SYNC_VERIFICATION_0047.txt CHANGED
@@ -1,5 +1,5 @@
- This file is a marker to verify that the synchronization between the local environment and the remote repositories (GitHub and Hugging Face) is functioning correctly.
-
- Timestamp: 2026-04-08 00:47:00
- Commit SHA: 4f14584 (previous)
- Status: FORCED_SYNC_ACTIVE
+ This file is a marker to verify that the synchronization between the local environment and the remote repositories (GitHub and Hugging Face) is functioning correctly.
+
+ Timestamp: 2026-04-08 00:47:00
+ Commit SHA: 4f14584 (previous)
+ Status: FORCED_SYNC_ACTIVE
backend/config.py CHANGED
@@ -1,75 +1,75 @@
1
- import os
2
- from pathlib import Path
3
- from dotenv import load_dotenv
4
-
5
- BASE_DIR = Path(__file__).resolve().parent
6
- ROOT_DIR = BASE_DIR.parent
7
-
8
- # Load environment variables, checking both backend/ and project root
9
- if (BASE_DIR / ".env").exists():
10
- load_dotenv(BASE_DIR / ".env")
11
- elif (ROOT_DIR / ".env").exists():
12
- load_dotenv(ROOT_DIR / ".env")
13
- elif (ROOT_DIR / "default.env").exists():
14
- load_dotenv(ROOT_DIR / "default.env")
15
- else:
16
- load_dotenv() # Fallback to standard search
17
-
18
- class Settings:
19
- # OLLAMA
20
- OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434/v1")
21
- OLLAMA_API_KEY = os.getenv("OLLAMA_API_KEY", "ollama")
22
-
23
- # AGENTS
24
- AGENT_A_MODEL = os.getenv("AGENT_A_MODEL", "")
25
- AGENT_B_MODEL = os.getenv("AGENT_B_MODEL", "")
26
- AGENT_A_PROVIDER = os.getenv("AGENT_A_PROVIDER", "ollama")
27
- AGENT_B_PROVIDER = os.getenv("AGENT_B_PROVIDER", "ollama")
28
- AGENT_A_ROLE = os.getenv("AGENT_A_ROLE", "INVESTIGATOR")
29
- AGENT_B_ROLE = os.getenv("AGENT_B_ROLE", "VALIDATOR")
30
- AGENT_A_SYSTEM_PROMPT = os.getenv("AGENT_A_SYSTEM_PROMPT", "")
31
- AGENT_B_SYSTEM_PROMPT = os.getenv("AGENT_B_SYSTEM_PROMPT", "")
32
- AGENT_A_TEMPERATURE = float(os.getenv("AGENT_A_TEMPERATURE", "0.8"))
33
- AGENT_B_TEMPERATURE = float(os.getenv("AGENT_B_TEMPERATURE", "0.6"))
34
- AGENT_A_MAX_TOKENS = int(os.getenv("AGENT_A_MAX_TOKENS", "300"))
35
- AGENT_B_MAX_TOKENS = int(os.getenv("AGENT_B_MAX_TOKENS", "300"))
36
- # EXECUTION ENVIRONMENT
37
- EXECUTION_MODE = os.getenv("EXECUTION_MODE", "simulated")
38
- SSH_HOST = os.getenv("SSH_HOST", "")
39
- SSH_PORT = int(os.getenv("SSH_PORT", "22"))
40
- SSH_USER = os.getenv("SSH_USER", "")
41
- SSH_PASSWORD = os.getenv("SSH_PASSWORD", "")
42
-
43
- # HUGGINGFACE
44
- API_KEY = os.getenv("API_KEY", "ollama")
45
- OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
46
- HF_TOKEN = os.getenv("HF_TOKEN", "")
47
- HF_INFERENCE_URL = os.getenv("HF_INFERENCE_URL", "https://router.huggingface.co/v1")
48
-
49
- # OPENROUTER
50
- OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY", "")
51
- OPENROUTER_BASE_URL = os.getenv("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1")
52
-
53
- # SERVER
54
- HOST = os.getenv("HOST", "0.0.0.0")
55
- PORT = int(os.getenv("PORT", "7860"))
56
- DEBUG = os.getenv("DEBUG", "true").lower() in ("true", "1", "yes")
57
- ENVIRONMENT = os.getenv("ENVIRONMENT", "local")
58
-
59
- # EPISODE
60
- MAX_STEPS = int(os.getenv("MAX_STEPS", "1000"))
61
- MAX_EPISODE_TIME_SECONDS = int(os.getenv("MAX_EPISODE_TIME_SECONDS", "1200"))
62
- SUCCESS_SCORE_THRESHOLD = float(os.getenv("SUCCESS_SCORE_THRESHOLD", "0.5"))
63
-
64
- # MCP TOOL SERVER
65
- MCP_SERVER_PORT = int(os.getenv("MCP_SERVER_PORT", "8001"))
66
- MCP_SERVER_URL = os.getenv("MCP_SERVER_URL", "http://localhost:8001")
67
-
68
- # CUSTOM MODEL
69
- CUSTOM_MODEL_ENABLED = os.getenv("CUSTOM_MODEL_ENABLED", "false").lower() in ("true", "1", "yes")
70
- CUSTOM_MODEL_BASE_URL = os.getenv("CUSTOM_MODEL_BASE_URL", "")
71
- CUSTOM_MODEL_API_KEY = os.getenv("CUSTOM_MODEL_API_KEY", "")
72
- CUSTOM_MODEL_NAME = os.getenv("CUSTOM_MODEL_NAME", "")
73
- CUSTOM_MODEL_AGENT = os.getenv("CUSTOM_MODEL_AGENT", "")
74
-
75
- settings = Settings()
 
1
+ import os
2
+ from pathlib import Path
3
+ from dotenv import load_dotenv
4
+
5
+ BASE_DIR = Path(__file__).resolve().parent
6
+ ROOT_DIR = BASE_DIR.parent
7
+
8
+ # Load environment variables, checking both backend/ and project root
9
+ if (BASE_DIR / ".env").exists():
10
+ load_dotenv(BASE_DIR / ".env")
11
+ elif (ROOT_DIR / ".env").exists():
12
+ load_dotenv(ROOT_DIR / ".env")
13
+ elif (ROOT_DIR / "default.env").exists():
14
+ load_dotenv(ROOT_DIR / "default.env")
15
+ else:
16
+ load_dotenv() # Fallback to standard search
17
+
18
+ class Settings:
19
+ # OLLAMA
20
+ OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434/v1")
21
+ OLLAMA_API_KEY = os.getenv("OLLAMA_API_KEY", "ollama")
22
+
23
+ # AGENTS
24
+ AGENT_A_MODEL = os.getenv("AGENT_A_MODEL", "")
25
+ AGENT_B_MODEL = os.getenv("AGENT_B_MODEL", "")
26
+ AGENT_A_PROVIDER = os.getenv("AGENT_A_PROVIDER", "ollama")
27
+ AGENT_B_PROVIDER = os.getenv("AGENT_B_PROVIDER", "ollama")
28
+ AGENT_A_ROLE = os.getenv("AGENT_A_ROLE", "INVESTIGATOR")
29
+ AGENT_B_ROLE = os.getenv("AGENT_B_ROLE", "VALIDATOR")
30
+ AGENT_A_SYSTEM_PROMPT = os.getenv("AGENT_A_SYSTEM_PROMPT", "")
31
+ AGENT_B_SYSTEM_PROMPT = os.getenv("AGENT_B_SYSTEM_PROMPT", "")
32
+ AGENT_A_TEMPERATURE = float(os.getenv("AGENT_A_TEMPERATURE", "0.8"))
33
+ AGENT_B_TEMPERATURE = float(os.getenv("AGENT_B_TEMPERATURE", "0.6"))
34
+ AGENT_A_MAX_TOKENS = int(os.getenv("AGENT_A_MAX_TOKENS", "300"))
35
+ AGENT_B_MAX_TOKENS = int(os.getenv("AGENT_B_MAX_TOKENS", "300"))
36
+ # EXECUTION ENVIRONMENT
37
+ EXECUTION_MODE = os.getenv("EXECUTION_MODE", "simulated")
38
+ SSH_HOST = os.getenv("SSH_HOST", "")
39
+ SSH_PORT = int(os.getenv("SSH_PORT", "22"))
40
+ SSH_USER = os.getenv("SSH_USER", "")
41
+ SSH_PASSWORD = os.getenv("SSH_PASSWORD", "")
42
+
43
+ # HUGGINGFACE
44
+ API_KEY = os.getenv("API_KEY", "ollama")
45
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
46
+ HF_TOKEN = os.getenv("HF_TOKEN", "")
47
+ HF_INFERENCE_URL = os.getenv("HF_INFERENCE_URL", "https://router.huggingface.co/v1")
48
+
49
+ # OPENROUTER
50
+ OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY", "")
51
+ OPENROUTER_BASE_URL = os.getenv("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1")
52
+
53
+ # SERVER
54
+ HOST = os.getenv("HOST", "0.0.0.0")
55
+ PORT = int(os.getenv("PORT", "7860"))
56
+ DEBUG = os.getenv("DEBUG", "true").lower() in ("true", "1", "yes")
57
+ ENVIRONMENT = os.getenv("ENVIRONMENT", "local")
58
+
59
+ # EPISODE
60
+ MAX_STEPS = int(os.getenv("MAX_STEPS", "1000"))
61
+ MAX_EPISODE_TIME_SECONDS = int(os.getenv("MAX_EPISODE_TIME_SECONDS", "1200"))
62
+ SUCCESS_SCORE_THRESHOLD = float(os.getenv("SUCCESS_SCORE_THRESHOLD", "0.5"))
63
+
64
+ # MCP TOOL SERVER
65
+ MCP_SERVER_PORT = int(os.getenv("MCP_SERVER_PORT", "8001"))
66
+ MCP_SERVER_URL = os.getenv("MCP_SERVER_URL", "http://localhost:8001")
67
+
68
+ # CUSTOM MODEL
69
+ CUSTOM_MODEL_ENABLED = os.getenv("CUSTOM_MODEL_ENABLED", "false").lower() in ("true", "1", "yes")
70
+ CUSTOM_MODEL_BASE_URL = os.getenv("CUSTOM_MODEL_BASE_URL", "")
71
+ CUSTOM_MODEL_API_KEY = os.getenv("CUSTOM_MODEL_API_KEY", "")
72
+ CUSTOM_MODEL_NAME = os.getenv("CUSTOM_MODEL_NAME", "")
73
+ CUSTOM_MODEL_AGENT = os.getenv("CUSTOM_MODEL_AGENT", "")
74
+
75
+ settings = Settings()
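
Note on the env-file lookup in this file: the order is `backend/.env`, then the project-root `.env`, then `default.env`, and finally python-dotenv's default search. The sketch below illustrates that precedence only. `pick_env_file` is a hypothetical helper, not part of the repository; it returns the path that would be loaded rather than loading it, so it needs no third-party packages.

```python
from pathlib import Path
from typing import Optional


def pick_env_file(base_dir: Path, root_dir: Path) -> Optional[Path]:
    """Return the first env file that exists, in the precedence used above.

    Hypothetical helper: a None result corresponds to the bare
    load_dotenv() fallback in the diff.
    """
    candidates = [
        base_dir / ".env",         # backend/.env is checked first
        root_dir / ".env",         # then the project-root .env
        root_dir / "default.env",  # then default.env
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None
```

Because the branches are mutually exclusive, only one file is ever loaded; values from a lower-priority file are not merged in.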
backend/core/environment.py CHANGED
@@ -1,160 +1,160 @@
1
- import json
2
- from typing import Tuple, Dict
3
-
4
- from scenarios.scenario_loader import scenario_loader
5
- from core.state_manager import EpisodeState
6
- from core.reward_engine import compute_reward
7
- from core.agent_runner import AgentRunner
8
- from scenarios.graders.easy_grader import EasyGrader
9
- from scenarios.graders.medium_grader import MediumGrader
10
- from scenarios.graders.hard_grader import HardGrader
11
- from api.schemas.action import NexusAction
12
- from api.schemas.observation import NexusObservation, ToolResult
13
- from config import settings
14
- import statistics
15
-
16
- SIMULATED_TOOLS = ["read_logs", "check_config", "query_database", "check_service_status", "run_diagnostic", "update_config", "restart_service", "propose_fix", "verify_fix", "submit_resolution"]
17
- SSH_TOOLS = ["run_terminal_command", "propose_fix", "verify_fix", "submit_resolution"]
18
-
19
- class NexusEnvironment:
20
- def __init__(self):
21
- self.runner = AgentRunner()
22
- self.active_episode = None
23
- self.active_scenario = None
24
-
25
- self.graders = {
26
- "easy": EasyGrader(),
27
- "medium": MediumGrader(),
28
- "hard": HardGrader()
29
- }
30
-
31
- async def reset(self, task: str = "software-incident", scenario_id: str = None, custom_scenario: dict = None, seed: int = None, max_steps: int = None) -> NexusObservation:
32
- # Determine difficulty from task
33
- valid_tasks = ["software-incident", "business-process-failure", "cascade-system-failure"]
34
- if task not in valid_tasks and not custom_scenario and not scenario_id:
35
- raise ValueError(f"Invalid task name: {task}")
36
-
37
- difficulty = "easy"
38
- if task == "business-process-failure":
39
- difficulty = "medium"
40
- elif task == "cascade-system-failure":
41
- difficulty = "hard"
42
-
43
- if custom_scenario:
44
- scenario = custom_scenario
45
- scenario["id"] = scenario.get("id", "custom-1")
46
- scenario["description"] = scenario.get("description", "Custom imported scenario.")
47
- scenario["context"] = scenario.get("context", "Custom uploaded environment.")
48
- if "difficulty" in scenario:
49
- difficulty = scenario["difficulty"].lower()
50
- elif scenario_id:
51
- scenario = scenario_loader.get_scenario(scenario_id)
52
- else:
53
- scenarios = scenario_loader.get_scenarios_by_difficulty(difficulty)
54
- if not scenarios:
55
- raise ValueError(f"No scenarios found for difficulty {difficulty}")
56
- import random
57
- if seed is not None:
58
- random.seed(seed)
59
- scenario = random.choice(scenarios)
60
-
61
- self.active_scenario = scenario
62
- self.active_episode = EpisodeState(
63
- scenario_id=scenario["id"],
64
- task=task,
65
- difficulty=difficulty,
66
- max_rounds=max_steps if max_steps is not None else settings.MAX_STEPS,
67
- scenario_data=scenario
68
- )
69
-
70
- available_tools = SSH_TOOLS if settings.EXECUTION_MODE == "ssh" else SIMULATED_TOOLS
71
- obs = NexusObservation(
72
- partner_message="",
73
- tool_results=[],
74
- system_state={},
75
- investigation_stage="investigating",
76
- round=1,
77
- available_tools=available_tools,
78
- clues_found=[],
79
- scenario_description=scenario["description"],
80
- scenario_context=scenario["context"]
81
- )
82
- return obs
83
-
84
- async def step(self, action: NexusAction) -> Tuple[NexusObservation, float, bool, dict]:
85
- if not self.active_episode:
86
- raise ValueError("Environment must be reset before calling step")
87
-
88
- ep = self.active_episode
89
- sc = self.active_scenario
90
-
91
- # 1. Add agent message to state
92
- ep.add_message(action.agent_id, action.message)
93
-
94
- # 2. Execute tools
95
- tool_results_data = await self.runner.execute_tool_calls(action.tool_calls, sc, ep.current_round, ep)
96
-
97
- # Process tool clues
98
- tool_results_objs = []
99
- for tr in tool_results_data:
100
- if "status: degraded" in tr['result'].lower() or "error" in tr['result'].lower() or "anomaly" in tr['result'].lower() or "warning" in tr['result'].lower() or tr['tool_name'] == 'propose_fix' or tr['tool_name'] == 'verify_fix':
101
- ep.add_clue(tr['result'])
102
- tool_results_objs.append(ToolResult(**tr))
103
-
104
- # 3. Compute semantic reward dynamically
105
- reward, breakdown = compute_reward(action.message, action.tool_calls, tool_results_data, ep, sc)
106
-
107
- # Stop when resolution submitted or max steps taken
108
- if ep.fix_verified or ep.steps_taken >= ep.max_rounds:
109
- ep.done = True
110
-
111
- # If they maxed out without resolving, inject a synthetic report so the UI doesn't look broken
112
- if not ep.fix_verified:
113
- ep.add_tool_call("submit_resolution", {
114
- "root_cause_service": "UNRESOLVED",
115
- "root_cause_description": "Investigation terminated: Maximum round limit reached without agent consensus.",
116
- "fix_applied": "No fix was submitted."
117
- })
118
-
119
- # Hybrid Final Scorer: Combine objective grader results with semantic reward history
120
- grader = self.graders.get(ep.difficulty, self.graders["easy"])
121
- grader_score = grader.grade(ep, sc)
122
-
123
- # Use average step reward as the semantic component (0.0 - 1.0)
124
- avg_semantic = statistics.mean(ep.reward_history) if ep.reward_history else 0.0
125
-
126
- # Weighted average: Grader (Objective) 60% + Semantic (Quality) 40%
127
- # If the grader score is 1.0 (perfect fix), we lean more into the objective truth.
128
- if grader_score >= 0.90:
129
- final_score = grader_score * 0.8 + avg_semantic * 0.2
130
- else:
131
- final_score = grader_score * 0.6 + avg_semantic * 0.4
132
-
133
- final_score = round(max(0.0, min(1.0, final_score)), 4)
134
-
135
- info = {
136
- "breakdown": {**breakdown, "semantic_avg": round(avg_semantic, 4), "objective_score": grader_score},
137
- "final_score": final_score,
138
- "success": (final_score >= settings.SUCCESS_SCORE_THRESHOLD) or (ep.fix_verified and grader_score > 0)
139
- }
140
- else:
141
- info = {"breakdown": breakdown}
142
-
143
- obs = NexusObservation(
144
- partner_message=action.message,
145
- tool_results=tool_results_objs,
146
- system_state={"total_tools_run": len(ep.tool_calls_made)},
147
- investigation_stage=ep.investigation_stage,
148
- round=ep.current_round,
149
- available_tools=SSH_TOOLS if settings.EXECUTION_MODE == "ssh" else SIMULATED_TOOLS,
150
- clues_found=ep.clues_found,
151
- scenario_description=sc["description"],
152
- scenario_context=sc["context"]
153
- )
154
-
155
- return obs, reward, ep.done, info
156
-
157
- def state(self):
158
- if not self.active_episode:
159
- return None
160
- return self.active_episode.to_pydantic()
 
1
+ import json
2
+ from typing import Tuple, Dict
3
+
4
+ from scenarios.scenario_loader import scenario_loader
5
+ from core.state_manager import EpisodeState
6
+ from core.reward_engine import compute_reward
7
+ from core.agent_runner import AgentRunner
8
+ from scenarios.graders.easy_grader import EasyGrader
9
+ from scenarios.graders.medium_grader import MediumGrader
10
+ from scenarios.graders.hard_grader import HardGrader
11
+ from api.schemas.action import NexusAction
12
+ from api.schemas.observation import NexusObservation, ToolResult
13
+ from config import settings
14
+ import statistics
15
+
16
+ SIMULATED_TOOLS = ["read_logs", "check_config", "query_database", "check_service_status", "run_diagnostic", "update_config", "restart_service", "propose_fix", "verify_fix", "submit_resolution"]
17
+ SSH_TOOLS = ["run_terminal_command", "propose_fix", "verify_fix", "submit_resolution"]
18
+
19
+ class NexusEnvironment:
20
+ def __init__(self):
21
+ self.runner = AgentRunner()
22
+ self.active_episode = None
23
+ self.active_scenario = None
24
+
25
+ self.graders = {
26
+ "easy": EasyGrader(),
27
+ "medium": MediumGrader(),
28
+ "hard": HardGrader()
29
+ }
30
+
31
+ async def reset(self, task: str = "software-incident", scenario_id: str = None, custom_scenario: dict = None, seed: int = None, max_steps: int = None) -> NexusObservation:
32
+ # Determine difficulty from task
33
+ valid_tasks = ["software-incident", "business-process-failure", "cascade-system-failure"]
34
+ if task not in valid_tasks and not custom_scenario and not scenario_id:
35
+ raise ValueError(f"Invalid task name: {task}")
36
+
37
+ difficulty = "easy"
38
+ if task == "business-process-failure":
39
+ difficulty = "medium"
40
+ elif task == "cascade-system-failure":
41
+ difficulty = "hard"
42
+
43
+ if custom_scenario:
44
+ scenario = custom_scenario
45
+ scenario["id"] = scenario.get("id", "custom-1")
46
+ scenario["description"] = scenario.get("description", "Custom imported scenario.")
47
+ scenario["context"] = scenario.get("context", "Custom uploaded environment.")
48
+ if "difficulty" in scenario:
49
+ difficulty = scenario["difficulty"].lower()
50
+ elif scenario_id:
51
+ scenario = scenario_loader.get_scenario(scenario_id)
52
+ else:
53
+ scenarios = scenario_loader.get_scenarios_by_difficulty(difficulty)
54
+ if not scenarios:
55
+ raise ValueError(f"No scenarios found for difficulty {difficulty}")
56
+ import random
57
+ if seed is not None:
58
+ random.seed(seed)
59
+ scenario = random.choice(scenarios)
60
+
61
+ self.active_scenario = scenario
62
+ self.active_episode = EpisodeState(
63
+ scenario_id=scenario["id"],
64
+ task=task,
65
+ difficulty=difficulty,
66
+ max_rounds=max_steps if max_steps is not None else settings.MAX_STEPS,
67
+ scenario_data=scenario
68
+ )
69
+
70
+ available_tools = SSH_TOOLS if settings.EXECUTION_MODE == "ssh" else SIMULATED_TOOLS
71
+ obs = NexusObservation(
72
+ partner_message="",
73
+ tool_results=[],
74
+ system_state={},
75
+ investigation_stage="investigating",
76
+ round=1,
77
+ available_tools=available_tools,
78
+ clues_found=[],
79
+ scenario_description=scenario["description"],
80
+ scenario_context=scenario["context"]
81
+ )
82
+ return obs
83
+
84
+ async def step(self, action: NexusAction) -> Tuple[NexusObservation, float, bool, dict]:
85
+ if not self.active_episode:
86
+ raise ValueError("Environment must be reset before calling step")
87
+
88
+ ep = self.active_episode
89
+ sc = self.active_scenario
90
+
91
+ # 1. Add agent message to state
92
+ ep.add_message(action.agent_id, action.message)
93
+
94
+ # 2. Execute tools
95
+ tool_results_data = await self.runner.execute_tool_calls(action.tool_calls, sc, ep.current_round, ep)
96
+
97
+ # Process tool clues
98
+ tool_results_objs = []
99
+ for tr in tool_results_data:
100
+ if "status: degraded" in tr['result'].lower() or "error" in tr['result'].lower() or "anomaly" in tr['result'].lower() or "warning" in tr['result'].lower() or tr['tool_name'] == 'propose_fix' or tr['tool_name'] == 'verify_fix':
101
+ ep.add_clue(tr['result'])
102
+ tool_results_objs.append(ToolResult(**tr))
103
+
104
+ # 3. Compute semantic reward dynamically
105
+ reward, breakdown = compute_reward(action.message, action.tool_calls, tool_results_data, ep, sc)
106
+
107
+ # Stop when resolution submitted or max steps taken
108
+ if ep.fix_verified or ep.steps_taken >= ep.max_rounds:
109
+ ep.done = True
110
+
111
+ # If they maxed out without resolving, inject a synthetic report so the UI doesn't look broken
112
+ if not ep.fix_verified:
113
+ ep.add_tool_call("submit_resolution", {
114
+ "root_cause_service": "UNRESOLVED",
115
+ "root_cause_description": "Investigation terminated: Maximum round limit reached without agent consensus.",
116
+ "fix_applied": "No fix was submitted."
117
+ })
118
+
119
+ # Hybrid Final Scorer: Combine objective grader results with semantic reward history
120
+ grader = self.graders.get(ep.difficulty, self.graders["easy"])
121
+ grader_score = grader.grade(ep, sc)
122
+
123
+ # Use average step reward as the semantic component (0.0 - 1.0)
124
+ avg_semantic = statistics.mean(ep.reward_history) if ep.reward_history else 0.0
125
+
126
+ # Weighted average: Grader (Objective) 60% + Semantic (Quality) 40%
127
+ # If the grader score is 1.0 (perfect fix), we lean more into the objective truth.
128
+ if grader_score >= 0.90:
129
+ final_score = grader_score * 0.8 + avg_semantic * 0.2
130
+ else:
131
+ final_score = grader_score * 0.6 + avg_semantic * 0.4
132
+
133
+ final_score = round(max(0.0, min(1.0, final_score)), 4)
134
+
135
+ info = {
136
+ "breakdown": {**breakdown, "semantic_avg": round(avg_semantic, 4), "objective_score": grader_score},
137
+ "final_score": final_score,
138
+ "success": (final_score >= settings.SUCCESS_SCORE_THRESHOLD) or (ep.fix_verified and grader_score > 0)
139
+ }
140
+ else:
141
+ info = {"breakdown": breakdown}
142
+
143
+ obs = NexusObservation(
144
+ partner_message=action.message,
145
+ tool_results=tool_results_objs,
146
+ system_state={"total_tools_run": len(ep.tool_calls_made)},
147
+ investigation_stage=ep.investigation_stage,
148
+ round=ep.current_round,
149
+ available_tools=SSH_TOOLS if settings.EXECUTION_MODE == "ssh" else SIMULATED_TOOLS,
150
+ clues_found=ep.clues_found,
151
+ scenario_description=sc["description"],
152
+ scenario_context=sc["context"]
153
+ )
154
+
155
+ return obs, reward, ep.done, info
156
+
157
+ def state(self):
158
+ if not self.active_episode:
159
+ return None
160
+ return self.active_episode.to_pydantic()
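
Note on the hybrid final scorer in `step`: the blend can be reproduced in isolation as below. This is a minimal sketch of the arithmetic shown in the diff, with `hybrid_final_score` as a hypothetical function name. The 80/20 weighting applies once the grader score is 0.90 or higher, not only at exactly 1.0.

```python
from statistics import mean
from typing import Sequence


def hybrid_final_score(grader_score: float, reward_history: Sequence[float]) -> float:
    """Blend the objective grader score with the mean per-step reward."""
    # An empty history contributes a semantic average of 0.0.
    avg_semantic = mean(reward_history) if reward_history else 0.0

    if grader_score >= 0.90:
        # Near-perfect objective result: weight the grader at 80%.
        blended = grader_score * 0.8 + avg_semantic * 0.2
    else:
        # Otherwise: grader 60%, semantic average 40%.
        blended = grader_score * 0.6 + avg_semantic * 0.4

    # Clamp to [0, 1] and round to four decimals, as in the diff.
    return round(max(0.0, min(1.0, blended)), 4)
```

For example, a grader score of 1.0 with step rewards of 0.5 and 0.5 blends to 0.8 + 0.1, while a grader score of 0.5 with no recorded rewards blends to 0.3.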
backend/core/episode_manager.py CHANGED
@@ -1,95 +1,95 @@
1
- import asyncio
2
- from core.environment import NexusEnvironment
3
- from api.routes.websocket import broadcast
4
-
5
- class EpisodeManager:
6
- """Manages active episodes and coordinates the WebSocket emissions."""
7
- def __init__(self):
8
- self.env = NexusEnvironment()
9
- self.is_paused = False
10
- self.simulation_task = None
11
-
12
- async def reset(self, task: str, custom_scenario: dict = None, seed: int = None, max_steps: int = None, broadcast_episode: bool = True):
13
- # Cancel any active simulation loop
14
- if hasattr(self, 'simulation_task') and self.simulation_task and not self.simulation_task.done():
15
- self.simulation_task.cancel()
16
- try:
17
- await self.simulation_task
18
- except asyncio.CancelledError:
19
- pass
20
- self.simulation_task = None
21
-
22
- obs = await self.env.reset(task=task, custom_scenario=custom_scenario, seed=seed, max_steps=max_steps)
23
-
24
- if broadcast_episode:
25
- # Broadcast episode_start
26
- sc_safe = self.env.active_scenario.copy()
27
- if "root_cause" in sc_safe: del sc_safe["root_cause"]
28
- if "correct_fix" in sc_safe: del sc_safe["correct_fix"]
29
- if "clue_map" in sc_safe: del sc_safe["clue_map"]
30
-
31
- from config import settings
32
- await broadcast("episode_start", {
33
- "episode_id": self.env.active_episode.episode_id,
34
- "scenario": sc_safe,
35
- "task": task,
36
- "difficulty": self.env.active_episode.difficulty,
37
- "agent_a_model": settings.AGENT_A_MODEL,
38
- "agent_b_model": settings.AGENT_B_MODEL
39
- })
40
  return obs
41
-
42
- async def step(self, action):
43
- obs, reward, done, info = await self.env.step(action)
44
-
45
- # Broadcast agent message
46
- await broadcast("agent_message", {
47
- "agent_id": action.agent_id,
48
- "message": action.message,
49
- "step": self.env.active_episode.steps_taken
50
- })
51
-
52
- # Broadcast tool calls
53
- for tc in action.tool_calls:
54
- await broadcast("tool_call", {
55
- "agent_id": action.agent_id,
56
- "tool_name": tc.tool_name,
57
- "params": tc.params,
58
- "step": self.env.active_episode.steps_taken
59
- })
60
-
61
- # Broadcast tool results
62
- for tr in obs.tool_results:
63
- await broadcast("tool_result", {
64
- "tool_name": tr.tool_name,
65
- "result": tr.result,
66
- "success": tr.success,
67
- "step": self.env.active_episode.steps_taken
68
- })
69
-
70
- # Broadcast reward
71
- await broadcast("reward_update", {
72
- "agent_id": action.agent_id,
73
- "reward": reward,
74
- "breakdown": info.get("breakdown", {}),
75
- "cumulative": self.env.active_episode.cumulative_reward,
76
- "step": self.env.active_episode.steps_taken
77
- })
78
-
79
- if done:
80
- await broadcast("episode_end", {
81
- "episode_id": self.env.active_episode.episode_id,
82
- "success": info.get("success", False),
83
- "steps_taken": self.env.active_episode.steps_taken,
84
- "final_score": info.get("final_score", getattr(self.env.active_episode, "cumulative_reward", 0)),
85
- "final_breakdown": info.get("breakdown", {}),
86
- "clues_found": self.env.active_episode.clues_found,
87
- "root_cause_found": self.env.active_episode.fix_correct,
88
- "fix_verified": self.env.active_episode.fix_verified,
89
- "time_taken_seconds": 0,
90
- "reward_history": self.env.active_episode.reward_history
91
- })
92
-
93
- return obs, reward, done, info
94
-
95
- episode_manager = EpisodeManager()
 
1
+ import asyncio
2
+ from core.environment import NexusEnvironment
3
+ from api.routes.websocket import broadcast
4
+
5
+ class EpisodeManager:
6
+ """Manages active episodes and coordinates the WebSocket emissions."""
7
+ def __init__(self):
8
+ self.env = NexusEnvironment()
9
+ self.is_paused = False
10
+ self.simulation_task = None
11
+
12
+ async def reset(self, task: str, custom_scenario: dict = None, seed: int = None, max_steps: int = None, broadcast_episode: bool = True):
13
+ # Cancel any active simulation loop
14
+ if hasattr(self, 'simulation_task') and self.simulation_task and not self.simulation_task.done():
15
+ self.simulation_task.cancel()
16
+ try:
17
+ await self.simulation_task
18
+ except asyncio.CancelledError:
19
+ pass
20
+ self.simulation_task = None
21
+
22
+ obs = await self.env.reset(task=task, custom_scenario=custom_scenario, seed=seed, max_steps=max_steps)
23
+
24
+ if broadcast_episode:
25
+ # Broadcast episode_start
26
+ sc_safe = self.env.active_scenario.copy()
27
+ if "root_cause" in sc_safe: del sc_safe["root_cause"]
28
+ if "correct_fix" in sc_safe: del sc_safe["correct_fix"]
29
+ if "clue_map" in sc_safe: del sc_safe["clue_map"]
30
+
31
+ from config import settings
32
+ await broadcast("episode_start", {
33
+ "episode_id": self.env.active_episode.episode_id,
34
+ "scenario": sc_safe,
35
+ "task": task,
36
+ "difficulty": self.env.active_episode.difficulty,
37
+ "agent_a_model": settings.AGENT_A_MODEL,
38
+ "agent_b_model": settings.AGENT_B_MODEL
39
+ })
40
  return obs
41
+
42
+ async def step(self, action):
43
+ obs, reward, done, info = await self.env.step(action)
44
+
45
+ # Broadcast agent message
46
+ await broadcast("agent_message", {
47
+ "agent_id": action.agent_id,
48
+ "message": action.message,
49
+ "step": self.env.active_episode.steps_taken
50
+ })
51
+
52
+ # Broadcast tool calls
53
+ for tc in action.tool_calls:
54
+ await broadcast("tool_call", {
55
+ "agent_id": action.agent_id,
56
+ "tool_name": tc.tool_name,
57
+ "params": tc.params,
58
+ "step": self.env.active_episode.steps_taken
59
+ })
60
+
61
+ # Broadcast tool results
62
+ for tr in obs.tool_results:
63
+ await broadcast("tool_result", {
64
+ "tool_name": tr.tool_name,
65
+ "result": tr.result,
66
+ "success": tr.success,
67
+ "step": self.env.active_episode.steps_taken
68
+ })
69
+
70
+ # Broadcast reward
71
+ await broadcast("reward_update", {
72
+ "agent_id": action.agent_id,
73
+ "reward": reward,
74
+ "breakdown": info.get("breakdown", {}),
75
+ "cumulative": self.env.active_episode.cumulative_reward,
76
+ "step": self.env.active_episode.steps_taken
77
+ })
78
+
79
+ if done:
80
+ await broadcast("episode_end", {
81
+ "episode_id": self.env.active_episode.episode_id,
82
+ "success": info.get("success", False),
83
+ "steps_taken": self.env.active_episode.steps_taken,
84
+ "final_score": info.get("final_score", getattr(self.env.active_episode, "cumulative_reward", 0)),
85
+ "final_breakdown": info.get("breakdown", {}),
86
+ "clues_found": self.env.active_episode.clues_found,
87
+ "root_cause_found": self.env.active_episode.fix_correct,
88
+ "fix_verified": self.env.active_episode.fix_verified,
89
+ "time_taken_seconds": 0,
90
+ "reward_history": self.env.active_episode.reward_history
91
+ })
92
+
93
+ return obs, reward, done, info
94
+
95
+ episode_manager = EpisodeManager()
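
Note on `episode_start`: before broadcasting, `reset` removes the answer-bearing keys from a shallow copy of the scenario. The same idea can be sketched as a pure function. `strip_answer_keys` and `HIDDEN_KEYS` are hypothetical names used for illustration; the key list is taken from the diff.

```python
from typing import Any, Dict

# Keys removed before the scenario is sent to WebSocket clients (from the diff).
HIDDEN_KEYS = ("root_cause", "correct_fix", "clue_map")


def strip_answer_keys(scenario: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict without the hidden keys; the input is left untouched."""
    return {key: value for key, value in scenario.items() if key not in HIDDEN_KEYS}
```

Like the `.copy()` in the diff, this is shallow: nested values are shared with the original scenario, which is fine as long as the broadcast payload is only read.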
backend/requirements.txt CHANGED
@@ -1,14 +1,14 @@
- fastapi>=0.110.0
- uvicorn[standard]>=0.27.0
- openai>=1.12.0
- pydantic>=2.6.0
- pydantic-settings>=2.2.0
- python-dotenv>=1.0.0
- websockets>=12.0
- httpx>=0.27.0
- numpy>=1.26.0
- numpy>=1.26.0
- aiofiles>=23.2.1
- python-multipart>=0.0.9
- paramiko>=3.4.0
- psutil>=5.9.0
+ fastapi>=0.110.0
+ uvicorn[standard]>=0.27.0
+ openai>=1.12.0
+ pydantic>=2.6.0
+ pydantic-settings>=2.2.0
+ python-dotenv>=1.0.0
+ websockets>=12.0
+ httpx>=0.27.0
+ numpy>=1.26.0
+ aiofiles>=23.2.1
+ python-multipart>=0.0.9
+ paramiko>=3.4.0
+ psutil>=5.9.0
+ openenv-core>=0.2.0
backend/scenarios/data/easy/software-incident.json CHANGED
@@ -1,33 +1,33 @@
1
- {
2
- "id": "software-incident",
3
- "title": "Nginx Rate Limit Investigation",
4
- "difficulty": "easy",
5
- "domain": "DevOps",
6
- "description": "Users are reporting 503 errors when accessing the main API. Initial reports suggest a misconfigured rate limit.",
7
- "context": "The system uses Nginx as a reverse proxy. A recent change might have throttled legitimate traffic.",
8
- "symptoms": [
9
- "HTTP 503 errors",
10
- "High latency for API calls"
11
- ],
12
- "available_services": [
13
- "nginx-proxy",
14
- "api-gateway"
15
- ],
16
- "initial_state": {
17
- "nginx-proxy": {
18
- "status": "running",
19
- "rate_limit": "10",
20
- "last_reload": "2 hours ago"
21
- }
22
- },
23
- "root_cause": {
24
- "service": "nginx-proxy",
25
- "description": "Nginx rate limit was set too low (10 requests/sec) during a maintenance window."
26
- },
27
- "grading_criteria": {
28
- "nginx_rate_limit_fixed": 0.50,
29
- "nginx_restarted": 0.20,
30
- "fix_verified": 0.20,
31
- "efficiency_bonus": 0.10
32
- }
33
  }
 
1
+ {
2
+ "id": "software-incident",
3
+ "title": "Nginx Rate Limit Investigation",
4
+ "difficulty": "easy",
5
+ "domain": "DevOps",
6
+ "description": "Users are reporting 503 errors when accessing the main API. Initial reports suggest a misconfigured rate limit.",
7
+ "context": "The system uses Nginx as a reverse proxy. A recent change might have throttled legitimate traffic.",
8
+ "symptoms": [
9
+ "HTTP 503 errors",
10
+ "High latency for API calls"
11
+ ],
12
+ "available_services": [
13
+ "nginx-proxy",
14
+ "api-gateway"
15
+ ],
16
+ "initial_state": {
17
+ "nginx-proxy": {
18
+ "status": "running",
19
+ "rate_limit": "10",
20
+ "last_reload": "2 hours ago"
21
+ }
22
+ },
23
+ "root_cause": {
24
+ "service": "nginx-proxy",
25
+ "description": "Nginx rate limit was set too low (10 requests/sec) during a maintenance window."
26
+ },
27
+ "grading_criteria": {
28
+ "nginx_rate_limit_fixed": 0.50,
29
+ "nginx_restarted": 0.20,
30
+ "fix_verified": 0.20,
31
+ "efficiency_bonus": 0.10
32
+ }
33
  }
backend/scenarios/data/hard/cascade-system-failure.json CHANGED
@@ -1,42 +1,42 @@
1
- {
2
- "id": "cascade-system-failure",
3
- "title": "Postgres Connection Exhaustion",
4
- "difficulty": "hard",
5
- "domain": "Database",
6
- "description": "A cascade failure is occurring across the cluster. Database connections are being exhausted by a long-running analytics query.",
7
- "context": "The analytics service might be the culprit. A red herring points to the disk backup agent.",
8
- "symptoms": [
9
- "FATAL: too many connections",
10
- "Application timeout",
11
- "High I/O wait"
12
- ],
13
- "available_services": [
14
- "postgres-db",
15
- "disk-backup-agent",
16
- "analytics-service"
17
- ],
18
- "initial_state": {
19
- "postgres-db": {
20
- "status": "running",
21
- "max_connections": "20",
22
- "long_running_query": "SELECT * FROM large_audit_table CROSS JOIN high_res_metrics",
23
- "query_timeout_analytics": "0"
24
- },
25
- "disk-backup-agent": {
26
- "status": "degraded",
27
- "disk_scan_active": "true"
28
- }
29
- },
30
- "root_cause": {
31
- "service": "postgres-db",
32
- "description": "A cross-join query in the analytics service is locking connections, coupled with a low max_connections limit."
33
- },
34
- "grading_criteria": {
35
- "postgres_query_terminated": 0.25,
36
- "postgres_max_connections_increased": 0.20,
37
- "postgres_query_timeout_set": 0.20,
38
- "penalty_disk_backup_agent_modified": -0.15,
39
- "fix_verified": 0.10,
40
- "efficiency_bonus": 0.05
41
- }
42
  }
 
1
+ {
2
+ "id": "cascade-system-failure",
3
+ "title": "Postgres Connection Exhaustion",
4
+ "difficulty": "hard",
5
+ "domain": "Database",
6
+ "description": "A cascade failure is occurring across the cluster. Database connections are being exhausted by a long-running analytics query.",
7
+ "context": "The analytics service might be the culprit. A red herring points to the disk backup agent.",
8
+ "symptoms": [
9
+ "FATAL: too many connections",
10
+ "Application timeout",
11
+ "High I/O wait"
12
+ ],
13
+ "available_services": [
14
+ "postgres-db",
15
+ "disk-backup-agent",
16
+ "analytics-service"
17
+ ],
18
+ "initial_state": {
19
+ "postgres-db": {
20
+ "status": "running",
21
+ "max_connections": "20",
22
+ "long_running_query": "SELECT * FROM large_audit_table CROSS JOIN high_res_metrics",
23
+ "query_timeout_analytics": "0"
24
+ },
25
+ "disk-backup-agent": {
26
+ "status": "degraded",
27
+ "disk_scan_active": "true"
28
+ }
29
+ },
30
+ "root_cause": {
31
+ "service": "postgres-db",
32
+ "description": "A cross-join query in the analytics service is locking connections, coupled with a low max_connections limit."
33
+ },
34
+ "grading_criteria": {
35
+ "postgres_query_terminated": 0.25,
36
+ "postgres_max_connections_increased": 0.20,
37
+ "postgres_query_timeout_set": 0.20,
38
+ "penalty_disk_backup_agent_modified": -0.15,
39
+ "fix_verified": 0.10,
40
+ "efficiency_bonus": 0.05
41
+ }
42
  }
backend/scenarios/data/medium/business-process-failure.json CHANGED
@@ -1,39 +1,39 @@
1
- {
2
- "id": "business-process-failure",
3
- "title": "Inventory Stockout Loop",
4
- "difficulty": "medium",
5
- "domain": "E-Commerce",
6
- "description": "The inventory service is failing to trigger restocking orders even when stock is zero.",
7
- "context": "The inventory logic depends on a minimum stock threshold. A red herring might point to the CDN edge node.",
8
- "symptoms": [
9
- "Stockouts",
10
- "Orders stuck in 'PENDING_STOCK'"
11
- ],
12
- "available_services": [
13
- "inventory-service",
14
- "cdn-edge-node",
15
- "order-processor"
16
- ],
17
- "initial_state": {
18
- "inventory-service": {
19
- "status": "running",
20
- "minimum_stock_threshold": "50",
21
- "last_reload": "1 day ago"
22
- },
23
- "cdn-edge-node": {
24
- "status": "running",
25
- "cache_expiry": "3600s"
26
- }
27
- },
28
- "root_cause": {
29
- "service": "inventory-service",
30
- "description": "Minimum stock threshold was accidentally hardcoded to a high value, preventing restocking."
31
- },
32
- "grading_criteria": {
33
- "inventory_threshold_fixed": 0.45,
34
- "inventory_restarted": 0.10,
35
- "penalty_cdn_edge_node_modified": -0.15,
36
- "fix_verified": 0.20,
37
- "efficiency_bonus": 0.10
38
- }
39
  }
 
1
+ {
2
+ "id": "business-process-failure",
3
+ "title": "Inventory Stockout Loop",
4
+ "difficulty": "medium",
5
+ "domain": "E-Commerce",
6
+ "description": "The inventory service is failing to trigger restocking orders even when stock is zero.",
7
+ "context": "The inventory logic depends on a minimum stock threshold. A red herring might point to the CDN edge node.",
8
+ "symptoms": [
9
+ "Stockouts",
10
+ "Orders stuck in 'PENDING_STOCK'"
11
+ ],
12
+ "available_services": [
13
+ "inventory-service",
14
+ "cdn-edge-node",
15
+ "order-processor"
16
+ ],
17
+ "initial_state": {
18
+ "inventory-service": {
19
+ "status": "running",
20
+ "minimum_stock_threshold": "50",
21
+ "last_reload": "1 day ago"
22
+ },
23
+ "cdn-edge-node": {
24
+ "status": "running",
25
+ "cache_expiry": "3600s"
26
+ }
27
+ },
28
+ "root_cause": {
29
+ "service": "inventory-service",
30
+ "description": "Minimum stock threshold was accidentally hardcoded to a high value, preventing restocking."
31
+ },
32
+ "grading_criteria": {
33
+ "inventory_threshold_fixed": 0.45,
34
+ "inventory_restarted": 0.10,
35
+ "penalty_cdn_edge_node_modified": -0.15,
36
+ "fix_verified": 0.20,
37
+ "efficiency_bonus": 0.10
38
+ }
39
  }
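
Note on `grading_criteria` in the three scenario files: the positive weights do not sum to the same total in each file. The sketch below adds them up. `weight_summary` is a hypothetical helper, and how the graders actually consume these weights is not shown in this diff, so treat it as a sanity check on the data only.

```python
from typing import Dict, Tuple


def weight_summary(criteria: Dict[str, float]) -> Tuple[float, float]:
    """Return (sum of positive weights, sum of penalty weights), rounded to 2 places."""
    positive = sum(weight for weight in criteria.values() if weight > 0)
    penalty = sum(weight for weight in criteria.values() if weight < 0)
    return round(positive, 2), round(penalty, 2)


# Weights copied from the medium scenario above.
medium = {
    "inventory_threshold_fixed": 0.45,
    "inventory_restarted": 0.10,
    "penalty_cdn_edge_node_modified": -0.15,
    "fix_verified": 0.20,
    "efficiency_bonus": 0.10,
}
print(weight_summary(medium))  # (0.85, -0.15)
```

By the same calculation the easy scenario's positive weights total 1.00 and the hard scenario's total 0.80, with a -0.15 penalty in the hard file as well.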
backend/utils/embeddings.py CHANGED
@@ -1,33 +1,33 @@
1
- import httpx
2
- from typing import List
3
- from functools import lru_cache
4
-
5
- @lru_cache(maxsize=256)
6
- def get_embedding(text: str) -> List[float]:
7
- """Get embedding vector using Ollama directly (Synchronous)"""
8
- try:
9
- response = httpx.post("http://localhost:11434/api/embeddings", json={
10
- "model": "all-minilm",
11
- "prompt": text
12
- }, timeout=60.0)
13
- return response.json().get("embedding", [])
14
- except Exception as e:
15
- import logging
16
- logging.error(f"Embedding failed: {e}. Using pseudo-embedding fallback.")
17
- import re
18
- import hashlib
19
- words = re.findall(r'\w+', text.lower())
20
- vec = [0.0] * 384
21
- for w in words:
22
- idx = int(hashlib.md5(w.encode()).hexdigest(), 16) % 384
23
- vec[idx] += 1.0
24
- return vec
25
-
26
- def cos_sim(a: List[float], b: List[float]) -> float:
27
- """Cosine similarity without PyTorch/Numpy dependencies"""
28
- if not a or not b: return 0.0
29
- dot_product = sum(x * y for x, y in zip(a, b))
30
- mag_a = sum(x * x for x in a) ** 0.5
31
- mag_b = sum(x * x for x in b) ** 0.5
32
- if mag_a == 0 or mag_b == 0: return 0.0
33
- return dot_product / (mag_a * mag_b)
 
1
+ import httpx
2
+ from typing import List
3
+ from functools import lru_cache
4
+
5
+ @lru_cache(maxsize=256)
6
+ def get_embedding(text: str) -> List[float]:
7
+ """Get embedding vector using Ollama directly (Synchronous)"""
8
+ try:
9
+ response = httpx.post("http://localhost:11434/api/embeddings", json={
10
+ "model": "all-minilm",
11
+ "prompt": text
12
+ }, timeout=60.0)
13
+ return response.json().get("embedding", [])
14
+ except Exception as e:
15
+ import logging
16
+ logging.error(f"Embedding failed: {e}. Using pseudo-embedding fallback.")
17
+ import re
18
+ import hashlib
19
+ words = re.findall(r'\w+', text.lower())
20
+ vec = [0.0] * 384
21
+ for w in words:
22
+ idx = int(hashlib.md5(w.encode()).hexdigest(), 16) % 384
23
+ vec[idx] += 1.0
24
+ return vec
25
+
26
+ def cos_sim(a: List[float], b: List[float]) -> float:
27
+ """Cosine similarity without PyTorch/Numpy dependencies"""
28
+ if not a or not b: return 0.0
29
+ dot_product = sum(x * y for x, y in zip(a, b))
30
+ mag_a = sum(x * x for x in a) ** 0.5
31
+ mag_b = sum(x * x for x in b) ** 0.5
32
+ if mag_a == 0 or mag_b == 0: return 0.0
33
+ return dot_product / (mag_a * mag_b)
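
Note on the fallback path in `get_embedding`: when the Ollama request fails, the function returns a 384-dimensional hashed bag-of-words vector. The sketch below reproduces that fallback together with `cos_sim` so the behavior can be tried offline. `pseudo_embedding` is a hypothetical name for the inlined fallback code; the hashing scheme follows the diff.

```python
import hashlib
import re
from typing import List


def pseudo_embedding(text: str, dim: int = 384) -> List[float]:
    """Hashed bag-of-words vector: each word increments one bucket."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        # MD5 is used only as a stable hash to choose a bucket.
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec


def cos_sim(a: List[float], b: List[float]) -> float:
    """Cosine similarity in pure Python; 0.0 for empty or zero vectors."""
    if not a or not b:
        return 0.0
    dot_product = sum(x * y for x, y in zip(a, b))
    mag_a = sum(x * x for x in a) ** 0.5
    mag_b = sum(x * x for x in b) ** 0.5
    if mag_a == 0 or mag_b == 0:
        return 0.0
    return dot_product / (mag_a * mag_b)
```

The fallback measures word overlap rather than meaning: two texts with the same words in a different order score as identical, and synonyms score as unrelated. Since `get_embedding` is wrapped in `lru_cache`, a fallback vector is also cached for that text, so a later call with the same input will not retry Ollama.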
frontend/postcss.config.cjs CHANGED
@@ -1,5 +1,5 @@
- module.exports = {
- plugins: {
- '@tailwindcss/postcss': {},
- },
- }
+ module.exports = {
+ plugins: {
+ '@tailwindcss/postcss': {},
+ },
+ }
frontend/src/components/EpisodeEndOverlay.jsx CHANGED
@@ -1,384 +1,384 @@
1
- import React from 'react';
2
-
3
- const EpisodeEndOverlay = ({ isOpen, onClose, metrics, gameState }) => {
4
- if (!isOpen) return null;
5
-
6
- const handleDownload = () => {
7
- if (!gameState) return;
8
-
9
- // Assemble the detailed incident report
10
- const sc = gameState.scenario || {};
11
- const agentA = gameState.agents?.agent_a?.messages || [];
12
- const agentB = gameState.agents?.agent_b?.messages || [];
13
-
14
- let report = `=================================================================\n`;
15
- report += ` NEXUS INCIDENT INVESTIGATION REPORT \n`;
16
- report += `=================================================================\n\n`;
17
-
18
- report += `[ SCENARIO METADATA ]\n`;
19
- report += `Title: ${sc.id || 'N/A'}\n`;
20
- report += `Domain: ${sc.domain || 'N/A'}\n`;
21
- report += `Difficulty: ${sc.difficulty || 'N/A'}\n`;
22
- report += `Final Grading Score: ${Number(gameState?.cumulativeReward || metrics?.score || 0).toFixed(4)} / 1.00\n`;
23
- report += `Total Steps: ${gameState?.step || metrics?.steps || 'N/A'}\n\n`;
24
-
25
- report += `[ STEP REWARDS ]\n`;
26
- if (gameState?.rewardHistory && gameState.rewardHistory.length > 0) {
27
- gameState.rewardHistory.forEach((r, i) => {
28
- report += `Step ${i + 1}: ${r.toFixed(4)}\n`;
29
- });
30
- report += `Average: ${(gameState.rewardHistory.reduce((a, b) => a + b, 0) / gameState.rewardHistory.length).toFixed(4)}\n`;
31
- report += `Final Grading Score: ${Number(gameState.cumulativeReward || 0).toFixed(4)}\n\n`;
32
- } else {
33
- report += `No step rewards recorded.\n\n`;
34
- }
35
-
36
- report += `[ REWARD BREAKDOWN ]\n`;
37
- if (gameState?.rewardBreakdown && Object.keys(gameState.rewardBreakdown).length > 0) {
38
- Object.entries(gameState.rewardBreakdown).forEach(([key, val]) => {
39
- report += `${key}: ${typeof val === 'number' ? val.toFixed(4) : val}\n`;
40
- });
41
- report += `\n`;
42
- }
43
-
44
- report += `[ INCIDENT DESCRIPTION & PROBLEM ]\n`;
45
- report += `${sc.description || 'No description provided.'}\n\n`;
46
-
47
- report += `[ CONTEXT & ROOT CAUSE ]\n`;
48
- report += `${sc.context || 'No context provided.'}\n`;
49
- report += `Actual Root Cause Validation: ${metrics?.rootCause || 'N/A'}\n\n`;
50
-
51
- report += `=================================================================\n`;
52
- report += `[ INVESTIGATION LOG & DETAILED TRACE ]\n`;
53
- report += `=================================================================\n\n`;
54
-
55
- // Interweave the messages to show the timeline (roughly)
56
- // Since we don't have exact timestamps, we'll just print Agent A then Agent B summary,
57
- // or just print all tools called and errors encountered.
58
- const allErrors = [];
59
- const allTools = [];
60
-
61
- [...agentA, ...agentB].forEach(msg => {
62
- if (msg.type === 'tool_call') {
63
- allTools.push(`- ${msg.tool_name}(${JSON.stringify(msg.params)})`);
64
- }
65
- if (msg.type === 'tool_result' && !msg.success) {
66
- allErrors.push(`- Error from ${msg.tool_name}: ${msg.result}`);
67
- }
68
- if (msg.type === 'tool_result' && msg.result?.toLowerCase().includes('error')) {
69
- // Catch strings that say error but were marked success true somehow
70
- allErrors.push(`- Log/Cmd Error: ${msg.result}`);
71
- }
72
- });
73
-
74
- report += `> EXECUTED TOOLS & COMMANDS:\n`;
75
- if (allTools.length > 0) {
76
- allTools.forEach(t => report += `${t}\n`);
77
- } else {
78
- report += `None.\n`;
79
- }
80
- report += `\n`;
81
-
82
- report += `> SYSTEMS ERRORS DETECTED DURING INVESTIGATION:\n`;
83
- if (allErrors.length > 0) {
84
- // deduplicate
85
- [...new Set(allErrors)].forEach(err => report += `${err}\n`);
86
- } else {
87
- report += `No significant system errors found during tool execution.\n`;
88
- }
89
- report += `\n`;
90
-
91
- report += `=================================================================\n`;
92
- report += `[ SOLUTION IMPLEMENTED & FIX VERIFICATION ]\n`;
93
- report += `=================================================================\n\n`;
94
- report += `The Validator Agent verified the proposed fix successfully, leading to the resolution of the incident.\n`;
95
- report += `End-state: ${metrics?.rootCause === 'VERIFIED' ? 'SUCCESS' : 'UNKNOWN'}\n\n`;
96
-
97
- report += `=================================================================\n`;
98
- report += `[ TIPS FOR IMPROVEMENT & RECOMMENDATIONS ]\n`;
99
- report += `=================================================================\n\n`;
100
- report += `Based on the automated evaluation of this scenario, consider the following:\n`;
101
-
102
- if (allTools.length > 15) {
103
- report += `1. EFFICIENCY: The agents called a large number of tools (${allTools.length}). Consider refining the initial hypothesis to reduce blind querying.\n`;
104
- } else {
105
- report += `1. EFFICIENCY: Tool execution was relatively concise (${allTools.length} calls).\n`;
106
- }
107
-
108
- if (allErrors.length > 5) {
109
- report += `2. ACCURACY: Multiple tool execution errors were encountered. Ensure exact syntax and correct tool parameters are used to minimize invalid calls.\n`;
110
- }
111
-
112
- report += `3. CAUSE-ANALYSIS: Always grep application error logs before querying databases to save time tracking downstream symptoms.\n`;
113
- report += `4. REMEDIATION: Post-incident reviews should establish better automated alerting for the specific failure domain (${sc.domain || 'general'}).\n`;
114
-
115
- // Trigger Download
116
- const blob = new Blob([report], { type: 'text/plain;charset=utf-8' });
117
- const url = URL.createObjectURL(blob);
118
- const a = document.createElement('a');
119
- a.href = url;
120
- a.download = `nexus_investigation_report_${sc.id || 'export'}.txt`;
121
- document.body.appendChild(a);
122
- a.click();
123
- document.body.removeChild(a);
124
- URL.revokeObjectURL(url);
125
- };
126
-
127
- return (
128
- <div className="fixed inset-0 z-[100] flex items-center justify-center p-4 md:p-8 animate-in fade-in duration-500">
129
- {/* Particle/Pulse Background */}
130
- <div className="absolute inset-0 bg-background/40 backdrop-blur-sm pointer-events-none">
131
- <div className="absolute top-1/2 left-1/2 -translate-x-1/2 -translate-y-1/2 w-[600px] h-[600px] opacity-10">
132
- <div className="w-full h-full rounded-full border-[1px] border-primary-container/20 animate-[ping_4s_infinite]"></div>
133
- </div>
134
- </div>
135
-
136
- {/* Summary Modal */}
137
- <div className="relative w-full max-w-4xl max-h-[90vh] glass-panel rounded-xl overflow-hidden shadow-[0_0_80px_rgba(0,0,0,0.8)] border border-white/10 flex flex-col">
138
- {/* Modal Header */}
139
- <div className="flex items-center justify-between p-6 bg-surface-container-highest/20 border-b border-white/5">
140
- <div className="flex items-center gap-3">
141
- <div className="p-1 rounded bg-primary-container/20 border border-primary-container/40">
142
- <span className="text-primary material-symbols-outlined text-xl">task_alt</span>
143
- </div>
144
- <h2 className="font-headline font-bold text-lg tracking-widest text-on-surface uppercase">Episode_Execution_Complete</h2>
145
- </div>
146
- <button onClick={onClose} className="text-outline hover:text-white transition-colors">
147
- <span className="material-symbols-outlined">close</span>
148
- </button>
149
- </div>
150
-
151
- {/* Scrollable Content Area */}
152
- <div className="flex-1 overflow-y-auto custom-scrollbar">
153
- <div className="p-8 grid grid-cols-1 md:grid-cols-2 gap-8">
154
- {/* Primary Metrics */}
155
- <div className="space-y-6">
156
- <div className="space-y-2">
157
- <span className="font-mono text-[10px] text-outline tracking-widest uppercase">Final Grading Score</span>
158
- <div className="flex items-baseline gap-2">
159
- <span className="font-headline text-8xl font-bold text-transparent bg-clip-text bg-gradient-to-br from-primary to-primary-container drop-shadow-[0_0_15px_rgba(0,212,255,0.3)]">
160
- {Number(gameState?.cumulativeReward || metrics?.score || 0).toFixed(2)}
161
- </span>
162
- <span className="font-headline text-2xl text-primary/40 font-light">/ 1.00</span>
163
- </div>
164
- </div>
165
-
166
- {/* Reward Breakdown from Episode */}
167
- {gameState?.rewardBreakdown && Object.keys(gameState.rewardBreakdown).length > 0 && (
168
- <div className="p-4 bg-surface-container-lowest/50 border border-white/10 rounded-lg">
169
- <span className="font-mono text-[10px] text-outline uppercase block mb-3">Step Reward Breakdown</span>
170
- <div className="grid grid-cols-4 gap-2">
171
- {Object.entries(gameState.rewardBreakdown).map(([key, val]) => (
172
- <div key={key} className="text-center bg-surface-container-high/30 rounded p-2">
173
- <div className="text-[8px] text-slate-500 uppercase truncate">{key.replace(/_/g, ' ')}</div>
174
- <div className={`font-mono text-sm font-bold ${val > 0 ? 'text-primary' : 'text-slate-600'}`}>
175
- {typeof val === 'number' ? val.toFixed(3) : val}
176
- </div>
177
- </div>
178
- ))}
179
- </div>
180
- </div>
181
- )}
182
-
183
- {/* Reward History */}
184
- {gameState?.rewardHistory && gameState.rewardHistory.length > 0 && (
185
- <div className="p-4 bg-surface-container-lowest/50 border border-white/10 rounded-lg">
186
- <span className="font-mono text-[10px] text-outline uppercase block mb-3">Step Rewards</span>
187
- <div className="flex items-end gap-1 h-16">
188
- {gameState.rewardHistory.map((r, i) => (
189
- <div key={i} className="flex-1 bg-primary/60 rounded-t"
190
- style={{ height: `${Math.max(5, (r / 1) * 100)}%` }}
191
- title={`Step ${i + 1}: ${r.toFixed(3)}`}>
192
- </div>
193
- ))}
194
- </div>
195
- <div className="flex justify-between mt-2 text-[9px] font-mono text-slate-500">
196
- <span>Avg: {(gameState.rewardHistory.reduce((a, b) => a + b, 0) / gameState.rewardHistory.length).toFixed(3)}</span>
197
- <span>Max: {Math.max(...gameState.rewardHistory).toFixed(3)}</span>
198
- </div>
199
- </div>
200
- )}
201
- <div className="grid grid-cols-2 gap-4">
202
- <div className="bg-surface-container-lowest/50 p-4 border-l border-primary/20 refractive-edge">
203
- <span className="font-mono text-[9px] text-outline uppercase block mb-1">Clues Found</span>
204
- <span className="font-headline text-2xl font-medium">{gameState?.clues_found?.length || 0}</span>
205
- </div>
206
- <div className="bg-surface-container-lowest/50 p-4 border-l border-primary/20 refractive-edge">
207
- <span className="font-mono text-[9px] text-outline uppercase block mb-1">Steps Executed</span>
208
- <span className="font-headline text-2xl font-medium">{gameState?.step !== undefined ? gameState.step : (metrics?.steps !== undefined ? metrics.steps : '—')}</span>
209
- </div>
210
- </div>
211
- <div className="flex items-center gap-4 p-5 bg-tertiary/5 border border-tertiary/10 rounded-lg">
212
- <div className="p-3 rounded-full bg-tertiary/10 text-tertiary">
213
- <span className="material-symbols-outlined">troubleshoot</span>
214
- </div>
215
- <div>
216
- <span className="font-mono text-[10px] text-tertiary/60 uppercase block">State Validation</span>
217
- <span className="text-sm font-medium tracking-wide">Status: <span className="font-mono text-tertiary">{metrics?.rootCause || '—'}</span></span>
218
- </div>
219
- </div>
220
- </div>
221
-
222
- {/* Right Column: Agent Metrics */}
223
- <div className="space-y-6">
224
- <h3 className="font-mono text-[10px] text-outline tracking-widest uppercase mb-4">Agent Performance Breakdown</h3>
225
- {/* Agent A */}
226
- <div className="relative group">
227
- <div className="absolute -left-4 top-0 bottom-0 w-1 bg-primary shadow-[0_0_8px_rgba(0,212,255,0.4)]"></div>
228
- <div className="bg-surface-container-low/40 p-5 space-y-4 border border-white/5 rounded-r-lg">
229
- <div className="flex justify-between items-center">
230
- <span className="font-headline font-bold text-primary tracking-tighter uppercase">Agent_Alpha</span>
231
- <span className="font-mono text-[10px] text-primary/50">CYAN_PROTOCOL</span>
232
- </div>
233
- {(() => {
234
- const msgs = gameState?.agents?.agent_a?.messages || [];
235
- const msgCount = msgs.filter(m => m.type === 'message').length;
236
- const toolCount = msgs.filter(m => m.type === 'tool_call').length;
237
- const errCount = msgs.filter(m => m.type === 'tool_result' && m.result?.toLowerCase().includes('error')).length;
238
- return (
239
- <div className="grid grid-cols-3 gap-2 text-center">
240
- <div>
241
- <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">chat</span> MSGS</span>
242
- <span className="font-headline text-lg font-medium text-primary">{msgCount}</span>
243
- </div>
244
- <div className="border-x border-white/5">
245
- <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">build</span> TOOLS</span>
246
- <span className="font-headline text-lg font-medium text-primary">{toolCount}</span>
247
- </div>
248
- <div>
249
- <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">warning</span> ERRS</span>
250
- <span className="font-headline text-lg font-medium text-primary">{errCount}</span>
251
- </div>
252
- </div>
253
- );
254
- })()}
255
- </div>
256
- </div>
257
- {/* Agent B */}
258
- <div className="relative group">
259
- <div className="absolute -left-4 top-0 bottom-0 w-1 bg-secondary shadow-[0_0_8px_rgba(221,183,255,0.4)]"></div>
260
- <div className="bg-surface-container-low/40 p-5 space-y-4 border border-white/5 rounded-r-lg">
261
- <div className="flex justify-between items-center">
262
- <span className="font-headline font-bold text-secondary tracking-tighter uppercase">Agent_Bravo</span>
263
- <span className="font-mono text-[10px] text-secondary/50">VIOLET_PROTOCOL</span>
264
- </div>
265
- {(() => {
266
- const msgs = gameState?.agents?.agent_b?.messages || [];
267
- const msgCount = msgs.filter(m => m.type === 'message').length;
268
- const toolCount = msgs.filter(m => m.type === 'tool_call').length;
269
- const errCount = msgs.filter(m => m.type === 'tool_result' && m.result?.toLowerCase().includes('error')).length;
270
- return (
271
- <div className="grid grid-cols-3 gap-2 text-center">
272
- <div>
273
- <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">chat</span> MSGS</span>
274
- <span className="font-headline text-lg font-medium text-secondary">{msgCount}</span>
275
- </div>
276
- <div className="border-x border-white/5">
277
- <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">build</span> TOOLS</span>
278
- <span className="font-headline text-lg font-medium text-secondary">{toolCount}</span>
279
- </div>
280
- <div>
281
- <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">warning</span> ERRS</span>
282
- <span className="font-headline text-lg font-medium text-secondary">{errCount}</span>
283
- </div>
284
- </div>
285
- );
286
- })()}
287
- </div>
288
- </div>
289
- </div>
290
- </div>
291
-
292
- {/* Submit Resolution Report Panel */}
293
- {(() => {
294
- const resCall = gameState?.tool_calls_made?.find(c => c.tool_name === 'submit_resolution');
295
- if (!resCall) return null;
296
- const p = resCall.params || {};
297
- return (
298
- <div className="px-8 pb-4">
299
- <div className="p-6 bg-surface-container-low/40 border border-primary/20 rounded-lg">
300
- <h3 className="font-headline font-bold text-primary tracking-widest uppercase mb-4 flex items-center gap-2">
301
- <span className="material-symbols-outlined">description</span>
302
- Incident Resolution Report
303
- </h3>
304
- <div className="space-y-4">
305
- <div>
306
- <span className="font-mono text-[10px] text-outline uppercase block mb-1">Root Cause Service</span>
307
- <span className="font-mono text-sm text-on-surface bg-surface-container p-1 px-2 rounded border border-white/5">{p.root_cause_service || 'UNKNOWN'}</span>
308
- </div>
309
- <div>
310
- <span className="font-mono text-[10px] text-outline uppercase block mb-1">Root Cause Description</span>
311
- <p className="text-sm text-on-surface/80">{p.root_cause_description || 'No description provided.'}</p>
312
- </div>
313
- <div className="p-4 bg-tertiary/5 border-l-2 border-tertiary rounded-r">
314
- <span className="font-mono text-[10px] text-tertiary uppercase block mb-1">Fix Applied</span>
315
- <p className="text-sm text-on-surface">{p.fix_applied || 'No fix described.'}</p>
316
- </div>
317
- </div>
318
- </div>
319
- </div>
320
- );
321
- })()}
322
-
323
- {/* Dual Agent Final Verdict Panel */}
324
- {(() => {
325
- const msgsA = gameState?.agents?.agent_a?.messages || [];
326
- const msgsB = gameState?.agents?.agent_b?.messages || [];
327
-
328
- const textMsgsA = msgsA.filter(m => m.type === 'message');
329
- const textMsgsB = msgsB.filter(m => m.type === 'message');
330
-
331
- const lastMsgA = textMsgsA[textMsgsA.length - 1];
332
- const lastMsgB = textMsgsB[textMsgsB.length - 1];
333
-
334
- if (!lastMsgA && !lastMsgB) return null;
335
-
336
- return (
337
- <div className="px-8 pb-8">
338
- <div className="p-6 bg-surface-container-low/40 border border-white/10 rounded-lg">
339
- <h3 className="font-headline font-bold text-on-surface tracking-widest uppercase mb-4 flex items-center gap-2">
340
- <span className="material-symbols-outlined">gavel</span>
341
- Dual Agent Final Verdict
342
- </h3>
343
- <div className="space-y-4">
344
- {lastMsgA && (
345
- <div className="p-4 bg-primary/5 border-l-2 border-primary rounded-r">
346
- <span className="font-mono text-[10px] text-primary uppercase block mb-1 tracking-widest">Agent Alpha Conclusion</span>
347
- <p className="text-sm text-on-surface/90 leading-relaxed">{lastMsgA.content || lastMsgA.text || lastMsgA.message}</p>
348
- </div>
349
- )}
350
- {lastMsgB && (
351
- <div className="p-4 bg-secondary/5 border-l-2 border-secondary rounded-r">
352
- <span className="font-mono text-[10px] text-secondary uppercase block mb-1 tracking-widest">Agent Bravo Conclusion</span>
353
- <p className="text-sm text-on-surface/90 leading-relaxed">{lastMsgB.content || lastMsgB.text || lastMsgB.message}</p>
354
- </div>
355
- )}
356
- </div>
357
- </div>
358
- </div>
359
- );
360
- })()}
361
- </div>
362
-
363
- {/* Modal Footer */}
364
- <div className="p-6 bg-surface-container-lowest/90 border-t border-white/5 flex flex-col md:flex-row justify-between items-center gap-4">
365
- <div className="flex items-center gap-2 text-outline/40">
366
- <span className="material-symbols-outlined text-sm">info</span>
367
- <span className="font-mono text-[9px] uppercase tracking-wider">Session telemetry encrypted and cached locally</span>
368
- </div>
369
- <div className="flex gap-4 w-full md:w-auto">
370
- <button onClick={handleDownload} className="flex-1 md:flex-none px-8 py-2.5 bg-transparent border border-outline-variant/30 text-on-surface hover:bg-white/5 transition-all font-mono text-xs tracking-widest uppercase">
371
- Export Log
372
- </button>
373
- <button onClick={onClose} className="flex-1 md:flex-none px-12 py-2.5 bg-primary/20 border border-primary text-primary hover:bg-primary/30 transition-all font-mono text-xs tracking-widest font-bold uppercase shadow-[0_0_20px_rgba(0,212,255,0.1)]">
374
- Dismiss
375
- </button>
376
- </div>
377
- </div>
378
- </div>
379
- </div>
380
- );
381
- };
382
-
383
- export default EpisodeEndOverlay;
384
-
 
1
+ import React from 'react';
2
+
3
+ const EpisodeEndOverlay = ({ isOpen, onClose, metrics, gameState }) => {
4
+ if (!isOpen) return null;
5
+
6
+ const handleDownload = () => {
7
+ if (!gameState) return;
8
+
9
+ // Assemble the detailed incident report
10
+ const sc = gameState.scenario || {};
11
+ const agentA = gameState.agents?.agent_a?.messages || [];
12
+ const agentB = gameState.agents?.agent_b?.messages || [];
13
+
14
+ let report = `=================================================================\n`;
15
+ report += ` NEXUS INCIDENT INVESTIGATION REPORT \n`;
16
+ report += `=================================================================\n\n`;
17
+
18
+ report += `[ SCENARIO METADATA ]\n`;
19
+ report += `Title: ${sc.id || 'N/A'}\n`;
20
+ report += `Domain: ${sc.domain || 'N/A'}\n`;
21
+ report += `Difficulty: ${sc.difficulty || 'N/A'}\n`;
22
+ report += `Final Grading Score: ${Number(gameState?.cumulativeReward || metrics?.score || 0).toFixed(4)} / 1.00\n`;
23
+ report += `Total Steps: ${gameState?.step || metrics?.steps || 'N/A'}\n\n`;
24
+
25
+ report += `[ STEP REWARDS ]\n`;
26
+ if (gameState?.rewardHistory && gameState.rewardHistory.length > 0) {
27
+ gameState.rewardHistory.forEach((r, i) => {
28
+ report += `Step ${i + 1}: ${r.toFixed(4)}\n`;
29
+ });
30
+ report += `Average: ${(gameState.rewardHistory.reduce((a, b) => a + b, 0) / gameState.rewardHistory.length).toFixed(4)}\n`;
31
+ report += `Final Grading Score: ${Number(gameState.cumulativeReward || 0).toFixed(4)}\n\n`;
32
+ } else {
33
+ report += `No step rewards recorded.\n\n`;
34
+ }
35
+
36
+ report += `[ REWARD BREAKDOWN ]\n`;
37
+ if (gameState?.rewardBreakdown && Object.keys(gameState.rewardBreakdown).length > 0) {
38
+ Object.entries(gameState.rewardBreakdown).forEach(([key, val]) => {
39
+ report += `${key}: ${typeof val === 'number' ? val.toFixed(4) : val}\n`;
40
+ });
41
+ report += `\n`;
42
+ }
43
+
44
+ report += `[ INCIDENT DESCRIPTION & PROBLEM ]\n`;
45
+ report += `${sc.description || 'No description provided.'}\n\n`;
46
+
47
+ report += `[ CONTEXT & ROOT CAUSE ]\n`;
48
+ report += `${sc.context || 'No context provided.'}\n`;
49
+ report += `Actual Root Cause Validation: ${metrics?.rootCause || 'N/A'}\n\n`;
50
+
51
+ report += `=================================================================\n`;
52
+ report += `[ INVESTIGATION LOG & DETAILED TRACE ]\n`;
53
+ report += `=================================================================\n\n`;
54
+
55
+ // Interweave the messages to show the timeline (roughly)
56
+ // Since we don't have exact timestamps, we'll just print Agent A then Agent B summary,
57
+ // or just print all tools called and errors encountered.
58
+ const allErrors = [];
59
+ const allTools = [];
60
+
61
+ [...agentA, ...agentB].forEach(msg => {
62
+ if (msg.type === 'tool_call') {
63
+ allTools.push(`- ${msg.tool_name}(${JSON.stringify(msg.params)})`);
64
+ }
65
+ if (msg.type === 'tool_result' && !msg.success) {
66
+ allErrors.push(`- Error from ${msg.tool_name}: ${msg.result}`);
67
+ }
68
+ if (msg.type === 'tool_result' && msg.result?.toLowerCase().includes('error')) {
69
+ // Catch strings that say error but were marked success true somehow
70
+ allErrors.push(`- Log/Cmd Error: ${msg.result}`);
71
+ }
72
+ });
73
+
74
+ report += `> EXECUTED TOOLS & COMMANDS:\n`;
75
+ if (allTools.length > 0) {
76
+ allTools.forEach(t => report += `${t}\n`);
77
+ } else {
78
+ report += `None.\n`;
79
+ }
80
+ report += `\n`;
81
+
82
+ report += `> SYSTEMS ERRORS DETECTED DURING INVESTIGATION:\n`;
83
+ if (allErrors.length > 0) {
84
+ // deduplicate
85
+ [...new Set(allErrors)].forEach(err => report += `${err}\n`);
86
+ } else {
87
+ report += `No significant system errors found during tool execution.\n`;
88
+ }
89
+ report += `\n`;
90
+
91
+ report += `=================================================================\n`;
92
+ report += `[ SOLUTION IMPLEMENTED & FIX VERIFICATION ]\n`;
93
+ report += `=================================================================\n\n`;
94
+ report += `The Validator Agent verified the proposed fix successfully, leading to the resolution of the incident.\n`;
95
+ report += `End-state: ${metrics?.rootCause === 'VERIFIED' ? 'SUCCESS' : 'UNKNOWN'}\n\n`;
96
+
97
+ report += `=================================================================\n`;
98
+ report += `[ TIPS FOR IMPROVEMENT & RECOMMENDATIONS ]\n`;
99
+ report += `=================================================================\n\n`;
100
+ report += `Based on the automated evaluation of this scenario, consider the following:\n`;
101
+
102
+ if (allTools.length > 15) {
103
+ report += `1. EFFICIENCY: The agents called a large number of tools (${allTools.length}). Consider refining the initial hypothesis to reduce blind querying.\n`;
104
+ } else {
105
+ report += `1. EFFICIENCY: Tool execution was relatively concise (${allTools.length} calls).\n`;
106
+ }
107
+
108
+ if (allErrors.length > 5) {
109
+ report += `2. ACCURACY: Multiple tool execution errors were encountered. Ensure exact syntax and correct tool parameters are used to minimize invalid calls.\n`;
110
+ }
111
+
112
+ report += `3. CAUSE-ANALYSIS: Always grep application error logs before querying databases to save time tracking downstream symptoms.\n`;
113
+ report += `4. REMEDIATION: Post-incident reviews should establish better automated alerting for the specific failure domain (${sc.domain || 'general'}).\n`;
114
+
115
+ // Trigger Download
116
+ const blob = new Blob([report], { type: 'text/plain;charset=utf-8' });
117
+ const url = URL.createObjectURL(blob);
118
+ const a = document.createElement('a');
119
+ a.href = url;
120
+ a.download = `nexus_investigation_report_${sc.id || 'export'}.txt`;
121
+ document.body.appendChild(a);
122
+ a.click();
123
+ document.body.removeChild(a);
124
+ URL.revokeObjectURL(url);
125
+ };
126
+
127
+ return (
128
+ <div className="fixed inset-0 z-[100] flex items-center justify-center p-4 md:p-8 animate-in fade-in duration-500">
129
+ {/* Particle/Pulse Background */}
130
+ <div className="absolute inset-0 bg-background/40 backdrop-blur-sm pointer-events-none">
131
+ <div className="absolute top-1/2 left-1/2 -translate-x-1/2 -translate-y-1/2 w-[600px] h-[600px] opacity-10">
132
+ <div className="w-full h-full rounded-full border-[1px] border-primary-container/20 animate-[ping_4s_infinite]"></div>
133
+ </div>
134
+ </div>
135
+
136
+ {/* Summary Modal */}
137
+ <div className="relative w-full max-w-4xl max-h-[90vh] glass-panel rounded-xl overflow-hidden shadow-[0_0_80px_rgba(0,0,0,0.8)] border border-white/10 flex flex-col">
138
+ {/* Modal Header */}
139
+ <div className="flex items-center justify-between p-6 bg-surface-container-highest/20 border-b border-white/5">
140
+ <div className="flex items-center gap-3">
141
+ <div className="p-1 rounded bg-primary-container/20 border border-primary-container/40">
142
+ <span className="text-primary material-symbols-outlined text-xl">task_alt</span>
143
+ </div>
144
+ <h2 className="font-headline font-bold text-lg tracking-widest text-on-surface uppercase">Episode_Execution_Complete</h2>
145
+ </div>
146
+ <button onClick={onClose} className="text-outline hover:text-white transition-colors">
147
+ <span className="material-symbols-outlined">close</span>
148
+ </button>
149
+ </div>
150
+
151
+ {/* Scrollable Content Area */}
152
+ <div className="flex-1 overflow-y-auto custom-scrollbar">
153
+ <div className="p-8 grid grid-cols-1 md:grid-cols-2 gap-8">
154
+ {/* Primary Metrics */}
155
+ <div className="space-y-6">
156
+ <div className="space-y-2">
157
+ <span className="font-mono text-[10px] text-outline tracking-widest uppercase">Final Grading Score</span>
158
+ <div className="flex items-baseline gap-2">
159
+ <span className="font-headline text-8xl font-bold text-transparent bg-clip-text bg-gradient-to-br from-primary to-primary-container drop-shadow-[0_0_15px_rgba(0,212,255,0.3)]">
160
+ {Number(gameState?.cumulativeReward || metrics?.score || 0).toFixed(2)}
161
+ </span>
162
+ <span className="font-headline text-2xl text-primary/40 font-light">/ 1.00</span>
163
+ </div>
164
+ </div>
165
+
166
+ {/* Reward Breakdown from Episode */}
167
+ {gameState?.rewardBreakdown && Object.keys(gameState.rewardBreakdown).length > 0 && (
168
+ <div className="p-4 bg-surface-container-lowest/50 border border-white/10 rounded-lg">
169
+ <span className="font-mono text-[10px] text-outline uppercase block mb-3">Step Reward Breakdown</span>
170
+ <div className="grid grid-cols-4 gap-2">
171
+ {Object.entries(gameState.rewardBreakdown).map(([key, val]) => (
172
+ <div key={key} className="text-center bg-surface-container-high/30 rounded p-2">
173
+ <div className="text-[8px] text-slate-500 uppercase truncate">{key.replace(/_/g, ' ')}</div>
174
+ <div className={`font-mono text-sm font-bold ${val > 0 ? 'text-primary' : 'text-slate-600'}`}>
175
+ {typeof val === 'number' ? val.toFixed(3) : val}
176
+ </div>
177
+ </div>
178
+ ))}
179
+ </div>
180
+ </div>
181
+ )}
182
+
183
+ {/* Reward History */}
184
+ {gameState?.rewardHistory && gameState.rewardHistory.length > 0 && (
185
+ <div className="p-4 bg-surface-container-lowest/50 border border-white/10 rounded-lg">
186
+ <span className="font-mono text-[10px] text-outline uppercase block mb-3">Step Rewards</span>
187
+ <div className="flex items-end gap-1 h-16">
188
+ {gameState.rewardHistory.map((r, i) => (
189
+ <div key={i} className="flex-1 bg-primary/60 rounded-t"
190
+ style={{ height: `${Math.max(5, (r / 1) * 100)}%` }}
191
+ title={`Step ${i + 1}: ${r.toFixed(3)}`}>
192
+ </div>
193
+ ))}
194
+ </div>
195
+ <div className="flex justify-between mt-2 text-[9px] font-mono text-slate-500">
196
+ <span>Avg: {(gameState.rewardHistory.reduce((a, b) => a + b, 0) / gameState.rewardHistory.length).toFixed(3)}</span>
197
+ <span>Max: {Math.max(...gameState.rewardHistory).toFixed(3)}</span>
198
+ </div>
199
+ </div>
200
+ )}
201
+ <div className="grid grid-cols-2 gap-4">
202
+ <div className="bg-surface-container-lowest/50 p-4 border-l border-primary/20 refractive-edge">
203
+ <span className="font-mono text-[9px] text-outline uppercase block mb-1">Clues Found</span>
204
+ <span className="font-headline text-2xl font-medium">{gameState?.clues_found?.length || 0}</span>
205
+ </div>
206
+ <div className="bg-surface-container-lowest/50 p-4 border-l border-primary/20 refractive-edge">
207
+ <span className="font-mono text-[9px] text-outline uppercase block mb-1">Steps Executed</span>
208
+ <span className="font-headline text-2xl font-medium">{gameState?.step !== undefined ? gameState.step : (metrics?.steps !== undefined ? metrics.steps : '—')}</span>
209
+ </div>
210
+ </div>
211
+ <div className="flex items-center gap-4 p-5 bg-tertiary/5 border border-tertiary/10 rounded-lg">
212
+ <div className="p-3 rounded-full bg-tertiary/10 text-tertiary">
213
+ <span className="material-symbols-outlined">troubleshoot</span>
214
+ </div>
215
+ <div>
216
+ <span className="font-mono text-[10px] text-tertiary/60 uppercase block">State Validation</span>
217
+ <span className="text-sm font-medium tracking-wide">Status: <span className="font-mono text-tertiary">{metrics?.rootCause || '—'}</span></span>
218
+ </div>
219
+ </div>
220
+ </div>
221
+
222
+ {/* Right Column: Agent Metrics */}
223
+ <div className="space-y-6">
224
+ <h3 className="font-mono text-[10px] text-outline tracking-widest uppercase mb-4">Agent Performance Breakdown</h3>
225
+ {/* Agent A */}
226
+ <div className="relative group">
227
+ <div className="absolute -left-4 top-0 bottom-0 w-1 bg-primary shadow-[0_0_8px_rgba(0,212,255,0.4)]"></div>
228
+ <div className="bg-surface-container-low/40 p-5 space-y-4 border border-white/5 rounded-r-lg">
229
+ <div className="flex justify-between items-center">
230
+ <span className="font-headline font-bold text-primary tracking-tighter uppercase">Agent_Alpha</span>
231
+ <span className="font-mono text-[10px] text-primary/50">CYAN_PROTOCOL</span>
232
+ </div>
233
+ {(() => {
234
+ const msgs = gameState?.agents?.agent_a?.messages || [];
235
+ const msgCount = msgs.filter(m => m.type === 'message').length;
236
+ const toolCount = msgs.filter(m => m.type === 'tool_call').length;
237
+ const errCount = msgs.filter(m => m.type === 'tool_result' && m.result?.toLowerCase().includes('error')).length;
238
+ return (
239
+ <div className="grid grid-cols-3 gap-2 text-center">
240
+ <div>
241
+ <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">chat</span> MSGS</span>
242
+ <span className="font-headline text-lg font-medium text-primary">{msgCount}</span>
243
+ </div>
244
+ <div className="border-x border-white/5">
245
+ <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">build</span> TOOLS</span>
246
+ <span className="font-headline text-lg font-medium text-primary">{toolCount}</span>
247
+ </div>
248
+ <div>
249
+ <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">warning</span> ERRS</span>
250
+ <span className="font-headline text-lg font-medium text-primary">{errCount}</span>
251
+ </div>
252
+ </div>
253
+ );
254
+ })()}
255
+ </div>
256
+ </div>
257
+ {/* Agent B */}
258
+ <div className="relative group">
259
+ <div className="absolute -left-4 top-0 bottom-0 w-1 bg-secondary shadow-[0_0_8px_rgba(221,183,255,0.4)]"></div>
260
+ <div className="bg-surface-container-low/40 p-5 space-y-4 border border-white/5 rounded-r-lg">
261
+ <div className="flex justify-between items-center">
262
+ <span className="font-headline font-bold text-secondary tracking-tighter uppercase">Agent_Bravo</span>
263
+ <span className="font-mono text-[10px] text-secondary/50">VIOLET_PROTOCOL</span>
264
+ </div>
265
+ {(() => {
266
+ const msgs = gameState?.agents?.agent_b?.messages || [];
267
+ const msgCount = msgs.filter(m => m.type === 'message').length;
268
+ const toolCount = msgs.filter(m => m.type === 'tool_call').length;
269
+ const errCount = msgs.filter(m => m.type === 'tool_result' && m.result?.toLowerCase().includes('error')).length;
270
+ return (
271
+ <div className="grid grid-cols-3 gap-2 text-center">
272
+ <div>
273
+ <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">chat</span> MSGS</span>
274
+ <span className="font-headline text-lg font-medium text-secondary">{msgCount}</span>
275
+ </div>
276
+ <div className="border-x border-white/5">
277
+ <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">build</span> TOOLS</span>
278
+ <span className="font-headline text-lg font-medium text-secondary">{toolCount}</span>
279
+ </div>
280
+ <div>
281
+ <span className="font-mono text-[9px] text-outline flex flex-col items-center justify-center gap-1 uppercase"><span className="material-symbols-outlined text-[12px]">warning</span> ERRS</span>
282
+ <span className="font-headline text-lg font-medium text-secondary">{errCount}</span>
283
+ </div>
284
+ </div>
285
+ );
286
+ })()}
287
+ </div>
288
+ </div>
289
+ </div>
290
+ </div>
291
+
292
+ {/* Submit Resolution Report Panel */}
293
+ {(() => {
294
+ const resCall = gameState?.tool_calls_made?.find(c => c.tool_name === 'submit_resolution');
295
+ if (!resCall) return null;
296
+ const p = resCall.params || {};
297
+ return (
298
+ <div className="px-8 pb-4">
299
+ <div className="p-6 bg-surface-container-low/40 border border-primary/20 rounded-lg">
300
+ <h3 className="font-headline font-bold text-primary tracking-widest uppercase mb-4 flex items-center gap-2">
301
+ <span className="material-symbols-outlined">description</span>
302
+ Incident Resolution Report
303
+ </h3>
304
+ <div className="space-y-4">
305
+ <div>
306
+ <span className="font-mono text-[10px] text-outline uppercase block mb-1">Root Cause Service</span>
307
+ <span className="font-mono text-sm text-on-surface bg-surface-container p-1 px-2 rounded border border-white/5">{p.root_cause_service || 'UNKNOWN'}</span>
308
+ </div>
309
+ <div>
310
+ <span className="font-mono text-[10px] text-outline uppercase block mb-1">Root Cause Description</span>
311
+ <p className="text-sm text-on-surface/80">{p.root_cause_description || 'No description provided.'}</p>
312
+ </div>
313
+ <div className="p-4 bg-tertiary/5 border-l-2 border-tertiary rounded-r">
314
+ <span className="font-mono text-[10px] text-tertiary uppercase block mb-1">Fix Applied</span>
315
+ <p className="text-sm text-on-surface">{p.fix_applied || 'No fix described.'}</p>
316
+ </div>
317
+ </div>
318
+ </div>
319
+ </div>
320
+ );
321
+ })()}
322
+
323
+ {/* Dual Agent Final Verdict Panel */}
324
+ {(() => {
325
+ const msgsA = gameState?.agents?.agent_a?.messages || [];
326
+ const msgsB = gameState?.agents?.agent_b?.messages || [];
327
+
328
+ const textMsgsA = msgsA.filter(m => m.type === 'message');
329
+ const textMsgsB = msgsB.filter(m => m.type === 'message');
330
+
331
+ const lastMsgA = textMsgsA[textMsgsA.length - 1];
332
+ const lastMsgB = textMsgsB[textMsgsB.length - 1];
333
+
334
+ if (!lastMsgA && !lastMsgB) return null;
335
+
336
+ return (
337
+ <div className="px-8 pb-8">
338
+ <div className="p-6 bg-surface-container-low/40 border border-white/10 rounded-lg">
339
+ <h3 className="font-headline font-bold text-on-surface tracking-widest uppercase mb-4 flex items-center gap-2">
340
+ <span className="material-symbols-outlined">gavel</span>
341
+ Dual Agent Final Verdict
342
+ </h3>
343
+ <div className="space-y-4">
344
+ {lastMsgA && (
345
+ <div className="p-4 bg-primary/5 border-l-2 border-primary rounded-r">
346
+ <span className="font-mono text-[10px] text-primary uppercase block mb-1 tracking-widest">Agent Alpha Conclusion</span>
347
+ <p className="text-sm text-on-surface/90 leading-relaxed">{lastMsgA.content || lastMsgA.text || lastMsgA.message}</p>
348
+ </div>
349
+ )}
350
+ {lastMsgB && (
351
+ <div className="p-4 bg-secondary/5 border-l-2 border-secondary rounded-r">
352
+ <span className="font-mono text-[10px] text-secondary uppercase block mb-1 tracking-widest">Agent Bravo Conclusion</span>
353
+ <p className="text-sm text-on-surface/90 leading-relaxed">{lastMsgB.content || lastMsgB.text || lastMsgB.message}</p>
354
+ </div>
355
+ )}
356
+ </div>
357
+ </div>
358
+ </div>
359
+ );
360
+ })()}
361
+ </div>
362
+
363
+ {/* Modal Footer */}
364
+ <div className="p-6 bg-surface-container-lowest/90 border-t border-white/5 flex flex-col md:flex-row justify-between items-center gap-4">
365
+ <div className="flex items-center gap-2 text-outline/40">
366
+ <span className="material-symbols-outlined text-sm">info</span>
367
+ <span className="font-mono text-[9px] uppercase tracking-wider">Session telemetry encrypted and cached locally</span>
368
+ </div>
369
+ <div className="flex gap-4 w-full md:w-auto">
370
+ <button onClick={handleDownload} className="flex-1 md:flex-none px-8 py-2.5 bg-transparent border border-outline-variant/30 text-on-surface hover:bg-white/5 transition-all font-mono text-xs tracking-widest uppercase">
371
+ Export Log
372
+ </button>
373
+ <button onClick={onClose} className="flex-1 md:flex-none px-12 py-2.5 bg-primary/20 border border-primary text-primary hover:bg-primary/30 transition-all font-mono text-xs tracking-widest font-bold uppercase shadow-[0_0_20px_rgba(0,212,255,0.1)]">
374
+ Dismiss
375
+ </button>
376
+ </div>
377
+ </div>
378
+ </div>
379
+ </div>
380
+ );
381
+ };
382
+
383
+ export default EpisodeEndOverlay;
384
+
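Reviewer sketch (not part of the commit): the Agent_Alpha and Agent_Bravo blocks in EpisodeEndOverlay compute the same three counts with identical inline filters. A minimal sketch of how that logic could be shared follows, assuming each message is an object with a `type` field and an optional `result` field, as the diff suggests. The helper name `summarizeAgentMessages` is hypothetical. Note that `m.result?.toLowerCase()` only guards against `null` and `undefined`, so the sketch also checks that `result` is a string before lowercasing it.

```javascript
// Hypothetical helper; field names (type, result) are taken from the diff above.
function summarizeAgentMessages(messages) {
  // Tolerate a missing list and skip entries that are not objects.
  const msgs = (Array.isArray(messages) ? messages : []).filter(
    (m) => m !== null && typeof m === 'object'
  );

  // A tool result counts as an error only when it is a string containing "error".
  const isErrorResult = (m) =>
    m.type === 'tool_result' &&
    typeof m.result === 'string' &&
    m.result.toLowerCase().includes('error');

  return {
    msgCount: msgs.filter((m) => m.type === 'message').length,
    toolCount: msgs.filter((m) => m.type === 'tool_call').length,
    errCount: msgs.filter(isErrorResult).length,
  };
}
```

Each agent panel could then call `summarizeAgentMessages(gameState?.agents?.agent_a?.messages)` and render the three returned fields, which would keep the two panels from drifting apart.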
frontend/src/components/Layout.jsx CHANGED
@@ -1,203 +1,203 @@
1
- import React, { useState, useRef, useEffect } from 'react';
2
- import { config } from '../config';
3
- import TopNavBar from './TopNavBar';
4
- import SideNavBar from './SideNavBar';
5
-
6
- /* ─── Terminal Panel ─── */
7
- const COMMANDS = {
8
- help: () => ['Commands: help | status | clear | echo <text>'],
9
- status: () => ['Agent A (INV-01): STANDBY', 'Agent B (VAL-01): STANDBY', `WebSocket: ${config.WS_URL} — CONNECTED`, 'Episode: None active'],
10
- };
11
-
12
- const TerminalDrawer = ({ onClose }) => {
13
- const [input, setInput] = useState('');
14
- const [lines, setLines] = useState([{ type: 'system', text: '// NEXUS Terminal v2.0 — type "help" for commands' }]);
15
- const [history, setHistory] = useState([]);
16
- const [histIdx, setHistIdx] = useState(-1);
17
- const endRef = useRef(null);
18
- const inputRef = useRef(null);
19
-
20
- useEffect(() => { endRef.current?.scrollIntoView({ behavior: 'smooth' }); }, [lines]);
21
- useEffect(() => { inputRef.current?.focus(); }, []);
22
-
23
- const run = (e) => {
24
- e.preventDefault();
25
- const cmd = input.trim();
26
- if (!cmd) return;
27
- setHistory(h => [cmd, ...h].slice(0, 50));
28
- setHistIdx(-1);
29
- if (cmd.toLowerCase() === 'clear') { setLines([]); setInput(''); return; }
30
- const parts = cmd.toLowerCase().split(' ');
31
- let output, type;
32
- if (parts[0] === 'echo') { output = [cmd.slice(5) || '']; type = 'output'; }
33
- else if (COMMANDS[parts[0]]) { output = COMMANDS[parts[0]](); type = 'output'; }
34
- else { output = [`Command not found: ${parts[0]}. Type "help".`]; type = 'error'; }
35
- setLines(l => [...l, { type: 'input', text: `nexus@terminal:~$ ${cmd}` }, ...output.map(t => ({ type, text: t }))]);
36
- setInput('');
37
- };
38
-
39
- const handleKey = (e) => {
40
- if (e.key === 'ArrowUp') { const i = Math.min(histIdx + 1, history.length - 1); setHistIdx(i); setInput(history[i] ?? ''); e.preventDefault(); }
41
- if (e.key === 'ArrowDown') { const i = Math.max(histIdx - 1, -1); setHistIdx(i); setInput(i === -1 ? '' : history[i]); e.preventDefault(); }
42
- };
43
-
44
- const colorMap = { system: 'text-slate-600 italic', input: 'text-primary', output: 'text-on-surface/80', error: 'text-error' };
45
-
46
- return (
47
- <div className="flex flex-col h-full" onClick={() => inputRef.current?.focus()}>
48
- <div className="flex-1 p-3 font-mono text-xs overflow-y-auto space-y-0.5 bg-surface-container-lowest cursor-text">
49
- {lines.map((l, i) => <div key={i} className={colorMap[l.type]}>{l.text}</div>)}
50
- <div ref={endRef} />
51
- </div>
52
- <form onSubmit={run} className="flex items-center gap-2 px-3 py-2 border-t border-white/5 bg-surface-container-lowest shrink-0">
53
- <span className="text-primary font-mono text-xs shrink-0">nexus@terminal:~$</span>
54
- <input ref={inputRef} value={input} onChange={e => setInput(e.target.value)} onKeyDown={handleKey}
55
- className="flex-1 bg-transparent font-mono text-xs text-on-surface focus:outline-none placeholder:text-slate-700"
56
- placeholder="type a command and press Enter..." />
57
- </form>
58
- </div>
59
- );
60
- };
61
-
62
- /* ─── Communication Panel ─── */
63
- const CommunicationDrawer = () => (
64
- <div className="flex flex-col h-full p-4 font-mono text-xs space-y-2 bg-surface-container-lowest overflow-y-auto">
65
- {[
66
- { agent: 'AGENT_A', msg: 'Awaiting objective. Standing by for episode_start event.', time: '—', color: 'text-primary' },
67
- { agent: 'AGENT_B', msg: 'Validation module idle. Ready to receive investigator output.', time: '—', color: 'text-secondary' },
68
- { agent: 'SYSTEM', msg: 'No active episode. Use START to begin.', time: '—', color: 'text-outline-variant' },
69
- ].map((m, i) => (
70
- <div key={i} className="flex gap-3 py-1.5 border-b border-white/5">
71
- <span className={`${m.color} font-bold shrink-0 w-20`}>[{m.agent}]</span>
72
- <span className="text-on-surface/70">{m.msg}</span>
73
- <span className="text-slate-600 ml-auto shrink-0">{m.time}</span>
74
- </div>
75
- ))}
76
- </div>
77
- );
78
-
79
- /* ─── Reward Analytics Panel ─── */
80
- const AnalyticsDrawer = () => {
81
- const stats = [
82
- { label: 'Avg Reward', value: '—', color: 'text-primary' },
83
- { label: 'Best Step', value: '—', color: 'text-tertiary' },
84
- { label: 'Root Cause', value: '—', color: 'text-tertiary' },
85
- { label: 'Steps Run', value: '—', color: 'text-on-surface' },
86
- { label: 'Episodes', value: '—', color: 'text-on-surface' },
87
- { label: 'Success Rate', value: '—', color: 'text-secondary' },
88
- ];
89
- return (
90
- <div className="flex h-full">
91
- {/* Reward chart placeholder */}
92
- <div className="flex-1 p-4 flex flex-col">
93
- <p className="text-[9px] font-mono text-outline-variant uppercase mb-2">Cumulative Reward Over Steps</p>
94
- <div className="flex-1 flex items-end gap-1 border-l border-b border-outline-variant/20 px-2 pb-1">
95
- {[12, 24, 18, 36, 30, 48, 42, 60].map((h, i) => (
96
- <div key={i} className="flex-1 flex flex-col items-center justify-end">
97
- <div className="w-full bg-primary/30 rounded-sm transition-all" style={{ height: `${h}%` }}></div>
98
- </div>
99
- ))}
100
- </div>
101
- <p className="text-[9px] font-mono text-outline-variant/40 italic mt-1">No live data — connect to episode to populate</p>
102
- </div>
103
- {/* Stat grid */}
104
- <div className="w-48 shrink-0 p-3 border-l border-white/5 grid grid-cols-2 gap-2 content-start">
105
- {stats.map(s => (
106
- <div key={s.label} className="bg-surface-container p-2 rounded border border-white/5">
107
- <span className="text-[8px] font-mono text-outline-variant block uppercase truncate">{s.label}</span>
108
- <span className={`text-sm font-bold font-mono ${s.color}`}>{s.value}</span>
109
- </div>
110
- ))}
111
- </div>
112
- </div>
113
- );
114
- };
115
-
116
- /* ─── Layout ─── */
117
- const TABS = [
118
- { id: 'communication', label: 'Communication', icon: 'forum' },
119
- { id: 'terminal', label: 'Terminal', icon: 'code' },
120
- ];
121
-
122
- const Layout = ({ children }) => {
123
- const [activeTab, setActiveTab] = useState(null); // null = closed
124
-
125
- const toggle = (id) => setActiveTab(prev => prev === id ? null : id);
126
-
127
- /* drawer height when open */
128
- const drawerH = 'h-64';
129
-
130
- return (
131
- <div className="min-h-screen flex flex-col">
132
- <TopNavBar />
133
- <SideNavBar />
134
-
135
- {/* Main scrollable area — leave room for fixed footer + optional drawer */}
136
- <main className={`ml-20 pt-16 flex-1 transition-all ${activeTab ? 'pb-[calc(48px+256px)]' : 'pb-12'}`}>
137
- <div className="p-8 max-w-[1600px] mx-auto">
138
- {children}
139
- </div>
140
- </main>
141
-
142
- {/* Sliding drawer */}
143
- {activeTab && (
144
- <div className={`fixed bottom-12 left-20 right-0 ${drawerH} z-40 bg-surface border-t border-primary/20 shadow-[0_-10px_40px_rgba(0,0,0,0.6)] flex flex-col`}>
145
- {/* Drawer title bar */}
146
- <div className="flex items-center justify-between px-5 py-2 bg-surface-container border-b border-white/5 shrink-0">
147
- <div className="flex items-center gap-2">
148
- <span className="material-symbols-outlined text-primary text-sm">
149
- {TABS.find(t => t.id === activeTab)?.icon}
150
- </span>
151
- <span className="font-mono text-xs text-primary uppercase tracking-widest">
152
- {TABS.find(t => t.id === activeTab)?.label}
153
- </span>
154
- </div>
155
- <button onClick={() => setActiveTab(null)} className="text-slate-500 hover:text-white transition-colors">
156
- <span className="material-symbols-outlined text-sm">keyboard_arrow_down</span>
157
- </button>
158
- </div>
159
- {/* Drawer content */}
160
- <div className="flex-1 overflow-hidden">
161
- {activeTab === 'terminal' && <TerminalDrawer onClose={() => setActiveTab(null)} />}
162
- {activeTab === 'communication' && <CommunicationDrawer />}
163
- {activeTab === 'analytics' && <AnalyticsDrawer />}
164
- </div>
165
- </div>
166
- )}
167
-
168
- {/* Footer tab bar */}
169
- <footer className="fixed bottom-0 left-0 w-full h-12 bg-background/90 backdrop-blur-2xl z-50 flex items-center border-t border-primary/15 px-8 shadow-[0_-10px_30px_rgba(0,0,0,0.5)]">
170
- {/* Left: ticker */}
171
- <div className="flex-1 hidden md:flex items-center gap-2 overflow-hidden">
172
- <span className="text-[9px] font-mono text-outline-variant italic uppercase tracking-tight whitespace-nowrap">
173
- SYSTEM_INITIALIZED: STANDBY FOR AGENT HANDSHAKE...
174
- </span>
175
- </div>
176
-
177
- {/* Centre: tabs */}
178
- <div className="flex items-center gap-1 shrink-0">
179
- {TABS.map(tab => (
180
- <button
181
- key={tab.id}
182
- onClick={() => toggle(tab.id)}
183
- className={`flex items-center gap-2 px-4 h-12 transition-all border-t-2 font-mono text-[10px] tracking-widest uppercase ${activeTab === tab.id
184
- ? 'border-primary text-primary bg-primary/10'
185
- : 'border-transparent text-slate-500 hover:text-primary hover:bg-white/5'
186
- }`}
187
- >
188
- <span className="material-symbols-outlined text-base">{tab.icon}</span>
189
- {tab.label}
190
- </button>
191
- ))}
192
- </div>
193
-
194
- {/* Right: session info */}
195
- <div className="flex-1 hidden md:flex items-center justify-end gap-2 text-[9px] font-mono text-outline-variant/50">
196
- <span>SESSION: IDLE</span>
197
- </div>
198
- </footer>
199
- </div>
200
- );
201
- };
202
-
203
- export default Layout;
 
1
+ import React, { useState, useRef, useEffect } from 'react';
2
+ import { config } from '../config';
3
+ import TopNavBar from './TopNavBar';
4
+ import SideNavBar from './SideNavBar';
5
+
6
+ /* ─── Terminal Panel ─── */
7
+ const COMMANDS = {
8
+ help: () => ['Commands: help | status | clear | echo <text>'],
9
+ status: () => ['Agent A (INV-01): STANDBY', 'Agent B (VAL-01): STANDBY', `WebSocket: ${config.WS_URL} — CONNECTED`, 'Episode: None active'],
10
+ };
11
+
12
+ const TerminalDrawer = ({ onClose }) => {
13
+ const [input, setInput] = useState('');
14
+ const [lines, setLines] = useState([{ type: 'system', text: '// NEXUS Terminal v2.0 — type "help" for commands' }]);
15
+ const [history, setHistory] = useState([]);
16
+ const [histIdx, setHistIdx] = useState(-1);
17
+ const endRef = useRef(null);
18
+ const inputRef = useRef(null);
19
+
20
+ useEffect(() => { endRef.current?.scrollIntoView({ behavior: 'smooth' }); }, [lines]);
21
+ useEffect(() => { inputRef.current?.focus(); }, []);
22
+
23
+ const run = (e) => {
24
+ e.preventDefault();
25
+ const cmd = input.trim();
26
+ if (!cmd) return;
27
+ setHistory(h => [cmd, ...h].slice(0, 50));
28
+ setHistIdx(-1);
29
+ if (cmd.toLowerCase() === 'clear') { setLines([]); setInput(''); return; }
30
+ const parts = cmd.toLowerCase().split(' ');
31
+ let output, type;
32
+ if (parts[0] === 'echo') { output = [cmd.slice(5) || '']; type = 'output'; }
33
+ else if (COMMANDS[parts[0]]) { output = COMMANDS[parts[0]](); type = 'output'; }
34
+ else { output = [`Command not found: ${parts[0]}. Type "help".`]; type = 'error'; }
35
+ setLines(l => [...l, { type: 'input', text: `nexus@terminal:~$ ${cmd}` }, ...output.map(t => ({ type, text: t }))]);
36
+ setInput('');
37
+ };
38
+
39
+ const handleKey = (e) => {
40
+ if (e.key === 'ArrowUp') { const i = Math.min(histIdx + 1, history.length - 1); setHistIdx(i); setInput(history[i] ?? ''); e.preventDefault(); }
41
+ if (e.key === 'ArrowDown') { const i = Math.max(histIdx - 1, -1); setHistIdx(i); setInput(i === -1 ? '' : history[i]); e.preventDefault(); }
42
+ };
43
+
44
+ const colorMap = { system: 'text-slate-600 italic', input: 'text-primary', output: 'text-on-surface/80', error: 'text-error' };
45
+
46
+ return (
47
+ <div className="flex flex-col h-full" onClick={() => inputRef.current?.focus()}>
48
+ <div className="flex-1 p-3 font-mono text-xs overflow-y-auto space-y-0.5 bg-surface-container-lowest cursor-text">
49
+ {lines.map((l, i) => <div key={i} className={colorMap[l.type]}>{l.text}</div>)}
50
+ <div ref={endRef} />
51
+ </div>
52
+ <form onSubmit={run} className="flex items-center gap-2 px-3 py-2 border-t border-white/5 bg-surface-container-lowest shrink-0">
53
+ <span className="text-primary font-mono text-xs shrink-0">nexus@terminal:~$</span>
54
+ <input ref={inputRef} value={input} onChange={e => setInput(e.target.value)} onKeyDown={handleKey}
55
+ className="flex-1 bg-transparent font-mono text-xs text-on-surface focus:outline-none placeholder:text-slate-700"
56
+ placeholder="type a command and press Enter..." />
57
+ </form>
58
+ </div>
59
+ );
60
+ };
61
+
62
+ /* ─── Communication Panel ─── */
63
+ const CommunicationDrawer = () => (
64
+ <div className="flex flex-col h-full p-4 font-mono text-xs space-y-2 bg-surface-container-lowest overflow-y-auto">
65
+ {[
66
+ { agent: 'AGENT_A', msg: 'Awaiting objective. Standing by for episode_start event.', time: '—', color: 'text-primary' },
67
+ { agent: 'AGENT_B', msg: 'Validation module idle. Ready to receive investigator output.', time: '—', color: 'text-secondary' },
68
+ { agent: 'SYSTEM', msg: 'No active episode. Use START to begin.', time: '—', color: 'text-outline-variant' },
69
+ ].map((m, i) => (
70
+ <div key={i} className="flex gap-3 py-1.5 border-b border-white/5">
71
+ <span className={`${m.color} font-bold shrink-0 w-20`}>[{m.agent}]</span>
72
+ <span className="text-on-surface/70">{m.msg}</span>
73
+ <span className="text-slate-600 ml-auto shrink-0">{m.time}</span>
74
+ </div>
75
+ ))}
76
+ </div>
77
+ );
78
+
79
+ /* ─── Reward Analytics Panel ─── */
80
+ const AnalyticsDrawer = () => {
81
+ const stats = [
82
+ { label: 'Avg Reward', value: '—', color: 'text-primary' },
83
+ { label: 'Best Step', value: '—', color: 'text-tertiary' },
84
+ { label: 'Root Cause', value: '—', color: 'text-tertiary' },
85
+ { label: 'Steps Run', value: '—', color: 'text-on-surface' },
86
+ { label: 'Episodes', value: '—', color: 'text-on-surface' },
87
+ { label: 'Success Rate', value: '—', color: 'text-secondary' },
88
+ ];
89
+ return (
90
+ <div className="flex h-full">
91
+ {/* Reward chart placeholder */}
92
+ <div className="flex-1 p-4 flex flex-col">
93
+ <p className="text-[9px] font-mono text-outline-variant uppercase mb-2">Cumulative Reward Over Steps</p>
94
+ <div className="flex-1 flex items-end gap-1 border-l border-b border-outline-variant/20 px-2 pb-1">
95
+ {[12, 24, 18, 36, 30, 48, 42, 60].map((h, i) => (
96
+ <div key={i} className="flex-1 flex flex-col items-center justify-end">
97
+ <div className="w-full bg-primary/30 rounded-sm transition-all" style={{ height: `${h}%` }}></div>
98
+ </div>
99
+ ))}
100
+ </div>
101
+ <p className="text-[9px] font-mono text-outline-variant/40 italic mt-1">No live data — connect to episode to populate</p>
102
+ </div>
103
+ {/* Stat grid */}
104
+ <div className="w-48 shrink-0 p-3 border-l border-white/5 grid grid-cols-2 gap-2 content-start">
105
+ {stats.map(s => (
106
+ <div key={s.label} className="bg-surface-container p-2 rounded border border-white/5">
107
+ <span className="text-[8px] font-mono text-outline-variant block uppercase truncate">{s.label}</span>
108
+ <span className={`text-sm font-bold font-mono ${s.color}`}>{s.value}</span>
109
+ </div>
110
+ ))}
111
+ </div>
112
+ </div>
113
+ );
114
+ };
115
+
116
+ /* ─── Layout ─── */
117
+ const TABS = [
118
+ { id: 'communication', label: 'Communication', icon: 'forum' },
119
+ { id: 'terminal', label: 'Terminal', icon: 'code' },
120
+ ];
121
+
122
+ const Layout = ({ children }) => {
123
+ const [activeTab, setActiveTab] = useState(null); // null = closed
124
+
125
+ const toggle = (id) => setActiveTab(prev => prev === id ? null : id);
126
+
127
+ /* drawer height when open */
128
+ const drawerH = 'h-64';
129
+
130
+ return (
131
+ <div className="min-h-screen flex flex-col">
132
+ <TopNavBar />
133
+ <SideNavBar />
134
+
135
+ {/* Main scrollable area — leave room for fixed footer + optional drawer */}
136
+ <main className={`ml-20 pt-16 flex-1 transition-all ${activeTab ? 'pb-[calc(48px+256px)]' : 'pb-12'}`}>
137
+ <div className="p-8 max-w-[1600px] mx-auto">
138
+ {children}
139
+ </div>
140
+ </main>
141
+
142
+ {/* Sliding drawer */}
143
+ {activeTab && (
144
+ <div className={`fixed bottom-12 left-20 right-0 ${drawerH} z-40 bg-surface border-t border-primary/20 shadow-[0_-10px_40px_rgba(0,0,0,0.6)] flex flex-col`}>
145
+ {/* Drawer title bar */}
146
+ <div className="flex items-center justify-between px-5 py-2 bg-surface-container border-b border-white/5 shrink-0">
147
+ <div className="flex items-center gap-2">
148
+ <span className="material-symbols-outlined text-primary text-sm">
149
+ {TABS.find(t => t.id === activeTab)?.icon}
150
+ </span>
151
+ <span className="font-mono text-xs text-primary uppercase tracking-widest">
152
+ {TABS.find(t => t.id === activeTab)?.label}
153
+ </span>
154
+ </div>
155
+ <button onClick={() => setActiveTab(null)} className="text-slate-500 hover:text-white transition-colors">
156
+ <span className="material-symbols-outlined text-sm">keyboard_arrow_down</span>
157
+ </button>
158
+ </div>
159
+ {/* Drawer content */}
160
+ <div className="flex-1 overflow-hidden">
161
+ {activeTab === 'terminal' && <TerminalDrawer onClose={() => setActiveTab(null)} />}
162
+ {activeTab === 'communication' && <CommunicationDrawer />}
163
+ {activeTab === 'analytics' && <AnalyticsDrawer />}
164
+ </div>
165
+ </div>
166
+ )}
167
+
168
+ {/* Footer tab bar */}
169
+ <footer className="fixed bottom-0 left-0 w-full h-12 bg-background/90 backdrop-blur-2xl z-50 flex items-center border-t border-primary/15 px-8 shadow-[0_-10px_30px_rgba(0,0,0,0.5)]">
170
+ {/* Left: ticker */}
171
+ <div className="flex-1 hidden md:flex items-center gap-2 overflow-hidden">
172
+ <span className="text-[9px] font-mono text-outline-variant italic uppercase tracking-tight whitespace-nowrap">
173
+ SYSTEM_INITIALIZED: STANDBY FOR AGENT HANDSHAKE...
174
+ </span>
175
+ </div>
176
+
177
+ {/* Centre: tabs */}
178
+ <div className="flex items-center gap-1 shrink-0">
179
+ {TABS.map(tab => (
180
+ <button
181
+ key={tab.id}
182
+ onClick={() => toggle(tab.id)}
183
+ className={`flex items-center gap-2 px-4 h-12 transition-all border-t-2 font-mono text-[10px] tracking-widest uppercase ${activeTab === tab.id
184
+ ? 'border-primary text-primary bg-primary/10'
185
+ : 'border-transparent text-slate-500 hover:text-primary hover:bg-white/5'
186
+ }`}
187
+ >
188
+ <span className="material-symbols-outlined text-base">{tab.icon}</span>
189
+ {tab.label}
190
+ </button>
191
+ ))}
192
+ </div>
193
+
194
+ {/* Right: session info */}
195
+ <div className="flex-1 hidden md:flex items-center justify-end gap-2 text-[9px] font-mono text-outline-variant/50">
196
+ <span>SESSION: IDLE</span>
197
+ </div>
198
+ </footer>
199
+ </div>
200
+ );
201
+ };
202
+
203
+ export default Layout;
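Reviewer sketch (not part of the commit): the `run` handler in TerminalDrawer mixes command parsing with React state updates. The parsing step can be sketched as a pure function, assuming the same behaviour as the diff: the command name is case-insensitive, `echo` keeps the original casing of its argument, and unknown names produce an error line. The names `interpretCommand` and `DEMO_COMMANDS` are hypothetical, and `clear` is left to the component because it resets state and produces no output. One difference is deliberate: a plain `COMMANDS[name]` lookup also matches keys inherited from `Object.prototype` (for example `constructor`), so the sketch uses an own-property check.

```javascript
// Illustrative command table; the real one is COMMANDS in Layout.jsx.
const DEMO_COMMANDS = {
  help: () => ['Commands: help | status | clear | echo <text>'],
};

// Hypothetical pure function: maps one line of input to terminal output lines.
function interpretCommand(rawInput, commands) {
  const cmd = rawInput.trim();
  if (!cmd) return null; // nothing to run

  const name = cmd.split(/\s+/)[0].toLowerCase();

  if (name === 'echo') {
    // Slice the original string so the echoed text keeps its casing.
    return { type: 'output', lines: [cmd.slice(name.length).trim()] };
  }

  // Own-property check: inherited names such as "constructor" or "toString"
  // must not be treated as commands.
  if (Object.prototype.hasOwnProperty.call(commands, name)) {
    return { type: 'output', lines: commands[name]() };
  }

  return { type: 'error', lines: [`Command not found: ${name}. Type "help".`] };
}
```

Inside the component, the returned `type` and `lines` could be mapped straight onto the `{ type, text }` entries that `setLines` appends.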
frontend/src/components/SideNavBar.jsx CHANGED
@@ -1,154 +1,154 @@
1
- import React, { useState } from 'react';
2
- import { Link, useLocation } from 'react-router-dom';
3
-
4
- const StatusPanel = ({ onClose }) => (
5
- <div className="fixed left-20 bottom-0 z-50 w-80 bg-surface border border-primary/20 shadow-2xl rounded-tr-xl overflow-hidden">
6
- <div className="flex items-center justify-between px-4 py-3 bg-surface-container border-b border-white/5">
7
- <div className="flex items-center gap-2">
8
- <span className="material-symbols-outlined text-primary text-sm">online_prediction</span>
9
- <span className="font-mono text-xs text-primary uppercase tracking-widest">System Status</span>
10
- </div>
11
- <button onClick={onClose} className="text-slate-500 hover:text-white transition-colors">
12
- <span className="material-symbols-outlined text-sm">close</span>
13
- </button>
14
- </div>
15
- <div className="p-4 space-y-3 font-mono text-xs">
16
- {[
17
- { label: 'Agent A (INV-01)', status: 'STANDBY', color: 'text-tertiary' },
18
- { label: 'Agent B (VAL-01)', status: 'STANDBY', color: 'text-tertiary' },
19
- { label: 'WebSocket', status: 'CONNECTED', color: 'text-tertiary' },
20
- { label: 'Ollama API', status: 'CHECKING...', color: 'text-secondary' },
21
- { label: 'NEXUS Core', status: 'ONLINE', color: 'text-tertiary' },
22
- ].map(({ label, status, color }) => (
23
- <div key={label} className="flex justify-between items-center py-1 border-b border-white/5">
24
- <span className="text-slate-400 uppercase tracking-wider">{label}</span>
25
- <span className={`${color} font-bold flex items-center gap-1`}>
26
- <span className={`w-1.5 h-1.5 rounded-full ${color.replace('text', 'bg')} animate-pulse`}></span>
27
- {status}
28
- </span>
29
- </div>
30
- ))}
31
- </div>
32
- </div>
33
- );
34
-
35
- const LogsPanel = ({ onClose }) => {
36
- const [logs] = useState([
37
- { time: '13:45:01', level: 'INFO', msg: 'NEXUS Core initialized' },
38
- { time: '13:45:01', level: 'INFO', msg: 'WebSocket server listening on :7860' },
39
- { time: '13:45:02', level: 'INFO', msg: 'Agent A ready — NEXUS-CORE-INV-01' },
40
- { time: '13:45:02', level: 'INFO', msg: 'Agent B ready — NEXUS-CORE-VAL-01' },
41
- { time: '13:45:05', level: 'WARN', msg: 'No active episode. Awaiting start command.' },
42
- ]);
43
- const levelColor = { INFO: 'text-tertiary', WARN: 'text-secondary', ERROR: 'text-error' };
44
-
45
- return (
46
- <div className="fixed left-20 bottom-0 z-50 w-96 bg-surface border border-primary/20 shadow-2xl rounded-tr-xl overflow-hidden">
47
- <div className="flex items-center justify-between px-4 py-3 bg-surface-container border-b border-white/5">
48
- <div className="flex items-center gap-2">
49
- <span className="material-symbols-outlined text-primary text-sm">terminal</span>
50
- <span className="font-mono text-xs text-primary uppercase tracking-widest">System Logs</span>
51
- </div>
52
- <button onClick={onClose} className="text-slate-500 hover:text-white transition-colors">
53
- <span className="material-symbols-outlined text-sm">close</span>
54
- </button>
55
- </div>
56
- <div className="p-3 bg-surface-container-lowest h-48 overflow-y-auto space-y-1 font-mono text-[10px]">
57
- {logs.map((l, i) => (
58
- <div key={i} className="flex gap-2">
59
- <span className="text-slate-600 shrink-0">{l.time}</span>
60
- <span className={`shrink-0 font-bold w-10 ${levelColor[l.level]}`}>{l.level}</span>
61
- <span className="text-on-surface/70">{l.msg}</span>
62
- </div>
63
- ))}
64
- </div>
65
- </div>
66
- );
67
- };
68
-
69
- const SideNavBar = () => {
70
- const location = useLocation();
71
- const [activePanel, setActivePanel] = useState(null); // 'status' | 'logs' | null
72
-
73
- const navLinks = [
74
- { name: 'Dashboard', icon: 'dashboard', path: '/' },
75
- { name: 'Scenarios', icon: 'account_tree', path: '/scenarios' },
76
- { name: 'Settings', icon: 'settings', path: '/settings' },
77
- ];
78
-
79
- const togglePanel = (panel) => setActivePanel(p => p === panel ? null : panel);
80
-
81
- return (
82
- <>
83
- <aside className="fixed left-0 top-16 bottom-0 z-40 flex flex-col items-center py-8 bg-surface border-r border-primary/5 w-20 hover:w-64 transition-all duration-500 group">
84
- <div className="flex flex-col items-center group-hover:items-start group-hover:px-6 w-full space-y-8">
85
- {/* Operator Badge */}
86
- <div className="flex flex-col items-center group-hover:flex-row group-hover:gap-4 w-full px-2 transition-all">
87
- <div className="w-10 h-10 rounded bg-surface-container-highest flex items-center justify-center refractive-edge shrink-0">
88
- <span className="material-symbols-outlined text-primary">shield</span>
89
- </div>
90
- <div className="hidden group-hover:block transition-all">
91
- <p className="font-mono text-xs tracking-tight text-white font-bold whitespace-nowrap">OPERATOR_01</p>
92
- <p className="font-mono text-[10px] text-slate-500">ID: 9X-2244</p>
93
- </div>
94
- </div>
95
-
96
- {/* Nav Links */}
97
- <div className="flex flex-col w-full">
98
- {navLinks.map((link) => (
99
- <Link
100
- key={link.name}
101
- to={link.path}
102
- className={`flex items-center h-14 w-full transition-all ${location.pathname === link.path
103
- ? 'bg-gradient-to-r from-primary/20 to-transparent border-l-4 border-primary text-white'
104
- : 'text-slate-500 opacity-60 hover:opacity-100 hover:bg-surface-container-low'
105
- }`}
106
- >
107
- <div className="w-20 flex justify-center flex-shrink-0">
108
- <span className={`material-symbols-outlined ${location.pathname === link.path ? 'text-primary' : ''}`}>
109
- {link.icon}
110
- </span>
111
- </div>
112
- <span className="hidden group-hover:block font-mono text-xs tracking-tight uppercase">
113
- {link.name}
114
- </span>
115
- </Link>
116
- ))}
117
- </div>
118
- </div>
119
-
120
- {/* Bottom utility buttons */}
121
- <div className="mt-auto w-full group-hover:px-6">
122
- <div className="flex flex-col gap-2 items-center group-hover:items-start pb-4">
123
- <button
124
- onClick={() => togglePanel('status')}
125
- className={`flex items-center h-12 w-full transition-all rounded ${activePanel === 'status' ? 'text-primary bg-primary/10' : 'text-slate-500 opacity-60 hover:opacity-100 hover:bg-surface-container-low'
126
- }`}
127
- >
128
- <div className="w-20 flex justify-center flex-shrink-0">
129
- <span className="material-symbols-outlined text-sm">online_prediction</span>
130
- </div>
131
- <span className="hidden group-hover:block font-mono text-[10px] uppercase tracking-widest">Status</span>
132
- </button>
133
- <button
134
- onClick={() => togglePanel('logs')}
135
- className={`flex items-center h-12 w-full transition-all rounded ${activePanel === 'logs' ? 'text-primary bg-primary/10' : 'text-slate-500 opacity-60 hover:opacity-100 hover:bg-surface-container-low'
136
- }`}
137
- >
138
- <div className="w-20 flex justify-center flex-shrink-0">
139
- <span className="material-symbols-outlined text-sm">terminal</span>
140
- </div>
141
- <span className="hidden group-hover:block font-mono text-[10px] uppercase tracking-widest">Logs</span>
142
- </button>
143
- </div>
144
- </div>
145
- </aside>
146
-
147
- {/* Floating Panels */}
148
- {activePanel === 'status' && <StatusPanel onClose={() => setActivePanel(null)} />}
149
- {activePanel === 'logs' && <LogsPanel onClose={() => setActivePanel(null)} />}
150
- </>
151
- );
152
- };
153
-
154
- export default SideNavBar;
 
1
+ import React, { useState } from 'react';
2
+ import { Link, useLocation } from 'react-router-dom';
3
+
4
+ const StatusPanel = ({ onClose }) => (
5
+ <div className="fixed left-20 bottom-0 z-50 w-80 bg-surface border border-primary/20 shadow-2xl rounded-tr-xl overflow-hidden">
6
+ <div className="flex items-center justify-between px-4 py-3 bg-surface-container border-b border-white/5">
7
+ <div className="flex items-center gap-2">
8
+ <span className="material-symbols-outlined text-primary text-sm">online_prediction</span>
9
+ <span className="font-mono text-xs text-primary uppercase tracking-widest">System Status</span>
10
+ </div>
11
+ <button onClick={onClose} className="text-slate-500 hover:text-white transition-colors">
12
+ <span className="material-symbols-outlined text-sm">close</span>
13
+ </button>
14
+ </div>
15
+ <div className="p-4 space-y-3 font-mono text-xs">
16
+ {[
17
+ { label: 'Agent A (INV-01)', status: 'STANDBY', color: 'text-tertiary' },
18
+ { label: 'Agent B (VAL-01)', status: 'STANDBY', color: 'text-tertiary' },
19
+ { label: 'WebSocket', status: 'CONNECTED', color: 'text-tertiary' },
20
+ { label: 'Ollama API', status: 'CHECKING...', color: 'text-secondary' },
21
+ { label: 'NEXUS Core', status: 'ONLINE', color: 'text-tertiary' },
22
+ ].map(({ label, status, color }) => (
23
+ <div key={label} className="flex justify-between items-center py-1 border-b border-white/5">
24
+ <span className="text-slate-400 uppercase tracking-wider">{label}</span>
25
+ <span className={`${color} font-bold flex items-center gap-1`}>
26
+ <span className={`w-1.5 h-1.5 rounded-full ${color.replace('text', 'bg')} animate-pulse`}></span>
27
+ {status}
28
+ </span>
29
+ </div>
30
+ ))}
31
+ </div>
32
+ </div>
33
+ );
34
+
35
+ const LogsPanel = ({ onClose }) => {
36
+ const [logs] = useState([
37
+ { time: '13:45:01', level: 'INFO', msg: 'NEXUS Core initialized' },
38
+ { time: '13:45:01', level: 'INFO', msg: 'WebSocket server listening on :7860' },
39
+ { time: '13:45:02', level: 'INFO', msg: 'Agent A ready — NEXUS-CORE-INV-01' },
40
+ { time: '13:45:02', level: 'INFO', msg: 'Agent B ready — NEXUS-CORE-VAL-01' },
41
+ { time: '13:45:05', level: 'WARN', msg: 'No active episode. Awaiting start command.' },
42
+ ]);
43
+ const levelColor = { INFO: 'text-tertiary', WARN: 'text-secondary', ERROR: 'text-error' };
44
+
45
+ return (
46
+ <div className="fixed left-20 bottom-0 z-50 w-96 bg-surface border border-primary/20 shadow-2xl rounded-tr-xl overflow-hidden">
47
+ <div className="flex items-center justify-between px-4 py-3 bg-surface-container border-b border-white/5">
48
+ <div className="flex items-center gap-2">
49
+ <span className="material-symbols-outlined text-primary text-sm">terminal</span>
50
+ <span className="font-mono text-xs text-primary uppercase tracking-widest">System Logs</span>
51
+ </div>
52
+ <button onClick={onClose} className="text-slate-500 hover:text-white transition-colors">
53
+ <span className="material-symbols-outlined text-sm">close</span>
54
+ </button>
55
+ </div>
56
+ <div className="p-3 bg-surface-container-lowest h-48 overflow-y-auto space-y-1 font-mono text-[10px]">
57
+ {logs.map((l, i) => (
58
+ <div key={i} className="flex gap-2">
59
+ <span className="text-slate-600 shrink-0">{l.time}</span>
60
+ <span className={`shrink-0 font-bold w-10 ${levelColor[l.level]}`}>{l.level}</span>
61
+ <span className="text-on-surface/70">{l.msg}</span>
62
+ </div>
63
+ ))}
64
+ </div>
65
+ </div>
66
+ );
67
+ };
68
+
69
+ const SideNavBar = () => {
70
+ const location = useLocation();
71
+ const [activePanel, setActivePanel] = useState(null); // 'status' | 'logs' | null
72
+
73
+ const navLinks = [
74
+ { name: 'Dashboard', icon: 'dashboard', path: '/' },
75
+ { name: 'Scenarios', icon: 'account_tree', path: '/scenarios' },
76
+ { name: 'Settings', icon: 'settings', path: '/settings' },
77
+ ];
78
+
79
+ const togglePanel = (panel) => setActivePanel(p => p === panel ? null : panel);
80
+
81
+ return (
82
+ <>
83
+ <aside className="fixed left-0 top-16 bottom-0 z-40 flex flex-col items-center py-8 bg-surface border-r border-primary/5 w-20 hover:w-64 transition-all duration-500 group">
84
+ <div className="flex flex-col items-center group-hover:items-start group-hover:px-6 w-full space-y-8">
85
+ {/* Operator Badge */}
86
+ <div className="flex flex-col items-center group-hover:flex-row group-hover:gap-4 w-full px-2 transition-all">
87
+ <div className="w-10 h-10 rounded bg-surface-container-highest flex items-center justify-center refractive-edge shrink-0">
88
+ <span className="material-symbols-outlined text-primary">shield</span>
89
+ </div>
90
+ <div className="hidden group-hover:block transition-all">
91
+ <p className="font-mono text-xs tracking-tight text-white font-bold whitespace-nowrap">OPERATOR_01</p>
92
+ <p className="font-mono text-[10px] text-slate-500">ID: 9X-2244</p>
93
+ </div>
94
+ </div>
95
+
96
+ {/* Nav Links */}
97
+ <div className="flex flex-col w-full">
98
+ {navLinks.map((link) => (
99
+ <Link
100
+ key={link.name}
101
+ to={link.path}
102
+ className={`flex items-center h-14 w-full transition-all ${location.pathname === link.path
103
+ ? 'bg-gradient-to-r from-primary/20 to-transparent border-l-4 border-primary text-white'
104
+ : 'text-slate-500 opacity-60 hover:opacity-100 hover:bg-surface-container-low'
105
+ }`}
106
+ >
107
+ <div className="w-20 flex justify-center flex-shrink-0">
108
+ <span className={`material-symbols-outlined ${location.pathname === link.path ? 'text-primary' : ''}`}>
109
+ {link.icon}
110
+ </span>
111
+ </div>
112
+ <span className="hidden group-hover:block font-mono text-xs tracking-tight uppercase">
113
+ {link.name}
114
+ </span>
115
+ </Link>
116
+ ))}
117
+ </div>
118
+ </div>
119
+
120
+ {/* Bottom utility buttons */}
121
+ <div className="mt-auto w-full group-hover:px-6">
122
+ <div className="flex flex-col gap-2 items-center group-hover:items-start pb-4">
123
+ <button
124
+ onClick={() => togglePanel('status')}
125
+ className={`flex items-center h-12 w-full transition-all rounded ${activePanel === 'status' ? 'text-primary bg-primary/10' : 'text-slate-500 opacity-60 hover:opacity-100 hover:bg-surface-container-low'
126
+ }`}
127
+ >
128
+ <div className="w-20 flex justify-center flex-shrink-0">
129
+ <span className="material-symbols-outlined text-sm">online_prediction</span>
130
+ </div>
131
+ <span className="hidden group-hover:block font-mono text-[10px] uppercase tracking-widest">Status</span>
132
+ </button>
133
+ <button
134
+ onClick={() => togglePanel('logs')}
135
+ className={`flex items-center h-12 w-full transition-all rounded ${activePanel === 'logs' ? 'text-primary bg-primary/10' : 'text-slate-500 opacity-60 hover:opacity-100 hover:bg-surface-container-low'
136
+ }`}
137
+ >
138
+ <div className="w-20 flex justify-center flex-shrink-0">
139
+ <span className="material-symbols-outlined text-sm">terminal</span>
140
+ </div>
141
+ <span className="hidden group-hover:block font-mono text-[10px] uppercase tracking-widest">Logs</span>
142
+ </button>
143
+ </div>
144
+ </div>
145
+ </aside>
146
+
147
+ {/* Floating Panels */}
148
+ {activePanel === 'status' && <StatusPanel onClose={() => setActivePanel(null)} />}
149
+ {activePanel === 'logs' && <LogsPanel onClose={() => setActivePanel(null)} />}
150
+ </>
151
+ );
152
+ };
153
+
154
+ export default SideNavBar;
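Review note on SideNavBar.jsx: the panel toggle and the status-dot color are both small pure rules, so they can be checked outside React. The sketch below restates them as plain functions. `nextPanel`, `STATUS_DOT`, and `dotClass` are hypothetical names introduced here, not part of the commit. The lookup table is offered on the assumption that the project uses a build-time class scanner (the class names suggest Tailwind, but no config is shown in this diff); such scanners generally only pick up class names that appear as complete literal strings, which `color.replace('text', 'bg')` does not produce.

```javascript
// Sketch only: nextPanel, STATUS_DOT and dotClass are hypothetical helpers.

// Same rule as togglePanel in the diff: clicking the panel that is already
// open closes it; clicking any other panel opens that one.
function nextPanel(current, requested) {
  return current === requested ? null : requested;
}

// Literal class names, so a build-time class scanner can see each one.
const STATUS_DOT = {
  'text-tertiary': 'bg-tertiary',
  'text-secondary': 'bg-secondary',
  'text-error': 'bg-error',
};

function dotClass(textClass) {
  // Unknown classes fall back to the string rewrite used in StatusPanel.
  return STATUS_DOT[textClass] ?? textClass.replace('text', 'bg');
}

console.log(nextPanel(null, 'status'));
console.log(nextPanel('status', 'status'));
console.log(nextPanel('status', 'logs'));
console.log(dotClass('text-tertiary'));
```

Inside the component, `setActivePanel(p => nextPanel(p, panel))` would behave the same as the current `togglePanel`.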
frontend/src/components/TopNavBar.jsx CHANGED
@@ -1,81 +1,81 @@
1
- import React from 'react';
2
- import { useApp } from '../context/AppContext';
3
-
4
- const TopNavBar = () => {
5
- const { sessionData, isConnected, sendCommand } = useApp();
6
-
7
- const status = sessionData?.status || 'STANDBY';
8
- const isRunning = sessionData?.active && status !== 'COMPLETED';
9
-
10
- const isStandby = status === 'STANDBY' || status === 'READY';
11
-
12
- return (
13
- <header className="fixed top-0 w-full z-50 flex justify-between items-center px-6 h-16 bg-surface/60 backdrop-blur-xl border-b border-primary/10 shadow-[0_0_40px_rgba(0,212,255,0.04)]">
14
- <div className="flex items-center gap-8">
15
- <span className="text-2xl font-black tracking-tighter text-primary font-headline">NEXUS</span>
16
- <div className="h-8 w-px bg-outline-variant/20 hidden md:block"></div>
17
- <div className="hidden md:flex flex-col">
18
- <span className="text-[10px] font-mono text-outline-variant tracking-widest uppercase">System Status</span>
19
- <div className="text-sm font-mono text-tertiary">{status}</div>
20
- </div>
21
- </div>
22
-
23
- <div className="flex items-center gap-6">
24
- <div className="flex gap-2">
25
- {/* START - clickable when standby */}
26
- <button
27
- onClick={() => sendCommand({ action: 'start' })}
28
- disabled={isRunning}
29
- className={`flex items-center gap-2 px-4 py-1.5 rounded-full border text-xs font-bold transition-all ${isRunning
30
- ? 'bg-surface-container text-slate-600 border-slate-700 cursor-not-allowed'
31
- : 'bg-tertiary/10 border-tertiary/20 text-tertiary hover:bg-tertiary/20 active:scale-95'}`}
32
- >
33
- <span className="material-symbols-outlined text-sm">play_arrow</span> START
34
- </button>
35
-
36
- {/* PAUSE/RESUME - clickable when running */}
37
- <button
38
- onClick={() => sendCommand({ action: 'pause' })}
39
- disabled={!isRunning}
40
- className={`flex items-center gap-2 px-4 py-1.5 rounded-full border text-xs font-bold transition-all ${!isRunning
41
- ? 'bg-surface-container text-slate-600 border-slate-700 cursor-not-allowed'
42
- : status === 'PAUSED'
43
- ? 'bg-secondary text-surface border-secondary active:scale-95'
44
- : 'bg-secondary/10 border-secondary/20 text-secondary hover:bg-secondary/20 active:scale-95'}`}
45
- >
46
- <span className="material-symbols-outlined text-sm">{status === 'PAUSED' ? 'play_arrow' : 'pause'}</span>
47
- {status === 'PAUSED' ? 'RESUME' : 'PAUSE'}
48
- </button>
49
-
50
- {/* FORCE END - clickable when running */}
51
- <button
52
- onClick={() => sendCommand({ action: 'force_end' })}
53
- disabled={!isRunning}
54
- className={`flex items-center gap-2 px-4 py-1.5 rounded-full border text-xs font-bold transition-all ${!isRunning
55
- ? 'bg-surface-container text-slate-600 border-slate-700 cursor-not-allowed'
56
- : 'bg-[#f59e0b]/10 border-[#f59e0b]/20 text-[#f59e0b] hover:bg-[#f59e0b]/20 active:scale-95'}`}
57
- >
58
- <span className="material-symbols-outlined text-sm">stop_circle</span> FORCE END
59
- </button>
60
-
61
- {/* RESET - always clickable */}
62
- <button
63
- onClick={() => sendCommand({ action: 'reset' })}
64
- className="flex items-center gap-2 px-4 py-1.5 rounded-full bg-error/10 border border-error/20 text-error text-xs font-bold hover:bg-error/20 transition-all active:scale-95"
65
- >
66
- <span className="material-symbols-outlined text-sm">restart_alt</span> RESET
67
- </button>
68
- </div>
69
-
70
- <div className="flex items-center gap-2 ml-4">
71
- <div className={`w-2 h-2 rounded-full animate-pulse shadow-[0_0_8px_#66fa8c] ${isConnected ? 'bg-tertiary' : 'bg-error'}`}></div>
72
- <span className={`text-[10px] font-mono font-bold tracking-widest uppercase ${isConnected ? 'text-tertiary' : 'text-error'}`}>
73
- {isConnected ? 'CONNECTED' : 'DISCONNECTED'}
74
- </span>
75
- </div>
76
- </div>
77
- </header>
78
- );
79
- };
80
-
81
- export default TopNavBar;
 
1
+ import React from 'react';
2
+ import { useApp } from '../context/AppContext';
3
+
4
+ const TopNavBar = () => {
5
+ const { sessionData, isConnected, sendCommand } = useApp();
6
+
7
+ const status = sessionData?.status || 'STANDBY';
8
+ const isRunning = sessionData?.active && status !== 'COMPLETED';
9
+
10
+ const isStandby = status === 'STANDBY' || status === 'READY';
11
+
12
+ return (
13
+ <header className="fixed top-0 w-full z-50 flex justify-between items-center px-6 h-16 bg-surface/60 backdrop-blur-xl border-b border-primary/10 shadow-[0_0_40px_rgba(0,212,255,0.04)]">
14
+ <div className="flex items-center gap-8">
15
+ <span className="text-2xl font-black tracking-tighter text-primary font-headline">NEXUS</span>
16
+ <div className="h-8 w-px bg-outline-variant/20 hidden md:block"></div>
17
+ <div className="hidden md:flex flex-col">
18
+ <span className="text-[10px] font-mono text-outline-variant tracking-widest uppercase">System Status</span>
19
+ <div className="text-sm font-mono text-tertiary">{status}</div>
20
+ </div>
21
+ </div>
22
+
23
+ <div className="flex items-center gap-6">
24
+ <div className="flex gap-2">
25
+ {/* START - clickable when standby */}
26
+ <button
27
+ onClick={() => sendCommand({ action: 'start' })}
28
+ disabled={isRunning}
29
+ className={`flex items-center gap-2 px-4 py-1.5 rounded-full border text-xs font-bold transition-all ${isRunning
30
+ ? 'bg-surface-container text-slate-600 border-slate-700 cursor-not-allowed'
31
+ : 'bg-tertiary/10 border-tertiary/20 text-tertiary hover:bg-tertiary/20 active:scale-95'}`}
32
+ >
33
+ <span className="material-symbols-outlined text-sm">play_arrow</span> START
34
+ </button>
35
+
36
+ {/* PAUSE/RESUME - clickable when running */}
37
+ <button
38
+ onClick={() => sendCommand({ action: 'pause' })}
39
+ disabled={!isRunning}
40
+ className={`flex items-center gap-2 px-4 py-1.5 rounded-full border text-xs font-bold transition-all ${!isRunning
41
+ ? 'bg-surface-container text-slate-600 border-slate-700 cursor-not-allowed'
42
+ : status === 'PAUSED'
43
+ ? 'bg-secondary text-surface border-secondary active:scale-95'
44
+ : 'bg-secondary/10 border-secondary/20 text-secondary hover:bg-secondary/20 active:scale-95'}`}
45
+ >
46
+ <span className="material-symbols-outlined text-sm">{status === 'PAUSED' ? 'play_arrow' : 'pause'}</span>
47
+ {status === 'PAUSED' ? 'RESUME' : 'PAUSE'}
48
+ </button>
49
+
50
+ {/* FORCE END - clickable when running */}
51
+ <button
52
+ onClick={() => sendCommand({ action: 'force_end' })}
53
+ disabled={!isRunning}
54
+ className={`flex items-center gap-2 px-4 py-1.5 rounded-full border text-xs font-bold transition-all ${!isRunning
55
+ ? 'bg-surface-container text-slate-600 border-slate-700 cursor-not-allowed'
56
+ : 'bg-[#f59e0b]/10 border-[#f59e0b]/20 text-[#f59e0b] hover:bg-[#f59e0b]/20 active:scale-95'}`}
57
+ >
58
+ <span className="material-symbols-outlined text-sm">stop_circle</span> FORCE END
59
+ </button>
60
+
61
+ {/* RESET - always clickable */}
62
+ <button
63
+ onClick={() => sendCommand({ action: 'reset' })}
64
+ className="flex items-center gap-2 px-4 py-1.5 rounded-full bg-error/10 border border-error/20 text-error text-xs font-bold hover:bg-error/20 transition-all active:scale-95"
65
+ >
66
+ <span className="material-symbols-outlined text-sm">restart_alt</span> RESET
67
+ </button>
68
+ </div>
69
+
70
+ <div className="flex items-center gap-2 ml-4">
71
+ <div className={`w-2 h-2 rounded-full animate-pulse shadow-[0_0_8px_#66fa8c] ${isConnected ? 'bg-tertiary' : 'bg-error'}`}></div>
72
+ <span className={`text-[10px] font-mono font-bold tracking-widest uppercase ${isConnected ? 'text-tertiary' : 'text-error'}`}>
73
+ {isConnected ? 'CONNECTED' : 'DISCONNECTED'}
74
+ </span>
75
+ </div>
76
+ </div>
77
+ </header>
78
+ );
79
+ };
80
+
81
+ export default TopNavBar;
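Review note on TopNavBar.jsx: every button state in this component derives from `sessionData.status` and `sessionData.active`, while `isStandby` is computed but never read. A minimal sketch of the same derivation as one pure function follows. `deriveControls` is a hypothetical helper, not part of the commit, and `Boolean(...)` is added here so that `isRunning` is strictly `true` or `false` even when `sessionData` is missing.

```javascript
// Sketch only: deriveControls is a hypothetical helper.
function deriveControls(sessionData) {
  // Same default as the component: no session data means STANDBY.
  const status = sessionData?.status || 'STANDBY';
  // Running means the session is active and has not completed.
  const isRunning = Boolean(sessionData?.active) && status !== 'COMPLETED';
  return {
    status,
    startDisabled: isRunning,      // START is blocked while running
    pauseDisabled: !isRunning,     // PAUSE/RESUME needs a running session
    forceEndDisabled: !isRunning,  // FORCE END needs a running session
    pauseLabel: status === 'PAUSED' ? 'RESUME' : 'PAUSE',
  };
}

console.log(deriveControls(null));
console.log(deriveControls({ active: true, status: 'PAUSED' }));
```

Note that the button labelled RESUME still sends `{ action: 'pause' }` in the diff; whether the server treats that command as a toggle is not visible from this commit.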
frontend/src/context/AppContext.jsx CHANGED
@@ -1,48 +1,48 @@
1
- import React, { createContext, useContext, useState, useEffect, useMemo } from 'react';
2
- import { config } from '../config';
3
- import useWebSocket from '../hooks/useWebSocket';
4
- const AppContext = createContext();
5
-
6
- export const AppProvider = ({ children }) => {
7
- const [globalMaxSteps, setGlobalMaxSteps] = useState(30);
8
- const [simulationSeconds, setSimulationSeconds] = useState(0);
9
- const { gameState, isConnected, sendCommand } = useWebSocket(config.WS_URL);
10
-
11
- useEffect(() => {
12
- const status = gameState?.status;
13
- if (status === 'STANDBY' || status === 'COMPLETED') {
14
- setSimulationSeconds(0);
15
- return;
16
- }
17
- if (status === 'PAUSED') {
18
- return;
19
- }
20
- const interval = setInterval(() => {
21
- setSimulationSeconds(s => s + 1);
22
- }, 1000);
23
- return () => clearInterval(interval);
24
- }, [gameState?.status]);
25
-
26
- const value = useMemo(() => ({
27
- sessionData: gameState,
28
- isConnected,
29
- sendCommand,
30
- globalMaxSteps,
31
- setGlobalMaxSteps,
32
- simulationSeconds
33
- }), [gameState, isConnected, sendCommand, globalMaxSteps, simulationSeconds]);
34
-
35
- return (
36
- <AppContext.Provider value={value}>
37
- {children}
38
- </AppContext.Provider>
39
- );
40
- };
41
-
42
- export const useApp = () => {
43
- const context = useContext(AppContext);
44
- if (!context) {
45
- throw new Error('useApp must be used within an AppProvider');
46
- }
47
- return context;
48
- };
 
1
+ import React, { createContext, useContext, useState, useEffect, useMemo } from 'react';
2
+ import { config } from '../config';
3
+ import useWebSocket from '../hooks/useWebSocket';
4
+ const AppContext = createContext();
5
+
6
+ export const AppProvider = ({ children }) => {
7
+ const [globalMaxSteps, setGlobalMaxSteps] = useState(30);
8
+ const [simulationSeconds, setSimulationSeconds] = useState(0);
9
+ const { gameState, isConnected, sendCommand } = useWebSocket(config.WS_URL);
10
+
11
+ useEffect(() => {
12
+ const status = gameState?.status;
13
+ if (status === 'STANDBY' || status === 'COMPLETED') {
14
+ setSimulationSeconds(0);
15
+ return;
16
+ }
17
+ if (status === 'PAUSED') {
18
+ return;
19
+ }
20
+ const interval = setInterval(() => {
21
+ setSimulationSeconds(s => s + 1);
22
+ }, 1000);
23
+ return () => clearInterval(interval);
24
+ }, [gameState?.status]);
25
+
26
+ const value = useMemo(() => ({
27
+ sessionData: gameState,
28
+ isConnected,
29
+ sendCommand,
30
+ globalMaxSteps,
31
+ setGlobalMaxSteps,
32
+ simulationSeconds
33
+ }), [gameState, isConnected, sendCommand, globalMaxSteps, simulationSeconds]);
34
+
35
+ return (
36
+ <AppContext.Provider value={value}>
37
+ {children}
38
+ </AppContext.Provider>
39
+ );
40
+ };
41
+
42
+ export const useApp = () => {
43
+ const context = useContext(AppContext);
44
+ if (!context) {
45
+ throw new Error('useApp must be used within an AppProvider');
46
+ }
47
+ return context;
48
+ };
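Review note on AppContext.jsx: the timer effect has three branches. It resets on `STANDBY` or `COMPLETED`, holds its value on `PAUSED`, and ticks once per second for every other status. The sketch below restates that decision as a pure function; `timerMode` is a hypothetical name, not part of the commit.

```javascript
// Sketch only: timerMode is a hypothetical helper mirroring the effect's branches.
function timerMode(status) {
  if (status === 'STANDBY' || status === 'COMPLETED') return 'reset';
  if (status === 'PAUSED') return 'hold';
  // Every other value, including undefined, starts the one-second interval.
  return 'tick';
}

for (const s of ['STANDBY', 'PAUSED', 'INVESTIGATING', 'AWAITING_OBJECTIVE']) {
  console.log(s, timerMode(s));
}
```

One consequence worth checking: the initial status set in `useWebSocket` is `'AWAITING_OBJECTIVE'`, which falls into the ticking branch, so `simulationSeconds` appears to advance before any episode has started. The commit does not say whether that is intended.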
frontend/src/hooks/useWebSocket.js CHANGED
@@ -1,214 +1,214 @@
1
- import { useState, useEffect, useCallback, useRef } from 'react';
2
-
3
- const useWebSocket = (url) => {
4
- const [events, setEvents] = useState([]);
5
- const [gameState, setGameState] = useState({
6
- scenario: null,
7
- active: false,
8
- status: 'AWAITING_OBJECTIVE',
9
- step: 0,
10
- reward: 0,
11
- cumulativeReward: 0,
12
- agent_a_model: '',
13
- agent_b_model: '',
14
- agents: {
15
- agent_a: { status: 'STANDBY', messages: [] },
16
- agent_b: { status: 'STANDBY', messages: [] }
17
- },
18
- clues_found: [],
19
- rewardBreakdown: {},
20
- rewardHistory: []
21
- });
22
-
23
- const [isConnected, setIsConnected] = useState(false);
24
- const [error, setError] = useState(null);
25
- const socketRef = useRef(null);
26
-
27
- useEffect(() => {
28
- socketRef.current = new WebSocket(url);
29
-
30
- socketRef.current.onopen = () => setIsConnected(true);
31
-
32
- socketRef.current.onmessage = (event) => {
33
- const data = JSON.parse(event.data);
34
- setEvents(prev => [...prev, data]);
35
-
36
- setGameState(prev => {
37
- let current = { ...prev };
38
-
39
- if (data.type === 'episode_start') {
40
- return {
41
- ...current,
42
- scenario: data.scenario,
43
- active: true,
44
- status: 'INVESTIGATING',
45
- step: 0,
46
- reward: 0,
47
- cumulativeReward: 0,
48
- clues_found: [],
49
- agent_a_model: data.agent_a_model || current.agent_a_model,
50
- agent_b_model: data.agent_b_model || current.agent_b_model,
51
- agents: {
52
- agent_a: { status: 'ACTIVE', messages: [] },
53
- agent_b: { status: 'ACTIVE', messages: [] }
54
- }
55
- };
56
- }
57
-
58
- const newState = { ...current };
59
-
60
- if (data.step !== undefined) {
61
- newState.step = data.step;
62
- }
63
-
64
- if (data.type === 'agent_partial') {
65
- const agentId = data.agent_id;
66
- const agents = { ...newState.agents };
67
- const agentReference = agents[agentId];
68
- if (agentReference) {
69
- const agent = { ...agentReference };
70
- const messages = [...(agent.messages || [])];
71
- const lastMsg = messages[messages.length - 1];
72
- if (lastMsg && lastMsg.type === 'message' && lastMsg.partial) {
73
- messages[messages.length - 1] = { ...lastMsg, content: data.full_message };
74
- } else {
75
- messages.push({
76
- type: 'message',
77
- content: data.full_message,
78
- partial: true
79
- });
80
- }
81
- agent.messages = messages;
82
- agents[agentId] = agent;
83
- newState.agents = agents;
84
- }
85
- }
86
-
87
- if (data.type === 'agent_message') {
88
- const agentId = data.agent_id;
89
- const agents = { ...newState.agents };
90
- const agentReference = agents[agentId];
91
- if (agentReference) {
92
- const agent = { ...agentReference };
93
- const messages = [...(agent.messages || [])];
94
- const lastMsg = messages[messages.length - 1];
95
- if (lastMsg && lastMsg.partial) {
96
- messages[messages.length - 1] = { ...lastMsg, content: data.message, partial: undefined };
97
- } else {
98
- messages.push({
99
- type: 'message',
100
- content: data.message
101
- });
102
- }
103
- agent.messages = messages;
104
- agents[agentId] = agent;
105
- newState.agents = agents;
106
- }
107
- }
108
-
109
- if (data.status === 'READY') {
110
- newState.status = 'READY_TO_INJECT';
111
- newState.active = false;
112
- newState.agents = {
113
- agent_a: { ...newState.agents.agent_a, messages: [] },
114
- agent_b: { ...newState.agents.agent_b, messages: [] }
115
- };
116
- }
117
-
118
- if (data.type === 'system_status') {
119
- if (data.paused !== undefined) {
120
- newState.status = data.paused ? 'PAUSED' : 'INVESTIGATING';
121
- }
122
- if (data.status) {
123
- newState.status = data.status;
124
- }
125
- if (data.active !== undefined) {
126
- newState.active = data.active;
127
- }
128
- }
129
-
130
- if (data.type === 'tool_call') {
131
- const agentId = data.agent_id;
132
- const agents = { ...newState.agents };
133
- if (agents[agentId]) {
134
- const agent = { ...agents[agentId], messages: [...(agents[agentId].messages || [])] };
135
- agent.messages.push({
136
- type: 'tool_call',
137
- tool_name: data.tool_name,
138
- params: data.params
139
- });
140
- agents[agentId] = agent;
141
- newState.agents = agents;
142
- }
143
- }
144
-
145
- if (data.type === 'tool_result') {
146
- const agents = { ...newState.agents };
147
- const agentAReference = agents.agent_a;
148
- if (agentAReference) {
149
- const agentA = { ...agentAReference, messages: [...(agentAReference.messages || [])] };
150
- agentA.messages.push({
151
- type: 'tool_result',
152
- tool_name: data.tool_name,
153
- result: data.result,
154
- success: data.success
155
- });
156
- agents.agent_a = agentA;
157
- newState.agents = agents;
158
- }
159
-
160
- // Simple heuristic for clues if not sent explicitly
161
- const res = data.result?.toLowerCase() || '';
162
- if (res.includes('error') || res.includes('anomaly') || res.includes('warning') || res.includes('degraded') || data.tool_name === 'propose_fix') {
163
- const currentClues = newState.clues_found || [];
164
- if (!currentClues.includes(data.result)) {
165
- newState.clues_found = [...currentClues, data.result];
166
- }
167
- }
168
- }
169
-
170
- if (data.type === 'reward_update') {
171
- newState.reward = data.reward;
172
- newState.cumulativeReward = data.cumulative;
173
- newState.rewardBreakdown = data.breakdown || {};
174
- newState.rewardHistory = [...(newState.rewardHistory || []), data.reward];
175
- }
176
-
177
- if (data.type === 'episode_end') {
178
- newState.active = false;
179
- newState.status = 'COMPLETED';
180
- newState.step = data.steps_taken || newState.step;
181
- newState.cumulativeReward = data.final_score !== undefined ? data.final_score : newState.cumulativeReward;
182
- newState.finalScore = data.final_score;
183
- newState.success = data.success;
184
- newState.fixVerified = data.fix_verified;
185
- if (data.clues_found) newState.clues_found = data.clues_found;
186
- if (data.reward_history) newState.rewardHistory = data.reward_history;
187
- if (data.final_breakdown) newState.rewardBreakdown = data.final_breakdown;
188
-
189
- newState.agents = {
190
- agent_a: { ...newState.agents.agent_a, status: 'STANDBY' },
191
- agent_b: { ...newState.agents.agent_b, status: 'STANDBY' }
192
- };
193
- }
194
-
195
- return newState;
196
- });
197
- };
198
-
199
- socketRef.current.onerror = (err) => setError(err);
200
- socketRef.current.onclose = () => setIsConnected(false);
201
-
202
- return () => socketRef.current.close();
203
- }, [url]);
204
-
205
- const sendCommand = useCallback((command) => {
206
- if (socketRef.current && isConnected) {
207
- socketRef.current.send(JSON.stringify(command));
208
- }
209
- }, [isConnected]);
210
-
211
- return { events, gameState, isConnected, error, sendCommand };
212
- };
213
-
214
- export default useWebSocket;
 
1
+ import { useState, useEffect, useCallback, useRef } from 'react';
2
+
3
+ const useWebSocket = (url) => {
4
+ const [events, setEvents] = useState([]);
5
+ const [gameState, setGameState] = useState({
6
+ scenario: null,
7
+ active: false,
8
+ status: 'AWAITING_OBJECTIVE',
9
+ step: 0,
10
+ reward: 0,
11
+ cumulativeReward: 0,
12
+ agent_a_model: '',
13
+ agent_b_model: '',
14
+ agents: {
15
+ agent_a: { status: 'STANDBY', messages: [] },
16
+ agent_b: { status: 'STANDBY', messages: [] }
17
+ },
18
+ clues_found: [],
19
+ rewardBreakdown: {},
20
+ rewardHistory: []
21
+ });
22
+
23
+ const [isConnected, setIsConnected] = useState(false);
24
+ const [error, setError] = useState(null);
25
+ const socketRef = useRef(null);
26
+
27
+ useEffect(() => {
28
+ socketRef.current = new WebSocket(url);
29
+
30
+ socketRef.current.onopen = () => setIsConnected(true);
31
+
32
+ socketRef.current.onmessage = (event) => {
33
+ const data = JSON.parse(event.data);
34
+ setEvents(prev => [...prev, data]);
35
+
36
+ setGameState(prev => {
37
+ let current = { ...prev };
38
+
39
+ if (data.type === 'episode_start') {
40
+ return {
41
+ ...current,
42
+ scenario: data.scenario,
43
+ active: true,
44
+ status: 'INVESTIGATING',
45
+ step: 0,
46
+ reward: 0,
47
+ cumulativeReward: 0,
48
+ clues_found: [],
49
+ agent_a_model: data.agent_a_model || current.agent_a_model,
50
+ agent_b_model: data.agent_b_model || current.agent_b_model,
51
+ agents: {
52
+ agent_a: { status: 'ACTIVE', messages: [] },
53
+ agent_b: { status: 'ACTIVE', messages: [] }
54
+ }
55
+ };
56
+ }
57
+
58
+ const newState = { ...current };
59
+
60
+ if (data.step !== undefined) {
61
+ newState.step = data.step;
62
+ }
63
+
64
+ if (data.type === 'agent_partial') {
65
+ const agentId = data.agent_id;
66
+ const agents = { ...newState.agents };
67
+ const agentReference = agents[agentId];
68
+ if (agentReference) {
69
+ const agent = { ...agentReference };
70
+ const messages = [...(agent.messages || [])];
71
+ const lastMsg = messages[messages.length - 1];
72
+ if (lastMsg && lastMsg.type === 'message' && lastMsg.partial) {
73
+ messages[messages.length - 1] = { ...lastMsg, content: data.full_message };
74
+ } else {
75
+ messages.push({
76
+ type: 'message',
77
+ content: data.full_message,
78
+ partial: true
79
+ });
80
+ }
81
+ agent.messages = messages;
82
+ agents[agentId] = agent;
83
+ newState.agents = agents;
84
+ }
85
+ }
86
+
87
+ if (data.type === 'agent_message') {
88
+ const agentId = data.agent_id;
89
+ const agents = { ...newState.agents };
90
+ const agentReference = agents[agentId];
91
+ if (agentReference) {
92
+ const agent = { ...agentReference };
93
+ const messages = [...(agent.messages || [])];
94
+ const lastMsg = messages[messages.length - 1];
95
+ if (lastMsg && lastMsg.partial) {
96
+ messages[messages.length - 1] = { ...lastMsg, content: data.message, partial: undefined };
97
+ } else {
98
+ messages.push({
99
+ type: 'message',
100
+ content: data.message
101
+ });
102
+ }
103
+ agent.messages = messages;
104
+ agents[agentId] = agent;
105
+ newState.agents = agents;
106
+ }
107
+ }
108
+
109
+ if (data.status === 'READY') {
110
+ newState.status = 'READY_TO_INJECT';
111
+ newState.active = false;
112
+ newState.agents = {
113
+ agent_a: { ...newState.agents.agent_a, messages: [] },
114
+ agent_b: { ...newState.agents.agent_b, messages: [] }
115
+ };
116
+ }
117
+
118
+ if (data.type === 'system_status') {
119
+ if (data.paused !== undefined) {
120
+ newState.status = data.paused ? 'PAUSED' : 'INVESTIGATING';
121
+ }
122
+ if (data.status) {
123
+ newState.status = data.status;
124
+ }
125
+ if (data.active !== undefined) {
126
+ newState.active = data.active;
127
+ }
128
+ }
129
+
130
+ if (data.type === 'tool_call') {
131
+ const agentId = data.agent_id;
132
+ const agents = { ...newState.agents };
133
+ if (agents[agentId]) {
134
+ const agent = { ...agents[agentId], messages: [...(agents[agentId].messages || [])] };
135
+ agent.messages.push({
136
+ type: 'tool_call',
137
+ tool_name: data.tool_name,
138
+ params: data.params
139
+ });
140
+ agents[agentId] = agent;
141
+ newState.agents = agents;
142
+ }
143
+ }
144
+
145
+ if (data.type === 'tool_result') {
146
+ const agents = { ...newState.agents };
147
+ const agentAReference = agents.agent_a;
148
+ if (agentAReference) {
149
+ const agentA = { ...agentAReference, messages: [...(agentAReference.messages || [])] };
150
+ agentA.messages.push({
151
+ type: 'tool_result',
152
+ tool_name: data.tool_name,
153
+ result: data.result,
154
+ success: data.success
155
+ });
156
+ agents.agent_a = agentA;
157
+ newState.agents = agents;
158
+ }
159
+
160
+ // Simple heuristic for clues if not sent explicitly
161
+ const res = data.result?.toLowerCase() || '';
162
+ if (res.includes('error') || res.includes('anomaly') || res.includes('warning') || res.includes('degraded') || data.tool_name === 'propose_fix') {
163
+ const currentClues = newState.clues_found || [];
164
+ if (!currentClues.includes(data.result)) {
165
+ newState.clues_found = [...currentClues, data.result];
166
+ }
167
+ }
168
+ }
169
+
170
+ if (data.type === 'reward_update') {
171
+ newState.reward = data.reward;
172
+ newState.cumulativeReward = data.cumulative;
173
+ newState.rewardBreakdown = data.breakdown || {};
174
+ newState.rewardHistory = [...(newState.rewardHistory || []), data.reward];
175
+ }
176
+
177
+ if (data.type === 'episode_end') {
178
+ newState.active = false;
179
+ newState.status = 'COMPLETED';
180
+ newState.step = data.steps_taken || newState.step;
181
+ newState.cumulativeReward = data.final_score !== undefined ? data.final_score : newState.cumulativeReward;
182
+ newState.finalScore = data.final_score;
183
+ newState.success = data.success;
184
+ newState.fixVerified = data.fix_verified;
185
+ if (data.clues_found) newState.clues_found = data.clues_found;
186
+ if (data.reward_history) newState.rewardHistory = data.reward_history;
187
+ if (data.final_breakdown) newState.rewardBreakdown = data.final_breakdown;
188
+
189
+ newState.agents = {
190
+ agent_a: { ...newState.agents.agent_a, status: 'STANDBY' },
191
+ agent_b: { ...newState.agents.agent_b, status: 'STANDBY' }
192
+ };
193
+ }
194
+
195
+ return newState;
196
+ });
197
+ };
198
+
199
+ socketRef.current.onerror = (err) => setError(err);
200
+ socketRef.current.onclose = () => setIsConnected(false);
201
+
202
+ return () => socketRef.current.close();
203
+ }, [url]);
204
+
205
+ const sendCommand = useCallback((command) => {
206
+ if (socketRef.current && isConnected) {
207
+ socketRef.current.send(JSON.stringify(command));
208
+ }
209
+ }, [isConnected]);
210
+
211
+ return { events, gameState, isConnected, error, sendCommand };
212
+ };
213
+
214
+ export default useWebSocket;
openenv.yaml CHANGED
@@ -1,59 +1,59 @@
1
- name: nexus-incident-investigation
2
- version: "1.0.0"
3
- tags: ["openenv"]
4
- description: >
5
- NEXUS — Dual Agent Incident Investigation Environment.
6
- Two AI agents collaborate to investigate real-world system incidents.
7
- Agent A (Investigator) proposes hypotheses and calls tools.
8
- Agent B (Validator) challenges claims and verifies fixes.
9
- Together they identify root causes across software, business-process,
10
- and cascade-system failure scenarios.
11
-
12
- tasks:
13
- - name: software-incident
14
- description: Single-service software bug causing user-facing errors
15
- difficulty: easy
16
- max_steps: 8
17
- grader: scenarios/graders/easy_grader.py
18
-
19
- - name: business-process-failure
20
- description: Multi-team process breakdown with misleading red-herrings
21
- difficulty: medium
22
- max_steps: 8
23
- grader: scenarios/graders/medium_grader.py
24
-
25
- - name: cascade-system-failure
26
- description: Multi-system cascade failure with misleading logs
27
- difficulty: hard
28
- max_steps: 8
29
- grader: scenarios/graders/hard_grader.py
30
-
31
- action_space:
32
- type: text
33
- description: Free-form natural language message with optional TOOL: calls
34
-
35
- observation_space:
36
- type: structured
37
- fields:
38
- scenario_description: string
39
- scenario_context: string
40
- partner_message: string
41
- tool_results: list
42
- clues_found: list
43
- investigation_stage: string
44
- round: integer
45
- available_tools: list
46
-
47
- reward_range: [0.0, 1.0]
48
- reward_description: >
49
- Dynamically computed from semantic similarity of hypothesis to root-cause,
50
- tool quality, fix correctness, and investigation efficiency.
51
-
52
- inference_script: inference.py
53
- entry_point: backend/main.py
54
- docker_port: 7860
55
-
56
- baseline_scores:
57
- software-incident: 0.88
58
- business-process-failure: 0.72
59
- cascade-system-failure: 0.48
 
1
+ name: nexus-incident-investigation
2
+ version: "1.0.0"
3
+ tags: ["openenv"]
4
+ description: >
5
+ NEXUS — Dual Agent Incident Investigation Environment.
6
+ Two AI agents collaborate to investigate real-world system incidents.
7
+ Agent A (Investigator) proposes hypotheses and calls tools.
8
+ Agent B (Validator) challenges claims and verifies fixes.
9
+ Together they identify root causes across software, business-process,
10
+ and cascade-system failure scenarios.
11
+
12
+ tasks:
13
+ - name: software-incident
14
+ description: Single-service software bug causing user-facing errors
15
+ difficulty: easy
16
+ max_steps: 8
17
+ grader: scenarios/graders/easy_grader.py
18
+
19
+ - name: business-process-failure
20
+ description: Multi-team process breakdown with misleading red-herrings
21
+ difficulty: medium
22
+ max_steps: 8
23
+ grader: scenarios/graders/medium_grader.py
24
+
25
+ - name: cascade-system-failure
26
+ description: Multi-system cascade failure with misleading logs
27
+ difficulty: hard
28
+ max_steps: 8
29
+ grader: scenarios/graders/hard_grader.py
30
+
31
+ action_space:
32
+ type: text
33
+ description: "Free-form natural language message with optional TOOL: calls"
34
+
35
+ observation_space:
36
+ type: structured
37
+ fields:
38
+ scenario_description: string
39
+ scenario_context: string
40
+ partner_message: string
41
+ tool_results: list
42
+ clues_found: list
43
+ investigation_stage: string
44
+ round: integer
45
+ available_tools: list
46
+
47
+ reward_range: [0.0, 1.0]
48
+ reward_description: >
49
+ Dynamically computed from semantic similarity of hypothesis to root-cause,
50
+ tool quality, fix correctness, and investigation efficiency.
51
+
52
+ inference_script: inference.py
53
+ entry_point: backend/main.py
54
+ docker_port: 7860
55
+
56
+ baseline_scores:
57
+ software-incident: 0.88
58
+ business-process-failure: 0.72
59
+ cascade-system-failure: 0.48
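The `reward_description` above names four components (hypothesis similarity, tool quality, fix correctness, investigation efficiency) and `reward_range` bounds the result to [0.0, 1.0]. The following is a minimal sketch of how such components could be combined, not the project's actual implementation: the `RewardComponents` class, the `combine_reward` function, and the weights are all hypothetical, since `core/reward_engine.py` is not shown in this diff.

```python
from dataclasses import dataclass


@dataclass
class RewardComponents:
    """Hypothetical container; each field is expected to lie in [0.0, 1.0]."""
    hypothesis_similarity: float
    tool_quality: float
    fix_correctness: float
    efficiency: float


# Assumed weights, for illustration only. They are chosen to sum to 1.0.
WEIGHTS = {
    "hypothesis_similarity": 0.4,
    "tool_quality": 0.2,
    "fix_correctness": 0.3,
    "efficiency": 0.1,
}


def combine_reward(components: RewardComponents) -> float:
    """Weighted sum of the components, clamped to the declared reward_range."""
    total = sum(weight * getattr(components, name) for name, weight in WEIGHTS.items())
    return max(0.0, min(1.0, total))
```

Clamping at the end keeps the score inside `reward_range` even if an individual component drifts outside [0.0, 1.0] or floating-point rounding pushes the weighted sum slightly past 1.0.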
pyproject.toml CHANGED
@@ -15,6 +15,7 @@ dependencies = [
  "httpx>=0.24.0",
  "openai>=1.0.0",
  "psutil>=5.9.0",
+ "openenv-core>=0.2.0",
  ]

  [project.scripts]
setup.bat CHANGED
@@ -1,66 +1,66 @@
1
- @echo off
2
- echo ==============================================================
3
- echo NEXUS Incident Investigation Environment Setup
4
- echo ==============================================================
5
- echo.
6
-
7
- REM Check Python
8
- python --version >nul 2>&1
9
- if %errorlevel% neq 0 (
10
- echo [ERROR] Python is not installed or not in PATH!
11
- pause
12
- exit /b
13
- )
14
-
15
- REM Check npm
16
- npm --version >nul 2>&1
17
- if %errorlevel% neq 0 (
18
- echo [ERROR] Node.js/npm is not installed or not in PATH!
19
- pause
20
- exit /b
21
- )
22
-
23
- echo [1/3] Setting up Backend Virtual Environment...
24
- python -m venv backend\venv
25
- call backend\venv\Scripts\activate.bat
26
- pip install -r backend\requirements.txt
27
-
28
- echo.
29
- echo [2/3] Setting up Frontend Dependencies...
30
- cd frontend
31
- call npm install
32
- cd ..
33
-
34
- echo.
35
- echo [3/4] Pulling Required LLM Models (Ollama)...
36
- echo --------------------------------------------------------------
37
- echo This will ensure you have the correct models for the simulation.
38
- echo 1. microsoft/Phi-3-mini-4k-instruct (Investigator)
39
- echo 2. Qwen/Qwen2.5-1.5B-Instruct (Validator)
40
- echo 3. all-minilm (Reward Engine)
41
- echo.
42
- set /p PULL_MODELS="Do you want to pull these models now? (y/n): "
43
- if /i "%PULL_MODELS%"=="y" (
44
- echo [Pulling Phi-3...]
45
- ollama pull phi3:mini
46
- echo [Pulling Qwen-1.5B...]
47
- ollama pull qwen2.5:1.5b
48
- echo [Pulling all-minilm...]
49
- ollama pull all-minilm
50
- ) else (
51
- echo Skipping model pull. Ensure you pull them manually later.
52
- )
53
-
54
- echo.
55
- echo [4/4] Validating OpenEnv Compliance...
56
- call backend\venv\Scripts\python.exe openenv_validator.py
57
-
58
- echo.
59
- echo ==============================================================
60
- echo SETUP COMPLETE!
61
- echo.
62
- echo To run locally:
63
- echo 1. Start UI: cd frontend ^& npm run dev
64
- echo 2. Start API: cd backend ^& venv\Scripts\python main.py
65
- echo ==============================================================
66
- pause
 
1
+ @echo off
2
+ echo ==============================================================
3
+ echo NEXUS Incident Investigation Environment Setup
4
+ echo ==============================================================
5
+ echo.
6
+
7
+ REM Check Python
8
+ python --version >nul 2>&1
9
+ if %errorlevel% neq 0 (
10
+ echo [ERROR] Python is not installed or not in PATH!
11
+ pause
12
+ exit /b 1
13
+ )
14
+
15
+ REM Check npm
16
+ npm --version >nul 2>&1
17
+ if %errorlevel% neq 0 (
18
+ echo [ERROR] Node.js/npm is not installed or not in PATH!
19
+ pause
20
+ exit /b
21
+ )
22
+
23
+ echo [1/4] Setting up Backend Virtual Environment...
24
+ python -m venv backend\venv
25
+ call backend\venv\Scripts\activate.bat
26
+ pip install -r backend\requirements.txt
27
+
28
+ echo.
29
+ echo [2/4] Setting up Frontend Dependencies...
30
+ cd frontend
31
+ call npm install
32
+ cd ..
33
+
34
+ echo.
35
+ echo [3/4] Pulling Required LLM Models (Ollama)...
36
+ echo --------------------------------------------------------------
37
+ echo This will ensure you have the correct models for the simulation.
38
+ echo 1. microsoft/Phi-3-mini-4k-instruct (Investigator)
39
+ echo 2. Qwen/Qwen2.5-1.5B-Instruct (Validator)
40
+ echo 3. all-minilm (Reward Engine)
41
+ echo.
42
+ set /p PULL_MODELS="Do you want to pull these models now? (y/n): "
43
+ if /i "%PULL_MODELS%"=="y" (
44
+ echo [Pulling Phi-3...]
45
+ ollama pull phi3:mini
46
+ echo [Pulling Qwen-1.5B...]
47
+ ollama pull qwen2.5:1.5b
48
+ echo [Pulling all-minilm...]
49
+ ollama pull all-minilm
50
+ ) else (
51
+ echo Skipping model pull. Ensure you pull them manually later.
52
+ )
53
+
54
+ echo.
55
+ echo [4/4] Validating OpenEnv Compliance...
56
+ call backend\venv\Scripts\python.exe openenv_validator.py
57
+
58
+ echo.
59
+ echo ==============================================================
60
+ echo SETUP COMPLETE!
61
+ echo.
62
+ echo To run locally:
63
+ echo 1. Start UI: cd frontend ^& npm run dev
64
+ echo 2. Start API: cd backend ^& venv\Scripts\python main.py
65
+ echo ==============================================================
66
+ pause
setup.sh CHANGED
@@ -1,42 +1,42 @@
1
- #!/bin/bash
2
-
3
- echo "=============================================================="
4
- echo "NEXUS Incident Investigation Environment Setup"
5
- echo "=============================================================="
6
- echo ""
7
-
8
- # Check Python
9
- if ! command -v python3 &> /dev/null; then
10
- echo "[ERROR] python3 is not installed or not in PATH!"
11
- exit 1
12
- fi
13
-
14
- # Check npm
15
- if ! command -v npm &> /dev/null; then
16
- echo "[ERROR] npm is not installed or not in PATH!"
17
- exit 1
18
- fi
19
-
20
- echo "[1/3] Setting up Backend Virtual Environment..."
21
- python3 -m venv backend/venv
22
- source backend/venv/bin/activate
23
- pip install -r backend/requirements.txt
24
-
25
- echo ""
26
- echo "[2/3] Setting up Frontend Dependencies..."
27
- cd frontend
28
- npm install
29
- cd ..
30
-
31
- echo ""
32
- echo "[3/3] Validating OpenEnv Compliance..."
33
- backend/venv/bin/python openenv_validator.py
34
-
35
- echo ""
36
- echo "=============================================================="
37
- echo "SETUP COMPLETE!"
38
- echo ""
39
- echo "To run locally without Docker:"
40
- echo "1. Start UI: cd frontend && npm run dev"
41
- echo "2. Start API: cd backend && venv/bin/uvicorn main:app --reload"
42
- echo "=============================================================="
 
1
+ #!/bin/bash
2
+
3
+ echo "=============================================================="
4
+ echo "NEXUS Incident Investigation Environment Setup"
5
+ echo "=============================================================="
6
+ echo ""
7
+
8
+ # Check Python
9
+ if ! command -v python3 &> /dev/null; then
10
+ echo "[ERROR] python3 is not installed or not in PATH!"
11
+ exit 1
12
+ fi
13
+
14
+ # Check npm
15
+ if ! command -v npm &> /dev/null; then
16
+ echo "[ERROR] npm is not installed or not in PATH!"
17
+ exit 1
18
+ fi
19
+
20
+ echo "[1/3] Setting up Backend Virtual Environment..."
21
+ python3 -m venv backend/venv
22
+ source backend/venv/bin/activate
23
+ pip install -r backend/requirements.txt
24
+
25
+ echo ""
26
+ echo "[2/3] Setting up Frontend Dependencies..."
27
+ cd frontend
28
+ npm install
29
+ cd ..
30
+
31
+ echo ""
32
+ echo "[3/3] Validating OpenEnv Compliance..."
33
+ backend/venv/bin/python openenv_validator.py
34
+
35
+ echo ""
36
+ echo "=============================================================="
37
+ echo "SETUP COMPLETE!"
38
+ echo ""
39
+ echo "To run locally without Docker:"
40
+ echo "1. Start UI: cd frontend && npm run dev"
41
+ echo "2. Start API: cd backend && venv/bin/uvicorn main:app --reload"
42
+ echo "=============================================================="
tests/test_environment.py CHANGED
@@ -1,35 +1,35 @@
- import pytest
- import asyncio
- from core.environment import NexusEnvironment
-
- @pytest.mark.asyncio
- async def test_env_reset():
-     env = NexusEnvironment()
-     obs = await env.reset(task="software-incident")
-     assert obs.scenario_description != ""
-     assert "503" in str(obs.scenario_description).lower() or "rate limit" in str(obs.scenario_description).lower()
-     assert env.active_episode is not None
-
- @pytest.mark.asyncio
- async def test_env_step():
-     env = NexusEnvironment()
-     await env.reset(task="software-incident")
-
-     from api.schemas.action import NexusAction
-     action = NexusAction(
-         agent_id="agent_a",
-         message="Checking Nginx logs",
-         tool_calls=[],
-         confidence=0.5
-     )
-
-     obs, reward, done, info = await env.step(action)
-     assert reward >= 0.0
-     assert not done
-     assert env.active_episode.steps_taken == 1
-
- @pytest.mark.asyncio
- async def test_invalid_task():
-     env = NexusEnvironment()
-     with pytest.raises(ValueError):
-         await env.reset(task="non-existent-task")
+ import pytest
+ import asyncio
+ from core.environment import NexusEnvironment
+
+ @pytest.mark.asyncio
+ async def test_env_reset():
+     env = NexusEnvironment()
+     obs = await env.reset(task="software-incident")
+     assert obs.scenario_description != ""
+     assert "503" in str(obs.scenario_description).lower() or "rate limit" in str(obs.scenario_description).lower()
+     assert env.active_episode is not None
+
+ @pytest.mark.asyncio
+ async def test_env_step():
+     env = NexusEnvironment()
+     await env.reset(task="software-incident")
+
+     from api.schemas.action import NexusAction
+     action = NexusAction(
+         agent_id="agent_a",
+         message="Checking Nginx logs",
+         tool_calls=[],
+         confidence=0.5
+     )
+
+     obs, reward, done, info = await env.step(action)
+     assert reward >= 0.0
+     assert not done
+     assert env.active_episode.steps_taken == 1
+
+ @pytest.mark.asyncio
+ async def test_invalid_task():
+     env = NexusEnvironment()
+     with pytest.raises(ValueError):
+         await env.reset(task="non-existent-task")
tests/test_reward.py CHANGED
@@ -1,34 +1,34 @@
- import pytest
- from unittest.mock import patch
- from core.reward_engine import compute_reward
-
- def test_reward_engine_basic():
-     # Mock episode state
-     class MockEpisode:
-         def __init__(self):
-             self.all_messages = ["Hello partner, let's investigate the Nginx 503 error."]
-             self.clues_found = []
-             self.previous_tool_calls = []
-             self.steps_taken = 1
-             self.difficulty = "easy"
-             self.last_partner_message = "What do you see?"
-             self.reward_history = []
-             self.cumulative_reward = 0.0
-
-     ep = MockEpisode()
-
-     # Mock embeddings to avoid needing a server
-     with patch('core.reward_engine.get_embedding', return_value=[0.1]*384), \
-          patch('core.reward_engine.cos_sim', return_value=0.8):
-
-         final_score, info = compute_reward(
-             message="I will check the configuration file /etc/nginx/nginx.conf",
-             tool_calls=[],
-             tool_results=[],
-             episode_state=ep,
-             scenario={"root_cause": {"description": "Nginx rate limit"}}
-         )
-
-         assert 0.0 <= final_score <= 1.0
-         assert "specificity" in info
-         assert "progress" in info
+ import pytest
+ from unittest.mock import patch
+ from core.reward_engine import compute_reward
+
+ def test_reward_engine_basic():
+     # Mock episode state
+     class MockEpisode:
+         def __init__(self):
+             self.all_messages = ["Hello partner, let's investigate the Nginx 503 error."]
+             self.clues_found = []
+             self.previous_tool_calls = []
+             self.steps_taken = 1
+             self.difficulty = "easy"
+             self.last_partner_message = "What do you see?"
+             self.reward_history = []
+             self.cumulative_reward = 0.0
+
+     ep = MockEpisode()
+
+     # Mock embeddings to avoid needing a server
+     with patch('core.reward_engine.get_embedding', return_value=[0.1]*384), \
+          patch('core.reward_engine.cos_sim', return_value=0.8):
+
+         final_score, info = compute_reward(
+             message="I will check the configuration file /etc/nginx/nginx.conf",
+             tool_calls=[],
+             tool_results=[],
+             episode_state=ep,
+             scenario={"root_cause": {"description": "Nginx rate limit"}}
+         )
+
+         assert 0.0 <= final_score <= 1.0
+         assert "specificity" in info
+         assert "progress" in info
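The test above patches `core.reward_engine.get_embedding` and `core.reward_engine.cos_sim` so that no embedding server is needed. For readers unfamiliar with what the patched helper stands in for, here is a minimal pure-Python sketch of cosine similarity. The name `cos_sim_sketch` is hypothetical and the real `cos_sim` in `core/reward_engine.py` is not shown in this diff, so its signature and edge-case handling may differ.

```python
import math
from typing import Sequence


def cos_sim_sketch(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (illustrative only)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat a zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With the mocked embedding `[0.1]*384` used in the test, any two messages would have a similarity of about 1.0 under this sketch, which is presumably why the test also patches `cos_sim` to a fixed `0.8`.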
uv.lock CHANGED
@@ -6,41 +6,6 @@ name = "nexus-ai"
  version = "1.0.0"
  source = { editable = "." }

- [[package]]
- name = "fastapi"
- version = "0.115.0"
- source = { registry = "https://pypi.org/simple" }
-
- [[package]]
- name = "uvicorn"
- version = "0.32.0"
- source = { registry = "https://pypi.org/simple" }
-
- [[package]]
- name = "pydantic"
- version = "2.10.0"
- source = { registry = "https://pypi.org/simple" }
-
- [[package]]
- name = "python-dotenv"
- version = "1.0.1"
- source = { registry = "https://pypi.org/simple" }
-
- [[package]]
- name = "httpx"
- version = "0.28.0"
- source = { registry = "https://pypi.org/simple" }
-
- [[package]]
- name = "openai"
- version = "1.58.0"
- source = { registry = "https://pypi.org/simple" }
-
- [[package]]
- name = "psutil"
- version = "6.1.0"
- source = { registry = "https://pypi.org/simple" }
-
  [[package]]
  name = "openenv-core"
  version = "0.2.0"