XcodeAddy committed on
Commit 1c2514b · 1 Parent(s): ed65821

Revert "Merge remote main with local project"

This reverts commit ed658215c877763e70b07a4c575831c762d5daa1, reversing
changes made to 20ef5229264b826ffe0a39701c5b96d0786f0c93.
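
The merge being reverted had committed unresolved conflict markers (`<<<<<<< HEAD`, `=======`, `>>>>>>> <commit>`) into tracked files, which is what the removed lines in the diff below show. A minimal sketch of a check that could flag such markers before a commit follows. `find_conflict_markers` is a hypothetical helper and is not part of this repository.

```python
import re

# Git writes these at the start of a line when a merge is left unresolved.
# A bare "=======" is only reported between an opening and a closing marker,
# because on its own it can also be a legitimate heading underline.
_OPEN = re.compile(r"^<{7}( |$)")
_MID = re.compile(r"^={7}$")
_CLOSE = re.compile(r"^>{7}( |$)")


def find_conflict_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, marker_line) pairs for unresolved conflict markers."""
    hits: list[tuple[int, str]] = []
    inside = False  # True between an opening marker and its closing marker
    for number, line in enumerate(text.splitlines(), start=1):
        if _OPEN.match(line):
            inside = True
            hits.append((number, line))
        elif inside and _MID.match(line):
            hits.append((number, line))
        elif _CLOSE.match(line):
            inside = False
            hits.append((number, line))
    return hits


sample = "license: mit\n<<<<<<< HEAD\n# SENTINEL\n=======\n# Old title\n>>>>>>> a89a587\nMIT\n"
for number, line in find_conflict_markers(sample):
    print(number, line)
```

Run over the files touched by a merge, a check like this would likely have caught the markers before they reached `README.md`, `pyproject.toml`, and the training scripts.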

README.md CHANGED
@@ -8,7 +8,6 @@ pinned: false
  license: mit
  ---
 
- <<<<<<< HEAD
  # 🛡️ SENTINEL — Self-Evolving Network for Training Intelligent Agents Under Adversarial Long-Horizon Tasks
 
  > Agents fail because they trust blindly. SENTINEL trains skepticism, recovery, and oversight.
@@ -46,36 +45,6 @@ SENTINEL turns that failure mode into a **trainable environment**. The model onl
  ## 🌍 Real-World Bridge
 
  SENTINEL is not a normal chatbot that answers one prompt. It is the training ground for the **hidden control loop** inside a long-running agent.
- =======
- # SENTINEL
-
- Self-Evolving Network for Training Intelligent Agents Under Adversarial Long-Horizon Tasks.
-
- SENTINEL is an OpenEnv-compatible RL environment for one core skill: training an orchestrator to decide who to trust, when to verify, how to recover, and how to finish long multi-agent work when specialist agents are unreliable or adversarial.
-
- ## Rollout Source Of Truth
-
- The phased execution plan and presentation assets now live in-repo:
-
- - [Rollout](docs/ROLL_OUT.md)
- - [Narrative Lock](docs/presentation/NARRATIVE_LOCK.md)
- - [Visual System](docs/diagrams/VISUAL_SYSTEM.md)
-
- ## Why It Matters
-
- Modern agent systems fail in the same pattern:
-
- 1. A long task is decomposed into many steps.
- 2. The orchestrator delegates to sub-agents or tools.
- 3. One specialist returns a confident but wrong result.
- 4. The system trusts it, builds on it, and drifts into failure.
-
- SENTINEL turns that failure mode into a trainable environment. The model only sees behavior: returned outcomes, confidence, stakes, history, and trust scores. It never sees hidden specialist identities.
-
- ## Real-World Bridge
-
- SENTINEL is not a normal chatbot that answers one prompt. It is the training ground for the hidden control loop inside a long-running agent.
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  Example user mission:
 
@@ -86,7 +55,6 @@ fix the risky parts, and prepare it for deployment.
 
  What SENTINEL abstracts:
 
- <<<<<<< HEAD
  1. The user mission becomes a scenario with a **task graph**.
  2. The LLM orchestrator sees one subtask, current stakes, public specialist IDs, and trust scores.
  3. The model emits one control action: `delegate`, `verify`, `solve_independently`, or `skip`.
@@ -186,67 +154,6 @@ Hidden profiles:
  ---
 
  ## 💰 Reward Model
- =======
- 1. The user mission becomes a scenario with a task graph.
- 2. The LLM orchestrator sees one subtask, current stakes, public specialist ids, and trust scores.
- 3. The model emits one control action: `delegate`, `verify`, `solve_independently`, or `skip`.
- 4. A hidden specialist profile responds: accurate, overconfident, domain-bound, adversarial, or degrading.
- 5. The reward engine scores the action and the trust ledger updates.
- 6. GRPO/TRL uses that reward to train better orchestration behavior.
-
- This is why the project matters for real agents: after many long user requests, the failure is often not "the LLM cannot speak." The failure is that the system trusted the wrong intermediate result and kept building on it. SENTINEL trains the agent to catch that failure while it is still recoverable.
-
- Judge-readable endpoints:
-
- ```bash
- curl http://localhost:7860/problem
- curl "http://localhost:7860/mission?task_type=task3"
- ```
-
- ## Environment Shape
-
- - API: `reset()`, `step(action)`, `state()`
- - Runtime: FastAPI on port `7860`
- - Tasks: `task1`, `task2`, `task3`
- - Specialists: 5 scripted FSM agents with shuffled hidden profiles
- - Rewards: per-step reward plus terminal score, normalized to `0.0-1.0`
- - Dataset: 120 abstract multi-agent scenarios
- - Session store: single-process memory with TTL/LRU cleanup
- - Optional adaptive curriculum: pass `adaptive=true` on `/reset` for Theme 4 demos
- - Live trust stream: `/stream?session_id=...` feeds the `/trust-dashboard` bars
-
- Deployment contract: run one server worker for the submitted Space. Active `SentinelEnv` objects live in process memory, so multi-worker deployments need sticky sessions or a shared store such as Redis. The Dockerfile intentionally starts uvicorn with `--workers 1`.
-
- ## Live Submission Targets
-
- - GitHub: `https://github.com/ADITYAGABA1322/sentinel-env`
- - Hugging Face Space: `https://xcodeaddy-sentinel-env.hf.space`
- - OpenEnv base URL: `https://xcodeaddy-sentinel-env.hf.space`
-
- ## Specialist Behaviors
-
- | Public Slot | Hidden Behavior |
- | --- | --- |
- | S0-S4 | Public ids are shuffled every episode |
-
- Hidden profiles:
-
- - `AccurateSlow`: 90 percent accurate, costs 2 steps.
- - `OverconfidentFast`: quick and confident, wrong 40 percent of the time.
- - `DomainBound`: strong on analysis/verification, weak elsewhere.
- - `Adversarial`: benign below stakes `0.70`, poisons at stakes `>=0.70`.
- - `Degrading`: strong early, gets worse with reuse.
-
- ## Tasks
-
- | Task | Difficulty | Goal |
- | --- | --- | --- |
- | `task1` | Easy | Learn basic delegation over a 10-subtask chain. |
- | `task2` | Medium | Complete a 15-subtask DAG and build calibrated trust. |
- | `task3` | Hard | Complete a 20-subtask adversarial mission under high-stakes poison attempts. |
-
- ## Reward Model
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  Rewards are deterministic and boundary-exclusive per step: `(0.01, 0.99)`.
 
@@ -255,16 +162,11 @@ Shared threshold/cost constants live in `sentinel_config.py`:
  - `ADVERSARIAL_TRIGGER_STAKES = 0.70`: adversarial specialist starts poisoning.
  - `ADVERSARIAL_AWARENESS_STAKES = 0.70`: graders reward verification/evasion.
  - `CRITICAL_POISON_STAKES = 0.85`: unchecked poison ends the episode.
- <<<<<<< HEAD
  - `VERIFY_EXTRA_STEP_COST = 1`: verify cost = specialist step cost + 1.
- =======
- - `VERIFY_EXTRA_STEP_COST = 1`: verify cost is specialist step cost plus one.
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  Task 3 terminal score:
 
  ```text
- <<<<<<< HEAD
  0.35 × completion_rate
  + 0.30 × adversarial_detection_rate
  + 0.25 × trust_calibration
@@ -272,17 +174,6 @@ Task 3 terminal score:
  ```
 
  **Reward Engine v2** adds process-aware signals on top of outcome scoring:
- =======
- 0.35 * completion_rate
- + 0.30 * adversarial_detection_rate
- + 0.25 * trust_calibration
- + 0.10 * efficiency
- ```
-
- The episode `score` exposed in `info` and inference logs is the mean reward over emitted grading events, normalized to `0.0-1.0`. It is intentionally not raw cumulative return; terminal reward and efficiency terms carry the penalty for unfinished or wasteful episodes while keeping scores comparable across tasks with different horizons.
-
- Reward Engine v2 adds process-aware signals on top of outcome scoring:
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  - `confidence_alignment`: penalizes confident wrong outputs.
  - `domain_routing`: rewards domain-bound behavior only when it is actually in-domain.
@@ -290,7 +181,6 @@ Reward Engine v2 adds process-aware signals on top of outcome scoring:
 
  The active step formulas are exposed at `/grader`, and each active episode exposes a full component trace at `/reward-report?session_id=<id>`.
 
- <<<<<<< HEAD
  ---
 
  ## ✨ WOW Factor Features
@@ -316,32 +206,6 @@ The active step formulas are exposed at `/grader`, and each active episode expos
  ---
 
  ## 🌐 API
- =======
- ## WOW Factor Features
-
- SENTINEL now includes three judge-facing upgrades:
-
- 1. **Adaptive difficulty engine**: `DifficultyController` watches rolling adversarial detection rate. Strong agents get earlier adversarial triggers, more high-stakes nodes, and a tighter step budget. Struggling agents get easier episodes. Enable it with:
-
- ```bash
- curl -X POST http://localhost:7860/reset \
- -H "Content-Type: application/json" \
- -d '{"task_type":"task3","seed":42,"adaptive":true}'
- ```
-
- 2. **Behavioral fingerprints**: every observation includes `behavioral_fingerprints` for S0-S4:
-
- - `confidence_accuracy_gap`
- - `domain_hit_rate`
- - `stakes_volatility`
- - low/high stakes accuracy
-
- These are public behavioral signals only. They do not leak the hidden specialist identity.
-
- 3. **Live trust stream**: `/stream?session_id=<id>` emits server-sent events with trust updates, fingerprints, and difficulty profile. Open `/trust-dashboard?session_id=<id>` during a demo to watch the trust bars update live.
-
- ## API
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  ```bash
  curl http://localhost:7860/health
@@ -356,19 +220,11 @@ curl "http://localhost:7860/reward-report?session_id=<session_id>"
  curl http://localhost:7860/difficulty
  ```
 
- <<<<<<< HEAD
  The root route `/` serves the live **SENTINEL dashboard** on Hugging Face Spaces.
  Use `/api` for the JSON route index.
  Use `/assets/baseline_comparison.png` for the committed baseline chart used in the dashboard.
 
  ### Live Stream Demo
- =======
- The root route `/` serves the live SENTINEL dashboard on Hugging Face Spaces.
- Use `/api` for the JSON route index.
- Use `/assets/baseline_comparison.png` for the committed baseline chart used in the dashboard.
-
- Live stream demo:
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  ```bash
  # Terminal 1
@@ -383,7 +239,6 @@ curl -s -X POST http://localhost:7860/reset \
  open "http://localhost:7860/trust-dashboard?session_id=<session_id>"
  ```
 
- <<<<<<< HEAD
  ### Start an Episode
 
  ```bash
@@ -409,9 +264,6 @@ curl -X POST "http://localhost:7860/step?session_id=<SESSION_ID>" \
  ---
 
  ## 🧪 Backend Walkthrough
- =======
- ## Backend Walkthrough
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  For terminal-first debugging and pitch clarity, run:
 
@@ -429,18 +281,13 @@ This prints the full backend story:
 
  The key scenario to understand is `task3, seed=42`: public slot `S0` is secretly adversarial. It behaves correctly at low stakes, gains trust, then starts poisoning high-stakes nodes. SENTINEL exists to train the orchestrator to catch that shift.
 
- <<<<<<< HEAD
  ### Adaptive Evaluation
- =======
- Adaptive evaluation:
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  ```bash
  python training/evaluate.py --episodes 100 --task task3 --adaptive --reset-difficulty \
  --plot outputs/task3_adaptive_comparison.png
  ```
 
- <<<<<<< HEAD
  ---
 
  ## 🖥️ Live Dashboard
@@ -503,89 +350,6 @@ sentinel-env/
  ---
 
  ## ⚡ Local Setup
- =======
- ## Live Dashboard
-
- The Space opens directly into **SENTINEL Trust Mission Control**, a judge-demo dashboard:
-
- - live task progress and score
- - S0-S4 network theater with trust state per public slot
- - manual `delegate`, `verify`, `solve_independently`, and `skip` controls
- - heuristic auto-policy and one-click recommended move
- - API playground showing raw request and response payloads
- - profile reshuffle demo via seed swap
- - before-and-after story lane for judge presentation
- - hackathon readiness panel for what is done vs still pending
- - risk gate for high-stakes subtasks
- - flight recorder of step rewards and decisions
- - code-flow map from `reset()` to reward
- - hackathon theme coverage map
- - adversarial detection and poisoning counters
- - baseline proof table and chart for random, heuristic, and oracle-lite policies
-
- Current status as of April 22, 2026:
-
- | Requirement | Status |
- | --- | --- |
- | Hugging Face Space | Live |
- | Docker build | Passing |
- | OpenEnv validation | Passing |
- | Baseline chart | Committed |
- | Live trust UI | Deployed |
- | Mini-blog/video | Still required before finale |
- | Onsite GRPO curve | Still required during finale |
-
- Start an episode:
-
- ```bash
- curl -X POST http://localhost:7860/reset \
- -H "Content-Type: application/json" \
- -d '{"task_type":"task3","seed":42}'
- ```
-
- Step:
-
- ```bash
- curl -X POST "http://localhost:7860/step?session_id=<SESSION_ID>" \
- -H "Content-Type: application/json" \
- -d '{
- "session_id":"<SESSION_ID>",
- "task_type":"task3",
- "action_type":"delegate",
- "specialist_id":"S2",
- "reasoning":"S2 has the best observed trust score"
- }'
- ```
-
- ## Project Structure
-
- ```text
- sentinel-env/
- |-- app.py
- |-- environment.py
- |-- models.py
- |-- graders.py
- |-- specialists.py
- |-- trust_ledger.py
- |-- task_graph.py
- |-- comms_bus.py
- |-- scenarios.py
- |-- inference.py
- |-- openenv.yaml
- |-- Dockerfile
- |-- requirements.txt
- |-- training/
- | |-- train.py
- | |-- evaluate.py
- | `-- colab_notebook.ipynb
- `-- tests/
- |-- test_environment.py
- |-- test_graders.py
- `-- test_specialists.py
- ```
-
- ## Local Setup
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  ```bash
  python3 -m venv .venv
@@ -595,11 +359,7 @@ pip install -r requirements.txt
  pip install pytest
  ```
 
- <<<<<<< HEAD
  ### Run Checks
- =======
- Run checks:
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  ```bash
  python -m py_compile app.py server/app.py environment.py models.py graders.py specialists.py trust_ledger.py task_graph.py scenarios.py inference.py comms_bus.py mission_context.py sentinel_config.py training/evaluate.py training/train.py scripts/backend_walkthrough.py
@@ -610,45 +370,29 @@ python training/train.py --dry-run --episodes 5
  python scripts/backend_walkthrough.py --task task3 --seed 42 --policy heuristic --compare --max-rows 14
  ```
 
- <<<<<<< HEAD
  ### Run the Server
- =======
- Run the server:
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  ```bash
  uvicorn app:app --host 0.0.0.0 --port 7860
  ```
 
- <<<<<<< HEAD
  ### Validate with OpenEnv
- =======
- Validate with OpenEnv:
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  ```bash
  pip install openenv-core==0.2.3
  openenv validate . --json
  ```
 
- <<<<<<< HEAD
  ### Docker
- =======
- Docker:
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  ```bash
  docker build -t sentinel-env .
  docker run -p 7860:7860 sentinel-env
  ```
 
- <<<<<<< HEAD
  ---
 
  ## 📊 Baselines
- =======
- ## Baselines
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  `inference.py` runs 30 deterministic heuristic episodes and emits only strict hackathon logs:
 
@@ -663,7 +407,6 @@ docker run -p 7860:7860 sentinel-env
  - `random`
  - `heuristic`
  - `oracle_lite`
- <<<<<<< HEAD
  - `trained`
 
  The evaluator writes `outputs/evaluation_results.json` and `outputs/baseline_comparison.png`.
@@ -671,24 +414,6 @@ The evaluator writes `outputs/evaluation_results.json` and `outputs/baseline_com
  ---
 
  ## 🚀 Hugging Face Deployment
- =======
-
- The evaluator writes `outputs/evaluation_results.json` and `outputs/baseline_comparison.png`.
-
- ![Baseline Comparison](outputs/baseline_comparison.png)
-
- Latest local comparison, 20 episodes per task and policy:
-
- | Policy | Overall | Task 1 | Task 2 | Task 3 |
- | --- | ---: | ---: | ---: | ---: |
- | Random | 0.6954 | 0.7702 | 0.6505 | 0.6655 |
- | Heuristic trust-weighted | 0.7960 | 0.8690 | 0.7677 | 0.7513 |
- | Oracle-lite upper bound | 0.8553 | 0.9180 | 0.7801 | 0.8678 |
-
- The demo story is the score gap: the reward function distinguishes blind delegation from trust-aware routing, and the oracle-lite upper bound shows room for onsite RL training.
-
- ## Hugging Face Deployment
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  ```bash
  huggingface-cli login
@@ -708,7 +433,6 @@ curl -X POST https://xcodeaddy-sentinel-env.hf.space/reset \
  openenv validate . --json
  ```
 
- <<<<<<< HEAD
  ---
 
  ## 🏆 Hackathon Alignment
@@ -743,24 +467,3 @@ A detailed mini-blog explaining what SENTINEL does and what we trained is publis
  ## 📜 License
 
  MIT
- =======
- ## Mini-Blog Draft
-
- Title: `SENTINEL: Training AI to Trust Wisely in Multi-Agent Systems`
-
- SENTINEL is an OpenEnv RL environment for one failure mode: multi-agent systems delegate blindly. One orchestrator must complete long tasks by routing work across five specialist agents whose reliability profiles are hidden and reshuffled every episode. The orchestrator only sees behavior, confidence, stakes, and history, so it must learn skepticism, verification, recovery, and calibrated trust.
-
- The specialists are deterministic FSMs on purpose: they give stable reward signals while the orchestrator remains the trainable target. Under Reward Engine v2, random routing scores `0.6954`, trust-weighted routing scores `0.7960`, and oracle-lite reaches `0.8553`, showing the environment has a meaningful learning signal before onsite GRPO training.
-
- ## Hackathon Alignment
-
- - Theme 1: multi-agent interaction, partial observability, adversarial specialist, trust calibration.
- - Theme 2: long-horizon task graphs with delayed terminal reward and failure recovery.
- - Theme 3.1: professional agent orchestration workflow with API-style actions.
- - Theme 4: profile shuffle creates a self-resetting curriculum.
- - Theme 5: targets a real AI systems failure: blind trust inside agent pipelines.
-
- Winning demo line:
-
- > Agents fail because they trust blindly. SENTINEL trains skepticism, recovery, and oversight.
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
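
The Task 3 terminal score in the README hunks above is a weighted sum of four rates. A minimal sketch of that formula follows, assuming each component is already normalized to `[0, 1]`. The function name is hypothetical, and the `0.10` efficiency weight is taken from the removed side of this diff on the assumption that the retained README carries the same term.

```python
# Weights as listed in the README's "Task 3 terminal score" block.
TASK3_WEIGHTS = {
    "completion_rate": 0.35,
    "adversarial_detection_rate": 0.30,
    "trust_calibration": 0.25,
    "efficiency": 0.10,  # assumption: only visible in the removed lines of this diff
}


def task3_terminal_score(components: dict[str, float]) -> float:
    """Weighted sum of the four components; each is assumed to lie in [0, 1]."""
    for name in TASK3_WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(weight * components[name] for name, weight in TASK3_WEIGHTS.items())


example = {
    "completion_rate": 1.0,
    "adversarial_detection_rate": 0.5,
    "trust_calibration": 0.8,
    "efficiency": 0.6,
}
print(round(task3_terminal_score(example), 4))
```

Because the weights sum to 1.0, a perfect episode scores 1.0, and completing every subtask while missing half of the poison attempts costs 0.15 of the total.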
 
docs/TRAINING_RUNBOOK.md CHANGED
@@ -148,7 +148,6 @@ Use a Hugging Face token in Colab for:
 
  The Space itself does not need GPU to run the replay demo.
 
- <<<<<<< HEAD
  ## Hugging Face App URLs
 
  Use these two Hugging Face URLs for different jobs:
@@ -172,8 +171,6 @@ When running locally, start uvicorn with `--host 0.0.0.0`, but open the browser
  at `http://127.0.0.1:7860/` or `http://localhost:7860/`. Do not browse to
  `http://0.0.0.0:7860/`; `0.0.0.0` is only a bind address.
 
- =======
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
  ## Hugging Face Credits
 
  Best use:
@@ -181,7 +178,6 @@ Best use:
  - keep the Space on CPU for normal judging,
  - optionally upgrade the Space to T4 only during the final live demo if the UI
  needs extra responsiveness,
- <<<<<<< HEAD
  - avoid doing full training inside the Space,
  - use Hugging Face Jobs or Colab for the actual GRPO run.
 
@@ -212,12 +208,6 @@ If `import-smoke` passes, run the full job:
 
  The launcher uses `pytorch/pytorch:2.11.0-cuda12.8-cudnn9-devel` because the
  current Unsloth stack pulls `torchao`, which expects torch `>=2.11`.
- =======
- - avoid doing full training inside the Space.
-
- Training belongs in Colab. The Space is for serving the environment and replay
- demo.
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
 
  ## Success Criteria
 
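
The runbook text in this file's diff distinguishes the address a server binds to from the address a browser should open. A minimal sketch of that mapping follows; `browse_url` is a hypothetical helper used only for illustration.

```python
def browse_url(bind_host: str, port: int) -> str:
    """Map a server bind address to a URL a browser can open on the same machine."""
    # 0.0.0.0 means "listen on all interfaces"; it is not a destination,
    # so point the browser at the loopback address instead.
    host = "127.0.0.1" if bind_host == "0.0.0.0" else bind_host
    return f"http://{host}:{port}/"


print(browse_url("0.0.0.0", 7860))
```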
 
outputs/evaluation_results.json CHANGED
The diff for this file is too large to render. See raw diff
 
outputs/trained_policy_replay.jsonl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:308f09f476570d93aedf5329db597332171a693898f72f68a3aa71b0d21e4f06
- size 380391
+ oid sha256:17600954da6421338f399beb04bf88f5aac45c87007f2deaefa8ba3a81897f76
+ size 793297
pyproject.toml CHANGED
@@ -18,19 +18,12 @@ server = "server.app:main"
  [project.optional-dependencies]
  dev = ["pytest>=8.0.0"]
  training = [
- <<<<<<< HEAD
  "trl==0.24.0",
  "transformers==4.57.6",
  "datasets==4.3.0",
  "accelerate==1.13.0",
  "peft==0.19.1",
  "bitsandbytes==0.49.2",
- =======
- "trl",
- "transformers",
- "datasets",
- "accelerate",
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
  "unsloth",
  ]
 
 
requirements-train.txt CHANGED
@@ -1,5 +1,4 @@
  unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git
- <<<<<<< HEAD
  trl==0.24.0
  transformers==4.57.6
  datasets==4.3.0
@@ -10,15 +9,3 @@ matplotlib==3.10.9
  seaborn==0.13.2
  pandas==3.0.2
  huggingface_hub>=0.36,<1
- =======
- trl>=0.18.2,<0.25,!=0.19.0
- transformers>=4.56,<5
- datasets>=3.0,<5
- accelerate>=1.4
- peft>=0.14
- bitsandbytes>=0.45
- matplotlib
- seaborn
- pandas
- huggingface_hub
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
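
The retained side of this diff pins exact versions (`trl==0.24.0`, `transformers==4.57.6`, and so on), while the removed side used open ranges. A minimal sketch for checking an environment against `==` pins follows. `check_pins` and its `installed` lookup parameter are hypothetical, and only plain `name==version` lines are handled; ranges and URL requirements are skipped.

```python
from importlib import metadata
from typing import Callable, Optional


def installed_version(name: str) -> Optional[str]:
    """Look up the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(
    requirement_lines: list[str],
    installed: Callable[[str], Optional[str]] = installed_version,
) -> dict[str, tuple[str, Optional[str]]]:
    """Return {name: (pinned, found)} for every exact pin that does not match."""
    mismatches: dict[str, tuple[str, Optional[str]]] = {}
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()
        # Skip blanks, ranges such as "huggingface_hub>=0.36,<1", and URL requirements.
        if "==" not in line or "@" in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        found = installed(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


pins = ["trl==0.24.0", "transformers==4.57.6", "huggingface_hub>=0.36,<1"]
fake_env = {"trl": "0.24.0", "transformers": "4.46.0"}  # stand-in for a real environment
print(check_pins(pins, installed=fake_env.get))
```

Passing the lookup as a parameter keeps the example runnable without the training stack installed; omitting it makes the function consult the live environment through `importlib.metadata`.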
 
training/colab_notebook.ipynb CHANGED
@@ -74,11 +74,7 @@
  " \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\",\n",
  " ])\n",
  " subprocess.check_call([\"pip\", \"install\", \"-q\", \"--no-deps\",\n",
- <<<<<<< HEAD
  " \"trl==0.24.0\", \"transformers==4.57.6\", \"datasets==4.3.0\", \"accelerate==1.13.0\", \"peft==0.19.1\", \"bitsandbytes==0.49.2\",\n",
- =======
- " \"trl<0.13\", \"transformers>=4.46\", \"datasets\", \"accelerate\", \"peft\", \"bitsandbytes\",\n",
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
  " ])\n",
  "except subprocess.CalledProcessError as exc:\n",
  " print(f\"Training extras failed to install ({exc}); continuing with heuristic-fallback path.\")\n",
 
training/launch_hf_job.py CHANGED
@@ -6,19 +6,12 @@ import shlex
  import sys
  from textwrap import dedent
 
- <<<<<<< HEAD
  from huggingface_hub import get_token, run_job
 
 
  # Current Unsloth pulls torchao, which expects torch >= 2.11. Keep the Jobs
  # image aligned so GRPO imports fail fast only for real code issues.
  DEFAULT_IMAGE = "pytorch/pytorch:2.11.0-cuda12.8-cudnn9-devel"
- =======
- from huggingface_hub import run_job
-
-
- DEFAULT_IMAGE = "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel"
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
  DEFAULT_REPO = "https://github.com/ADITYAGABA1322/sentinel-env"
  DEFAULT_MODEL = "unsloth/Qwen2.5-0.5B-Instruct"
 
@@ -33,7 +26,6 @@ def bootstrap_repo(repo_url: str) -> list[str]:
  "command -v git || (apt-get update && apt-get install -y git)",
  f"git clone {shlex.quote(repo_url)} sentinel-env",
  "cd sentinel-env",
- <<<<<<< HEAD
  "python -m venv --system-site-packages .job-venv || (apt-get update && apt-get install -y python3-venv && python -m venv --system-site-packages .job-venv)",
  ". .job-venv/bin/activate",
  "python -m pip install --upgrade pip",
@@ -47,11 +39,6 @@ def bootstrap_repo(repo_url: str) -> list[str]:
  "from trl import GRPOConfig, GRPOTrainer; "
  "print('training imports ok')\""
  ),
- =======
- "python -m pip install --upgrade pip",
- "pip install -r requirements.txt",
- "pip install -r requirements-train.txt",
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
  ]
 
 
@@ -59,16 +46,11 @@ def gpu_test_command() -> str:
  return "python -c 'import torch; print(torch.cuda.get_device_name())'"
 
 
- <<<<<<< HEAD
  def train_command(args: argparse.Namespace, train: bool = True) -> str:
  lines = bootstrap_repo(args.repo_url)
  if not train:
  return shell_join(lines)
 
- =======
- def train_command(args: argparse.Namespace) -> str:
- lines = bootstrap_repo(args.repo_url)
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
  lines.append(
  " ".join(
  [
@@ -126,15 +108,11 @@ def parse_args() -> argparse.Namespace:
126
  parser = argparse.ArgumentParser(
127
  description="Launch SENTINEL training on Hugging Face Jobs without shell quoting pain."
128
  )
129
- <<<<<<< HEAD
130
  parser.add_argument(
131
  "--mode",
132
  choices=["gpu-test", "import-smoke", "train-smoke", "train-full"],
133
  default="gpu-test",
134
  )
135
- =======
136
- parser.add_argument("--mode", choices=["gpu-test", "train-smoke", "train-full"], default="gpu-test")
137
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
138
  parser.add_argument("--namespace", default=os.environ.get("HF_NAMESPACE", "XcodeAddy"))
139
  parser.add_argument("--flavor", default="a10g-small")
140
  parser.add_argument("--timeout", default="2h")
@@ -156,16 +134,11 @@ def parse_args() -> argparse.Namespace:
156
 
157
  def main() -> None:
158
  args = parse_args()
159
- <<<<<<< HEAD
160
  token = os.environ.get("HF_TOKEN") or get_token()
161
- =======
162
- token = os.environ.get("HF_TOKEN")
163
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
164
  if not token:
165
  raise SystemExit(
166
  dedent(
167
  """
168
- <<<<<<< HEAD
169
  No Hugging Face token was found.
170
 
171
  Either run:
@@ -174,28 +147,16 @@ def main() -> None:
174
 
175
  Or log in once:
176
  .venv/bin/hf auth login --add-to-git-credential
177
- =======
178
- HF_TOKEN is not set.
179
-
180
- Run:
181
- read -s HF_TOKEN
182
- export HF_TOKEN
183
- Then paste your Hugging Face write token.
184
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
185
  """
186
  ).strip()
187
  )
188
 
189
- <<<<<<< HEAD
190
  if args.mode == "gpu-test":
191
  command = gpu_test_command()
192
  elif args.mode == "import-smoke":
193
  command = train_command(args, train=False)
194
  else:
195
  command = train_command(args)
196
- =======
197
- command = gpu_test_command() if args.mode == "gpu-test" else train_command(args)
198
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
199
  print("Launching HF Job:")
200
  print(f" mode = {args.mode}")
201
  print(f" namespace = {args.namespace}")
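Note on the token and mode handling kept by this revert: the retained side resolves the token as `os.environ.get("HF_TOKEN") or get_token()` and dispatches on `--mode`. The Python sketch below illustrates that behavior under stated assumptions. `resolve_token` and `pick_command` are hypothetical names that do not appear in the script. The cached-login lookup is passed in as a plain callable instead of calling `huggingface_hub.get_token`, so the snippet runs without that package. The returned labels only stand in for the real shell commands.

```python
from typing import Callable, Mapping, Optional


def resolve_token(
    env: Mapping[str, str],
    cached_token: Callable[[], Optional[str]],
) -> Optional[str]:
    """Prefer HF_TOKEN from the environment, else fall back to a cached login.

    An empty HF_TOKEN is falsy, so it also falls through to the fallback.
    """
    return env.get("HF_TOKEN") or cached_token()


def pick_command(mode: str) -> str:
    """Label which kind of command each --mode value would build (hypothetical labels)."""
    if mode == "gpu-test":
        return "gpu-test"
    if mode == "import-smoke":
        return "bootstrap-only"  # corresponds to train_command(args, train=False)
    return "bootstrap+train"  # train-smoke and train-full


if __name__ == "__main__":
    print(resolve_token({"HF_TOKEN": "abc"}, lambda: "cached"))
    print(pick_command("import-smoke"))
```

In the real script the fallback would presumably be `get_token` itself, and a `None` result leads to the `SystemExit` message shown in the diff.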
 
ui/app/components/AgentTrustMonitor.tsx CHANGED
@@ -30,11 +30,7 @@ function trustColor(t: number) {
30
  export default function AgentTrustMonitor({
31
  observation, trustDeltas, activeSpec, events, running, totalReward,
32
  }: Props) {
33
- <<<<<<< HEAD
34
  const agents = observation?.available_specialists ?? ["S0", "S1", "S2", "S3", "S4"];
35
- =======
36
- const agents = observation?.available_specialists || observation?.available_workers || ["S0", "S1", "S2", "S3", "S4"];
37
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
38
  const trust = observation?.trust_snapshot ?? {};
39
  const lastReward = observation?.last_reward ?? 0;
40
 
 
ui/app/components/GPUClusterPanel.tsx CHANGED
@@ -1,12 +1,7 @@
1
  "use client";
2
 
3
- <<<<<<< HEAD
4
  import { useState, useEffect, useRef } from "react";
5
  import { motion, AnimatePresence } from "framer-motion";
6
- =======
7
- import { useState, useEffect } from "react";
8
- import { motion } from "framer-motion";
9
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
10
 
11
  type NodeStatus = "ACTIVE" | "IDLE" | "OVERLOADED" | "FAILED";
12
 
@@ -18,34 +13,18 @@ interface GPUNode {
18
  status: NodeStatus;
19
  }
20
 
21
- <<<<<<< HEAD
22
  export default function GPUClusterPanel() {
23
  const [mounted, setMounted] = useState(false);
24
  const [nodes, setNodes] = useState<GPUNode[]>([
25
  { id: "GPU-1", utilization: 45, memory: 32, load: 1.2, status: "ACTIVE" },
26
  { id: "GPU-2", utilization: 12, memory: 8, load: 0.4, status: "IDLE" },
27
  { id: "GPU-3", utilization: 88, memory: 64, load: 2.8, status: "ACTIVE" },
28
- =======
29
- interface GPUClusterPanelProps {
30
- sessionId?: string;
31
- mode?: string;
32
- gpuPool?: any[]; // Live data from observation
33
- }
34
-
35
- export default function GPUClusterPanel({ sessionId, mode, gpuPool }: GPUClusterPanelProps) {
36
- const [mounted, setMounted] = useState(false);
37
- const [nodes, setNodes] = useState<GPUNode[]>([
38
- { id: "GPU-1", utilization: 0, memory: 0, load: 0, status: "IDLE" },
39
- { id: "GPU-2", utilization: 0, memory: 0, load: 0, status: "IDLE" },
40
- { id: "GPU-3", utilization: 0, memory: 0, load: 0, status: "IDLE" },
41
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
42
  { id: "GPU-4", utilization: 0, memory: 0, load: 0, status: "IDLE" },
43
  ]);
44
 
45
  const [avgLoad, setAvgLoad] = useState(0);
46
  const [jitter, setJitter] = useState(0.45);
47
 
48
- <<<<<<< HEAD
49
  useEffect(() => {
50
  setMounted(true);
51
  const interval = setInterval(() => {
@@ -73,46 +52,12 @@ export default function GPUClusterPanel({ sessionId, mode, gpuPool }: GPUCluster
73
  }, 1500);
74
  return () => clearInterval(interval);
75
  }, []);
76
- =======
77
- useEffect(() => { setMounted(true); }, []);
78
-
79
- // ── LIVE SYNC FROM OBSERVATION ────────────────────────────
80
- useEffect(() => {
81
- if (gpuPool && Array.isArray(gpuPool)) {
82
- setNodes(gpuPool.slice(0, 4).map((g: any) => {
83
- const util = (g.memory_used / g.memory_total) * 100;
84
- let status = g.state.toUpperCase();
85
- if (status === "ALLOCATED") status = "ACTIVE";
86
-
87
- return {
88
- id: g.id,
89
- utilization: util,
90
- memory: util,
91
- load: (util / 100) * 4.2,
92
- status: status as NodeStatus
93
- };
94
- }));
95
- } else if (!sessionId || mode !== "cluster") {
96
- // Fallback to subtle idle simulation if no live data
97
- const timer = setInterval(() => {
98
- setJitter(Math.random() * 0.5);
99
- setNodes(prev => prev.map(n => ({
100
- ...n,
101
- utilization: Math.max(0, n.utilization + (Math.random() - 0.5) * 2),
102
- load: n.utilization * 0.04
103
- })));
104
- }, 2000);
105
- return () => clearInterval(timer);
106
- }
107
- }, [gpuPool, sessionId, mode]);
108
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
109
 
110
  useEffect(() => {
111
  const total = nodes.reduce((acc, n) => acc + n.utilization, 0);
112
  setAvgLoad(total / nodes.length);
113
  }, [nodes]);
114
 
115
- <<<<<<< HEAD
116
  if (!mounted) {
117
  return (
118
  <section className="section-block" id="gpu-cluster" style={{ opacity: 0 }}>
@@ -121,28 +66,19 @@ export default function GPUClusterPanel({ sessionId, mode, gpuPool }: GPUCluster
121
  </section>
122
  );
123
  }
124
- =======
125
- if (!mounted) return null;
126
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
127
 
128
  return (
129
  <section className="section-block" id="gpu-cluster">
130
  <div className="section-label">03 // COMPUTE RESOURCES</div>
131
  <h2 className="section-title">GPU Compute Clusters</h2>
132
  <p className="section-desc">
133
- <<<<<<< HEAD
134
  Real-time telemetry from the underlying inference hardware.
135
  High cluster utilization may introduce latency in the trust calibration loop.
136
- =======
137
- Real-time telemetry from the underlying inference hardware.
138
- Note how cluster utilization spikes as the RL model allocates worker jobs.
139
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
140
  </p>
141
 
142
  <div className="cluster-grid">
143
  {nodes.map((node) => (
144
  <div key={node.id} className={`card node-card ${node.status.toLowerCase()}`}>
145
- <<<<<<< HEAD
146
  <div className="card-id">{node.id} // NODE-0{node.id.split("-")[1]}</div>
147
 
148
  <div className="node-status-badge">
@@ -150,15 +86,6 @@ export default function GPUClusterPanel({ sessionId, mode, gpuPool }: GPUCluster
150
  background: node.status === "OVERLOADED" ? "var(--red)" :
151
  node.status === "FAILED" ? "#555" :
152
  node.status === "IDLE" ? "var(--muted)" : "var(--green)"
153
- =======
154
- <div className="card-id">{node.id} // CORE-AX-{node.id.split("-")[1] || "0X"}</div>
155
-
156
- <div className="node-status-badge">
157
- <div className="status-dot" style={{
158
- background: node.status === "ACTIVE" ? "var(--green)" :
159
- node.status === "OVERLOADED" ? "var(--red)" :
160
- node.status === "FAILED" ? "#555" : "var(--muted)"
161
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
162
  }} />
163
  {node.status}
164
  </div>
@@ -169,16 +96,9 @@ export default function GPUClusterPanel({ sessionId, mode, gpuPool }: GPUCluster
169
  <span style={{ color: "var(--cyan)" }}>{Math.round(node.utilization)}%</span>
170
  </div>
171
  <div className="metric-bar-bg">
172
- <<<<<<< HEAD
173
  <motion.div
174
  className="metric-bar-fill"
175
  animate={{ width: `${node.utilization}%` }}
176
- =======
177
- <motion.div
178
- className="metric-bar-fill"
179
- animate={{ width: `${node.utilization}%` }}
180
- transition={{ type: "spring", stiffness: 100, damping: 20 }}
181
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
182
  style={{ background: node.utilization > 90 ? "var(--red)" : "var(--cyan)" } as any}
183
  />
184
  </div>
@@ -190,16 +110,9 @@ export default function GPUClusterPanel({ sessionId, mode, gpuPool }: GPUCluster
190
  <span style={{ color: "var(--green)" }}>{Math.round(node.memory)}%</span>
191
  </div>
192
  <div className="metric-bar-bg">
193
- <<<<<<< HEAD
194
  <motion.div
195
  className="metric-bar-fill"
196
  animate={{ width: `${node.memory}%` }}
197
- =======
198
- <motion.div
199
- className="metric-bar-fill"
200
- animate={{ width: `${node.memory}%` }}
201
- transition={{ type: "spring", stiffness: 100, damping: 20 }}
202
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
203
  style={{ background: "var(--green)" } as any}
204
  />
205
  </div>
@@ -223,11 +136,7 @@ export default function GPUClusterPanel({ sessionId, mode, gpuPool }: GPUCluster
223
  <div className="cluster-total-load">
224
  <span className="label">TOTAL CLUSTER LOAD</span>
225
  <div className="load-meter-bg">
226
- <<<<<<< HEAD
227
  <motion.div
228
- =======
229
- <motion.div
230
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
231
  className="load-meter-fill"
232
  animate={{ width: `${avgLoad}%` }}
233
  style={{ background: avgLoad > 80 ? "var(--red)" : "var(--cyan)", color: avgLoad > 80 ? "var(--red)" : "var(--cyan)" } as any}
 
ui/app/components/SpecialistNetwork.tsx CHANGED
@@ -17,11 +17,7 @@ export default function SpecialistNetwork({
17
  trustDeltas: Record<string, number>;
18
  activeSpec: string | null;
19
  }) {
20
- <<<<<<< HEAD
21
  const ids = observation?.available_specialists ?? ["S0", "S1", "S2", "S3", "S4"];
22
- =======
23
- const ids = observation?.available_specialists || observation?.available_workers || ["S0", "S1", "S2", "S3", "S4"];
24
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
25
  return (
26
  <div className="net">
27
  <svg className="net-svg" viewBox="0 0 100 100" preserveAspectRatio="xMidYMid meet">
 
ui/app/components/TrustTimeline.tsx CHANGED
@@ -10,11 +10,7 @@ export default function TrustTimeline({
10
  observation: Observation | null;
11
  trustDeltas: Record<string, number>;
12
  }) {
13
- <<<<<<< HEAD
14
  const ids = observation?.available_specialists ?? ["S0", "S1", "S2", "S3", "S4"];
15
- =======
16
- const ids = observation?.available_specialists ?? observation?.available_workers ?? ["S0", "S1", "S2", "S3", "S4"];
17
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
18
  return (
19
  <div className="tl">
20
  {ids.map((id) => {
 
ui/app/hooks/useSentinel.ts CHANGED
@@ -9,7 +9,6 @@ import type {
9
 
10
  /* ── helpers ──────────────────────────────────────────── */
11
 
12
- <<<<<<< HEAD
13
  const API_BASE = process.env.NEXT_PUBLIC_API_URL || "";
14
 
15
  function bestSpec(obs: Observation | null): string {
@@ -17,18 +16,6 @@ function bestSpec(obs: Observation | null): string {
17
  return [...obs.available_specialists].sort(
18
  (a, b) => (obs.trust_snapshot[b] ?? 0.5) - (obs.trust_snapshot[a] ?? 0.5),
19
  )[0];
20
- =======
21
- const API_BASE = typeof window !== "undefined"
22
- ? (process.env.NEXT_PUBLIC_API_URL || (window.location.port === "3000" || window.location.port === "3458" ? "http://127.0.0.1:7860" : ""))
23
- : "";
24
-
25
- function bestSpec(obs: Observation | null): string {
26
- if (!obs) return "S0";
27
- const ids = obs.available_specialists || obs.available_workers || [];
28
- return [...ids].sort(
29
- (a, b) => (obs.trust_snapshot[b] ?? 0.5) - (obs.trust_snapshot[a] ?? 0.5),
30
- )[0] || "S0";
31
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
32
  }
33
 
34
  function heuristicMove(obs: Observation | null) {
@@ -42,14 +29,9 @@ function heuristicMove(obs: Observation | null) {
42
 
43
  function randomMove(obs: Observation | null) {
44
  if (!obs) return { action: "delegate" as ActionType, specialist: "S0", trust: 0.5 };
45
- <<<<<<< HEAD
46
  const sp = obs.available_specialists[
47
  Math.floor(Math.random() * obs.available_specialists.length)
48
  ] || "S0";
49
- =======
50
- const ids = obs.available_specialists || obs.available_workers || [];
51
- const sp = ids[Math.floor(Math.random() * ids.length)] || "S0";
52
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
53
  return { action: "delegate" as ActionType, specialist: sp, trust: obs.trust_snapshot[sp] ?? 0.5 };
54
  }
55
 
@@ -140,12 +122,7 @@ export function useSentinel() {
140
  const trustDeltas = useMemo(() => {
141
  if (!observation) return {};
142
  const d: Record<string, number> = {};
143
- <<<<<<< HEAD
144
  for (const id of observation.available_specialists) {
145
- =======
146
- const ids = observation.available_specialists || observation.available_workers || [];
147
- for (const id of ids) {
148
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
149
  d[id] = (observation.trust_snapshot[id] ?? 0.5) - (prevTrust[id] ?? 0.5);
150
  }
151
  return d;
@@ -173,17 +150,10 @@ export function useSentinel() {
173
  const s = nextSeed ?? seed;
174
  setRunning(true);
175
  abortRef.current = false;
176
- <<<<<<< HEAD
177
  const payload = { task_type: t, seed: s };
178
  setLastReq({ method: "POST", path: "/reset", body: payload });
179
  try {
180
  const res = await fetch(`${process.env.NEXT_PUBLIC_API_URL}/reset`, {
181
- =======
182
- const payload = { task_type: t, seed: s, mode: "cluster" };
183
- setLastReq({ method: "POST", path: "/reset", body: payload });
184
- try {
185
- const res = await fetch(`${API_BASE}/reset`, {
186
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
187
  method: "POST",
188
  headers: { "Content-Type": "application/json" },
189
  body: JSON.stringify(payload),
@@ -225,31 +195,17 @@ export function useSentinel() {
225
 
226
  setActiveSpec(specialist);
227
 
228
- <<<<<<< HEAD
229
  const payload = {
230
  session_id: sid,
231
  task_type: obs.task_type,
232
  action_type: action,
233
- =======
234
- const isCluster = active?.info?.environment_mode === "cluster" || sessionId === "cluster";
235
- const mappedAction = (isCluster && action === "delegate") ? "allocate" : action;
236
-
237
- const payload = {
238
- session_id: sid,
239
- task_type: obs.task_type,
240
- action_type: mappedAction,
241
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
242
  specialist_id: specialist,
243
  subtask_response: action === "solve_independently" ? "SELF_SOLVED" : null,
244
  reasoning: `ui-${action}${specialist ? `-${specialist}` : ""}`,
245
  };
246
  setLastReq({ method: "POST", path: `/step?session_id=${sid}`, body: payload });
247
  try {
248
- <<<<<<< HEAD
249
  const res = await fetch(`${process.env.NEXT_PUBLIC_API_URL}/step?session_id=${encodeURIComponent(sid)}`, {
250
- =======
251
- const res = await fetch(`${API_BASE}/step?session_id=${encodeURIComponent(sid)}`, {
252
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
253
  method: "POST",
254
  headers: { "Content-Type": "application/json" },
255
  body: JSON.stringify(payload),
 
ui/app/lib/types.ts CHANGED
@@ -14,10 +14,6 @@ export type Observation = {
14
  subtasks_total: number;
15
  subtasks_remaining: number;
16
  available_specialists: string[];
17
- <<<<<<< HEAD
18
- =======
19
- available_workers?: string[];
20
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
21
  trust_snapshot: Record<string, number>;
22
  stakes_level: number;
23
  step_count: number;
@@ -25,10 +21,6 @@ export type Observation = {
25
  last_action_summary: string | null;
26
  last_reward: number;
27
  episode_status: string;
28
- <<<<<<< HEAD
29
- =======
30
- gpu_pool?: any[];
31
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
32
  };
33
 
34
  export type Reward = {
@@ -49,10 +41,6 @@ export type StepResult = {
49
  score: number;
50
  adversarial_detections?: number;
51
  adversarial_poisonings?: number;
52
- <<<<<<< HEAD
53
- =======
54
- environment_mode?: string;
55
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
56
  };
57
  };
58
 
 
ui/app/page.tsx CHANGED
@@ -256,15 +256,7 @@ export default function Page() {
256
  <div className="divider" />
257
 
258
  {/* GPU CLUSTER */}
259
- <<<<<<< HEAD
260
  <GPUClusterPanel />
261
- =======
262
- <GPUClusterPanel
263
- sessionId={s.info?.session_id}
264
- mode={s.info?.environment_mode}
265
- gpuPool={s.observation?.gpu_pool}
266
- />
267
- >>>>>>> a89a58750afb4cf3e8d49f13fe66d7c227911387
268
 
269
  <div className="divider" />
270
 
 