Add SENTINEL rollout and presentation spine
- README.md +8 -0
- docs/ROLL_OUT.md +231 -0
- docs/diagrams/VISUAL_SYSTEM.md +142 -0
- docs/presentation/NARRATIVE_LOCK.md +126 -0
README.md
CHANGED
@@ -14,6 +14,14 @@ Self-Evolving Network for Training Intelligent Agents Under Adversarial Long-Hor
 
 SENTINEL is an OpenEnv-compatible RL environment for one core skill: training an orchestrator to decide who to trust, when to verify, how to recover, and how to finish long multi-agent work when specialist agents are unreliable or adversarial.
 
+## Rollout Source Of Truth
+
+The phased execution plan and presentation assets now live in-repo:
+
+- [Rollout](docs/ROLL_OUT.md)
+- [Narrative Lock](docs/presentation/NARRATIVE_LOCK.md)
+- [Visual System](docs/diagrams/VISUAL_SYSTEM.md)
+
 ## Why It Matters
 
 Modern agent systems fail in the same pattern:
docs/ROLL_OUT.md
ADDED
@@ -0,0 +1,231 @@
# SENTINEL Rollout

This file is the execution spine for the project. The rule is simple:

1. Finish one phase.
2. Verify it.
3. Only then move to the next phase.

SENTINEL wins if the repo, Space, README, UI, and pitch all tell the same story:

> Train an orchestrator to decide who to trust, when to verify, and how to recover in long multi-agent tasks when specialists are unreliable or adversarial.

## Current Status

| Area | Status | Notes |
| --- | --- | --- |
| Environment core | Strong | `reset()`, `step()`, `state()`, rewards, task graph, specialists, trust ledger |
| OpenEnv / deploy | Strong | Space live, Docker passing, validation passing |
| UI clarity | Improving | Trust Mission Control is live, but still needs full judge-demo mode |
| Presentation assets | Partial | Story exists, but diagrams and finale pack need stronger structure |
| Training evidence | Partial | Baselines are real; final onsite GRPO curve still missing |
| Submission completeness | Partial | Mini-blog/video and final finale package still needed |
## What We Borrow From MiroFish

We borrow **presentation discipline**, not product scope.

Use these MiroFish-style strengths:

- one sharp promise at the top
- a visible workflow
- screenshot and diagram density
- live, demo-first presentation
- clean quick-start and deployment instructions

Do **not** copy these patterns into SENTINEL:

- giant "predict anything" scope
- too many use cases
- vague platform framing
- vision language that is larger than the actual judged artifact
## Phase Rules

- Phase 1 must lock the narrative.
- Phase 2 must lock the diagram system.
- Phase 3 must make the UI explain the backend and the story.
- Phase 4 must make learning evidence obvious.
- Phase 5 must make the submission complete and reproducible.
- Phase 6 must make the final pitch unforgettable.

Do not skip a verification gate just because the feature "looks done."

---
## Phase 1 - Narrative Lock

**Goal**
Create one judge-safe project story and use it everywhere.

**Outputs**
- [Narrative Lock](./presentation/NARRATIVE_LOCK.md)
- final one-line thesis
- final hook
- final problem framing
- final before/after claim
- final "what not to say" guardrails

**Done means**
- README, UI, demo script, and pitch all use the same project sentence
- no outdated numbers or mismatched claims remain in primary docs
- the problem statement is clearly software-first, RL-first, and OpenEnv-first

**Verification**
- the README top section matches the narrative lock
- the UI top section uses the same thesis
- the team can explain SENTINEL in 20 seconds and in 2 minutes without changing the core message

**Status**
`In progress`

---
## Phase 2 - Visual System Pack

**Goal**
Turn scattered diagrams into one visual language.

**Outputs**
- [Visual System](./diagrams/VISUAL_SYSTEM.md)
- architecture diagram
- episode lifecycle diagram
- trust / reward dataflow diagram
- before / after failure chain
- theme fit diagram
- training loop diagram

**Done means**
- every diagram uses the same naming and system boundaries
- no diagram contradicts the actual code
- diagrams can be embedded in the README, blog, pitch, and UI

**Verification**
- `app.py`, `environment.py`, `specialists.py`, `trust_ledger.py`, `graders.py`, `task_graph.py`, and `inference.py` are all represented correctly
- the before/after flow uses real baseline numbers, not aspirational placeholders

**Status**
`In progress`

---
## Phase 3 - Productized Demo UI

**Goal**
Make the frontend explain the backend to judges and first-time users.

**Outputs**
- `Overview` mode
- `Playground` mode
- `Judge Demo` mode
- raw request/response visibility
- a guided walkthrough of one episode
- a profile swap demo path

**Done means**
- a first-time viewer can answer:
  - what is SENTINEL?
  - what does the agent observe?
  - what action did the UI send?
  - what did the backend return?
  - why does trust change?
  - why is this hard?

**Verification**
- local `/`, `/reset`, `/step`, `/state`, and `/assets/baseline_comparison.png` all behave correctly
- the live Space reflects the same experience
- no section feels like internal tooling only

**Status**
`Pending`

---
## Phase 4 - Learning Evidence

**Goal**
Make reward improvement impossible to miss.

**Outputs**
- random vs heuristic vs oracle-lite comparison
- visible completion, detection, calibration, and efficiency metrics
- onsite GRPO / Unsloth reward curve
- trained vs untrained comparison block

**Done means**
- judges can see measurable improvement in one screen and one README section
- there is a visible path from baseline -> better policy -> trained model

**Verification**
- `training/evaluate.py` outputs are committed and linked
- the onsite curve is committed once available
- the numbers shown in the UI and README match the evaluation artifacts

**Status**
`Pending`

---
## Phase 5 - Submission Pack

**Goal**
Make the project submission-complete.

**Outputs**
- final README with all links
- HF Space link
- Colab / training notebook link
- blog or video link
- screenshots and diagram links
- reproduction commands

**Done means**
- a judge can clone, run, inspect, and understand the project without asking for missing context

**Verification**
- README links are live
- the Space is live
- `openenv validate . --json` passes
- the Docker build passes

**Status**
`Pending`

---
## Phase 6 - Finale Pack

**Goal**
Package the repo for the room, not just for the validator.

**Outputs**
- 3-minute script
- 5 likely judge questions with answers
- backup screenshots
- fallback demo sequence
- one-click "killer moment" path

**Done means**
- the pitch works even if the live environment is slow
- the trained-vs-baseline story is memorable
- the profile swap moment is rehearsed

**Verification**
- the demo path can be run without improvising architecture details
- every claim can be grounded in repo assets

**Status**
`Pending`

---
## Execution Order

```text
Phase 1 -> Phase 2 -> Phase 3 -> Phase 4 -> Phase 5 -> Phase 6
```

## Next Immediate Build Target

Phase 1 and Phase 2 are the current active work.
Once both are fully stable in-repo, Phase 3 starts on top of them.
docs/diagrams/VISUAL_SYSTEM.md
ADDED
@@ -0,0 +1,142 @@
# SENTINEL Visual System

This file is the diagram source of truth. Every diagram used in the README, UI, blog, or slides should be derived from here.

## Diagram Inventory

| Diagram | Purpose | Status |
| --- | --- | --- |
| System stack | show the code architecture | ready |
| Episode lifecycle | explain `reset()` to terminal reward | ready |
| Trust and reward flow | show how state turns into a learning signal | ready |
| Before / after | show why SENTINEL matters | ready |
| Theme fit | map the project to the hackathon | ready |
| Training loop | show the OpenEnv -> TRL / Unsloth pipeline | ready |

---
## 1. System Stack

```mermaid
flowchart TD
    A["HTTP client / UI / inference.py"] --> B["app.py<br/>FastAPI on port 7860"]
    B --> C["SentinelEnv<br/>environment.py"]
    B --> D["_sessions<br/>session_id -> SentinelEnv"]
    C --> E["TaskGraph<br/>task_graph.py"]
    C --> F["TrustLedger<br/>trust_ledger.py"]
    C --> G["SpecialistPool<br/>specialists.py"]
    C --> H["RewardEngine<br/>graders.py"]
    C --> I["Scenario dataset<br/>scenarios.py"]
    C --> J["Typed models<br/>models.py"]
    B --> K["openenv.yaml"]
    B --> L["static/index.html"]
```

---
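The `_sessions` box above is a per-client registry: each browser tab or training worker gets its own environment instance. A minimal sketch of that pattern (the stub `SentinelEnv` here is a hypothetical stand-in, not the real class from `environment.py`):

```python
import uuid
from typing import Dict, Optional, Tuple


class SentinelEnv:
    """Hypothetical stand-in for environment.SentinelEnv."""
    def __init__(self) -> None:
        self.steps = 0


# session_id -> SentinelEnv, as in the diagram
_sessions: Dict[str, SentinelEnv] = {}


def get_or_create_session(session_id: Optional[str] = None) -> Tuple[str, SentinelEnv]:
    """Return the env bound to session_id, creating a fresh one when unknown."""
    if session_id is None or session_id not in _sessions:
        session_id = session_id or uuid.uuid4().hex
        _sessions[session_id] = SentinelEnv()
    return session_id, _sessions[session_id]
```

An `app.py`-style `/reset` handler would call `get_or_create_session` first, so concurrent viewers never share episode state.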
## 2. Episode Lifecycle

```mermaid
flowchart TD
    A["reset(task_type, seed)"] --> B["sample scenario"]
    B --> C["reshuffle hidden specialist profiles"]
    C --> D["set trust priors to 0.50"]
    D --> E["build task graph"]
    E --> F["return first observation"]

    F --> G["orchestrator chooses action"]
    G --> H["delegate / verify / self solve / skip"]
    H --> I["specialist or self execution"]
    I --> J["record outcome in TaskGraph"]
    J --> K["update TrustLedger"]
    K --> L["compute step reward"]
    L --> M{"done?"}
    M -- "no" --> N["return next observation"]
    N --> G
    M -- "yes" --> O["compute terminal reward"]
    O --> P["return done=True with final info"]
```

---
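The lifecycle above follows the usual reset/step contract. As a client-side sketch, here is what one episode loop looks like against a tiny stub env; the stub and its observation fields are hypothetical illustrations of the shape, not the real `SentinelEnv`:

```python
import random


class StubEnv:
    """Tiny stand-in honoring the reset/step shape in the diagram (hypothetical)."""
    def __init__(self, budget: int = 5) -> None:
        self.budget = budget
        self.t = 0

    def reset(self, task_type: str = "default", seed: int = 0) -> dict:
        random.seed(seed)
        self.t = 0
        # trust priors start at 0.50, as in the diagram
        return {"subtask": 0, "trust": {f"s{i}": 0.50 for i in range(5)}}

    def step(self, action: dict) -> tuple:
        self.t += 1
        done = self.t >= self.budget
        reward = 1.0 if done else 0.0  # delayed terminal reward
        obs = {"subtask": self.t, "trust": {f"s{i}": 0.50 for i in range(5)}}
        return obs, reward, done


env = StubEnv()
obs = env.reset(seed=7)
total, done = 0.0, False
while not done:
    # a real policy chooses delegate / verify / self solve / skip here
    action = {"type": "delegate", "specialist": random.choice(list(obs["trust"]))}
    obs, reward, done = env.step(action)
    total += reward
```

The real environment is driven the same way over HTTP via `/reset` and `/step`, with `inference.py` playing the role of this loop.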
## 3. Trust And Reward Flow

```mermaid
flowchart LR
    A["Observation<br/>subtask, stakes, trust snapshot"] --> B["Action choice"]
    B --> C["Specialist result<br/>outcome, confidence, adversarial flag, step_cost"]
    C --> D["TaskGraph update"]
    C --> E["TrustLedger Bayesian update"]
    D --> F["completion, detections, poisonings"]
    E --> G["calibration state"]
    F --> H["RewardEngine"]
    G --> H
    H --> I["step reward"]
    H --> J["terminal reward"]
```

---
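One common way to implement the "TrustLedger Bayesian update" box is a Beta-Bernoulli posterior per specialist. This is a sketch under that assumption (the real `trust_ledger.py` may weight evidence differently), using a Beta(1, 1) prior so every specialist starts at the 0.50 trust prior from the lifecycle diagram:

```python
from dataclasses import dataclass


@dataclass
class TrustEntry:
    """Beta posterior over one specialist's reliability."""
    alpha: float = 1.0  # pseudo-count of observed successes
    beta: float = 1.0   # pseudo-count of observed failures

    @property
    def trust(self) -> float:
        # posterior mean of the Beta distribution
        return self.alpha / (self.alpha + self.beta)

    def update(self, success: bool, weight: float = 1.0) -> None:
        # a larger weight could encode higher-stakes evidence
        if success:
            self.alpha += weight
        else:
            self.beta += weight


entry = TrustEntry()         # prior mean: 1 / (1 + 1) = 0.50
entry.update(success=True)   # one verified good result
entry.update(success=False)  # one caught bad result ...
entry.update(success=False)  # ... and another
```

After these three observations the trust score is 2/5 = 0.40, below the prior, so a stakes-aware policy would start verifying this specialist rather than delegating blindly.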
## 4. Before / After

```mermaid
flowchart LR
    subgraph BEFORE["Before SENTINEL"]
        A1["Uniform trust"] --> A2["Blind delegation"]
        A2 --> A3["Poison accepted at high stakes"]
        A3 --> A4["Downstream subtasks inherit bad state"]
        A4 --> A5["Mission drifts or fails"]
    end

    subgraph AFTER["After SENTINEL"]
        B1["Behavior updates trust"] --> B2["Low-trust high-stakes node detected"]
        B2 --> B3["Verify instead of delegate"]
        B3 --> B4["Poison blocked before cascade"]
        B4 --> B5["Mission completes cleanly"]
    end
```

---
## 5. Theme Fit

```mermaid
flowchart TD
    S["SENTINEL"] --> T1["Theme 1<br/>multi-agent interaction"]
    S --> T2["Theme 2<br/>long-horizon planning"]
    S --> T4["Theme 4<br/>self-improvement"]
    S --> T5["Theme 5<br/>wild card"]

    T1 --> B1["orchestrator + five specialists<br/>partial observability<br/>adversarial dynamics"]
    T2 --> B2["task graph<br/>step budget pressure<br/>delayed terminal reward"]
    T4 --> B3["profile reshuffle<br/>auto-curriculum<br/>no memorization"]
    T5 --> B4["real production weakness<br/>blind trust in agent pipelines"]
```

---
## 6. Training Loop

```mermaid
flowchart LR
    A["Prompt / observation"] --> B["Model rollout"]
    B --> C["Action text or structured action"]
    C --> D["SENTINEL environment"]
    D --> E["Reward + next observation"]
    E --> F["TRL / GRPO trainer"]
    F --> G["updated policy"]
    G --> B

    H["training/evaluate.py"] --> I["random / heuristic / oracle-lite"]
    I --> J["evaluation_results.json"]
    I --> K["baseline_comparison.png"]
```

---
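Inside the "TRL / GRPO trainer" box, the step that turns environment rewards into a learning signal is group-relative advantage normalization: sample several rollouts of the same scenario and score each one against the group mean. A standalone sketch of that computation (an illustration of the idea, not TRL's actual internals):

```python
import statistics


def group_advantages(rewards, eps: float = 1e-8):
    """GRPO-style advantages: z-score each rollout's reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# four rollouts of the same SENTINEL scenario, scored by the environment
rewards = [0.2, 0.9, 0.4, 0.9]
advantages = group_advantages(rewards)
# rollouts above the group mean get positive advantage, those below get negative
```

Because advantages are relative within a group, the trainer needs no separate value model: the environment's scalar reward is enough, which is why the completion / detection / calibration / efficiency reward design plugs straight into this loop.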
## Use Rules

1. Do not invent new component names in slide decks that do not exist in code.
2. Use `SentinelEnv`, `TrustLedger`, `SpecialistPool`, `TaskGraph`, and `RewardEngine` consistently.
3. Use real baseline numbers in public before/after materials.
4. Export polished PNG versions from these mermaid sources later, but keep this file as the editable truth.
docs/presentation/NARRATIVE_LOCK.md
ADDED
@@ -0,0 +1,126 @@
# SENTINEL Narrative Lock

This file defines the one story the whole project must tell.

## One Sentence

SENTINEL is an OpenEnv RL environment that trains an orchestrator to decide who to trust, when to verify, and how to recover in long multi-agent tasks when specialist agents are unreliable or adversarial.

## 20-Second Version

Multi-agent systems break because they trust sub-agents too easily. SENTINEL turns that failure into a trainable environment: the orchestrator must learn trust calibration from behavioral evidence alone, under long-horizon pressure and adversarial specialists.

## 2-Minute Version

Every multi-agent framework today has the same hidden weakness: one specialist can be confidently wrong, and the orchestrator will often delegate blindly, accept the result, and let the failure cascade downstream. SENTINEL is an OpenEnv RL environment built to train exactly against that weakness.

The orchestrator never sees specialist internals. It sees only behavior: outcomes, stakes, history, and trust scores. Five public specialist slots are visible, but the hidden profiles reshuffle every episode, so the agent cannot memorize identities. It must learn the skill of trust calibration.

The environment rewards mission completion, adversarial detection, calibration quality, and efficiency. That makes the project more than a simulation: it is a training environment with measurable improvement, from random routing, to trust-aware heuristic routing, and eventually to trained routing.

## Problem Statement

Train an orchestrator to complete long multi-agent tasks under partial observability by learning:

- which specialist to trust
- when a risky result should be verified
- when to self-solve instead of delegating
- how to recover before poisoned state cascades through the mission

## What We Are Building

We are building:

- a deployable OpenEnv environment
- a reward design for trust calibration
- a live judge-demo UI
- a training and evaluation pipeline
- a final before/after demo showing learned behavioral change

We are **not** building:

- a general chatbot
- a coding assistant product
- a replay of incident triage
- a giant multi-domain prediction platform
- a vague multi-agent "framework"
## Why Judges Should Care

This is not a toy coordination task. It targets a real production weakness in modern agent systems:

> sub-agents are often assumed trustworthy until a human catches the damage.

SENTINEL makes that weakness trainable.

## Before / After Claim

**Before SENTINEL**
- trust is static or heuristic
- bad high-confidence outputs slip through
- failures cascade across downstream steps
- the orchestrator cannot explain why the mission drifted

**After SENTINEL**
- trust changes from observed behavior
- high-stakes, low-trust outputs are verified
- adversarial attempts are caught before they cascade
- the orchestrator learns skill, not memorized role identity

## Non-Negotiable Claims

These claims must stay consistent in the README, UI, demo, and blog:

1. SENTINEL is about **trust calibration**.
2. The orchestrator is the **trainable policy**.
3. Specialists are **scripted on purpose** for stable reward.
4. The reshuffle mechanic proves **skill over memorization**.
5. The reward combines **completion, detection, calibration, and efficiency**.

## What Not To Say

Do not describe SENTINEL as:

- "predict anything"
- "a full digital twin of the world"
- "an all-in-one multi-agent platform"
- "a software assistant for every use case"
- "a space, quantum, or general science simulator"

Those framings make the project sound bigger but less judgeable.

## Judge-Facing Angle By Criterion

### Environment Innovation
The novelty is not just "multi-agent."
The novelty is **training trust calibration under shuffled adversarial identity**.

### Storytelling
The story is simple:
- blind trust fails
- behavioral evidence updates trust
- verification blocks poison
- the profile swap proves generalization

### Improvement In Rewards
The visual proof is a four-way comparison:
- random
- heuristic
- oracle-lite
- the model trained onsite

### Reward / Training Pipeline
The important line is:

> the reward does not praise vibes; it scores completion, detection, calibration, and efficiency.

## 3-Minute Pitch Spine

### Minute 1
Problem: multi-agent systems trust sub-agents too easily.

### Minute 2
Environment: orchestrator, five specialist slots, hidden shuffled profiles, trust ledger, task graph, reward engine.

### Minute 3
Evidence: the baseline gap, live trust changes, and the profile swap moment.