XcodeAddy committed on
Commit 136ea72 · 1 Parent(s): 325aa05

Initial Set-up

.DS_Store DELETED
Binary file (6.15 kB)
 
.gitignore ADDED
@@ -0,0 +1,11 @@
+ .DS_Store
+ __pycache__/
+ *.py[cod]
+ .pytest_cache/
+ .mypy_cache/
+ .ruff_cache/
+ .venv/
+ outputs/
+ .env
+ .env.*
+ !.env.example
Dockerfile CHANGED
@@ -14,9 +14,13 @@ COPY graders.py .
  COPY specialists.py .
  COPY trust_ledger.py .
  COPY task_graph.py .
+ COPY comms_bus.py .
  COPY scenarios.py .
  COPY openenv.yaml .
  COPY inference.py .
+ COPY README.md .
+ COPY pyproject.toml .
+ COPY server ./server

  # Create outputs directory for baseline scores
  RUN mkdir -p outputs
@@ -29,4 +33,4 @@ HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:7860/health')" || exit 1

  # Start server
- CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1"]
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1"]
README.md CHANGED
@@ -0,0 +1,190 @@
+ # SENTINEL
+
+ Self-Evolving Network for Training Intelligent Agents Under Adversarial Long-Horizon Tasks.
+
+ SENTINEL is an OpenEnv-compatible RL environment for one core skill: training an orchestrator to decide who to trust, when to verify, how to recover, and how to finish long multi-agent work when specialist agents are unreliable or adversarial.
+
+ ## Why It Matters
+
+ Modern agent systems fail in the same pattern:
+
+ 1. A long task is decomposed into many steps.
+ 2. The orchestrator delegates to sub-agents or tools.
+ 3. One specialist returns a confident but wrong result.
+ 4. The system trusts it, builds on it, and drifts into failure.
+
+ SENTINEL turns that failure mode into a trainable environment. The model only sees behavior: returned outcomes, confidence, stakes, history, and trust scores. It never sees hidden specialist identities.
+
+ ## Environment Shape
+
+ - API: `reset()`, `step(action)`, `state()`
+ - Runtime: FastAPI on port `7860`
+ - Tasks: `task1`, `task2`, `task3`
+ - Specialists: 5 scripted FSM agents with shuffled hidden profiles
+ - Rewards: per-step reward plus terminal score, normalized to `0.0-1.0`
+ - Dataset: 120 abstract multi-agent scenarios
+
+ ## Specialist Behaviors
+
+ | Public Slot | Hidden Behavior |
+ | --- | --- |
+ | S0-S4 | Public ids are shuffled every episode |
+
+ Hidden profiles:
+
+ - `AccurateSlow`: 90 percent accurate, costs extra steps.
+ - `OverconfidentFast`: quick and confident, wrong 40 percent of the time.
+ - `DomainBound`: strong on analysis/verification, weak elsewhere.
+ - `Adversarial`: benign at low stakes, poisons high-stakes steps.
+ - `Degrading`: strong early, gets worse with reuse.
+
+ ## Tasks
+
+ | Task | Difficulty | Goal |
+ | --- | --- | --- |
+ | `task1` | Easy | Learn basic delegation over a 10-subtask chain. |
+ | `task2` | Medium | Complete a 15-subtask DAG and build calibrated trust. |
+ | `task3` | Hard | Complete a 20-subtask adversarial mission under high-stakes poison attempts. |
+
+ ## Reward Model
+
+ Rewards are deterministic and boundary-exclusive per step: `(0.01, 0.99)`.
+
+ Task 3 terminal score:
+
+ ```text
+ 0.35 * completion_rate
+ + 0.30 * adversarial_detection_rate
+ + 0.25 * trust_calibration
+ + 0.10 * efficiency
+ ```
+
+ The episode `score` exposed in `info` and inference logs is normalized to `0.0-1.0`.
+
+ ## API
+
+ ```bash
+ curl http://localhost:7860/health
+ curl http://localhost:7860/
+ curl http://localhost:7860/metadata
+ curl http://localhost:7860/tasks
+ curl http://localhost:7860/schema
+ ```
+
+ Start an episode:
+
+ ```bash
+ curl -X POST http://localhost:7860/reset \
+   -H "Content-Type: application/json" \
+   -d '{"task_type":"task3","seed":42}'
+ ```
+
+ Step:
+
+ ```bash
+ curl -X POST "http://localhost:7860/step?session_id=<SESSION_ID>" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "session_id":"<SESSION_ID>",
+     "task_type":"task3",
+     "action_type":"delegate",
+     "specialist_id":"S2",
+     "reasoning":"S2 has the best observed trust score"
+   }'
+ ```
+
+ ## Project Structure
+
+ ```text
+ sentinel-env/
+ |-- app.py
+ |-- environment.py
+ |-- models.py
+ |-- graders.py
+ |-- specialists.py
+ |-- trust_ledger.py
+ |-- task_graph.py
+ |-- comms_bus.py
+ |-- scenarios.py
+ |-- inference.py
+ |-- openenv.yaml
+ |-- Dockerfile
+ |-- requirements.txt
+ |-- training/
+ |   |-- train.py
+ |   |-- evaluate.py
+ |   `-- colab_notebook.ipynb
+ `-- tests/
+     |-- test_environment.py
+     |-- test_graders.py
+     `-- test_specialists.py
+ ```
+
+ ## Local Setup
+
+ ```bash
+ python3 -m venv .venv
+ source .venv/bin/activate
+ python -m pip install --upgrade pip
+ pip install -r requirements.txt
+ pip install pytest
+ ```
+
+ Run checks:
+
+ ```bash
+ python -m py_compile app.py environment.py models.py graders.py specialists.py trust_ledger.py task_graph.py scenarios.py inference.py
+ python -m pytest -q
+ python inference.py
+ python training/evaluate.py --episodes 20 --task task3
+ ```
+
+ Run the server:
+
+ ```bash
+ uvicorn app:app --host 0.0.0.0 --port 7860
+ ```
+
+ Validate with OpenEnv:
+
+ ```bash
+ pip install openenv-core==0.2.3
+ openenv validate . --json
+ ```
+
+ Docker:
+
+ ```bash
+ docker build -t sentinel-env .
+ docker run -p 7860:7860 sentinel-env
+ ```
+
+ ## Baselines
+
+ `inference.py` runs 30 deterministic heuristic episodes and emits only strict hackathon logs:
+
+ ```text
+ [START] task=SCN-TASK3-001 env=sentinel-env model=heuristic-baseline
+ [STEP] step=1 action=delegate:S0 reward=0.99 done=false error=null
+ [END] success=true steps=20 score=0.812 rewards=...
+ ```
+
+ `training/evaluate.py` compares:
+
+ - `random`
+ - `heuristic`
+ - `oracle_lite`
+
+ The evaluator writes `outputs/evaluation_results.json` for demo charts.
+
+ ## Hackathon Alignment
+
+ - Theme 1: multi-agent interaction, partial observability, adversarial specialist, trust calibration.
+ - Theme 2: long-horizon task graphs with delayed terminal reward and failure recovery.
+ - Theme 3.1: professional agent orchestration workflow with API-style actions.
+ - Theme 4: profile shuffle creates a self-resetting curriculum.
+ - Theme 5: targets a real AI systems failure: blind trust inside agent pipelines.
+
+ Winning demo line:
+
+ > Agents fail because they trust blindly. SENTINEL trains skepticism, recovery, and oversight.
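The task 3 terminal score above is a plain weighted sum. As a minimal sketch (the function name and argument names are illustrative, not the repo's API; only the weights and the 0.0-1.0 clamp come from the README):

```python
def task3_terminal_score(
    completion_rate: float,
    adversarial_detection_rate: float,
    trust_calibration: float,
    efficiency: float,
) -> float:
    """Weighted task3 terminal score, clamped to the 0.0-1.0 range the logs use."""
    raw = (
        0.35 * completion_rate
        + 0.30 * adversarial_detection_rate
        + 0.25 * trust_calibration
        + 0.10 * efficiency
    )
    return max(0.0, min(1.0, raw))

# Partial components blend linearly; a perfect episode scores 1.0.
print(round(task3_terminal_score(0.9, 0.8, 0.7, 0.5), 2))  # 0.78
```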
app.py CHANGED
@@ -59,6 +59,19 @@ def health():
      return {"status": "ok", "environment": "sentinel-env", "version": "1.0.0"}


+ @app.get("/")
+ def root():
+     return {
+         "name": "sentinel-env",
+         "status": "ok",
+         "summary": (
+             "SENTINEL trains an orchestrator to calibrate trust, verify risky "
+             "outputs, recover from failures, and finish long multi-agent tasks."
+         ),
+         "routes": ["/health", "/metadata", "/tasks", "/schema", "/grader", "/reset", "/step", "/state"],
+     }
+
+
  @app.get("/metadata")
  def metadata():
      summary = scenario_summary()
@@ -203,4 +216,4 @@ def mcp(body: dict[str, Any]):
  if __name__ == "__main__":
      import uvicorn
      port = int(os.environ.get("PORT", 7860))
-     uvicorn.run("app:app", host="0.0.0.0", port=port, reload=False)
+     uvicorn.run("app:app", host="0.0.0.0", port=port, reload=False)
comms_bus.py CHANGED
@@ -0,0 +1,64 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+ from time import time
+ from typing import Any
+
+
+ @dataclass(frozen=True)
+ class CommsMessage:
+     sender: str
+     receiver: str
+     payload: dict[str, Any]
+     timestamp: float = field(default_factory=time)
+
+
+ class CommsBus:
+     """
+     Lightweight message log for partial-observability experiments.
+
+     The environment keeps hidden specialist metadata internally, while the
+     orchestrator-facing view only exposes the public response, confidence, and
+     outcome summary. This makes the trust problem behavioral instead of
+     identity based.
+     """
+
+     def __init__(self, partial_observability: bool = True) -> None:
+         self.partial_observability = partial_observability
+         self._messages: list[CommsMessage] = []
+
+     def reset(self) -> None:
+         self._messages.clear()
+
+     def route(self, sender: str, receiver: str, payload: dict[str, Any]) -> dict[str, Any]:
+         visible_payload = self._filter_payload(payload)
+         self._messages.append(
+             CommsMessage(sender=sender, receiver=receiver, payload=visible_payload)
+         )
+         return visible_payload
+
+     def history(self, receiver: str | None = None) -> list[dict[str, Any]]:
+         messages = self._messages
+         if receiver is not None:
+             messages = [msg for msg in messages if msg.receiver == receiver]
+         return [
+             {
+                 "sender": msg.sender,
+                 "receiver": msg.receiver,
+                 "payload": dict(msg.payload),
+                 "timestamp": round(msg.timestamp, 3),
+             }
+             for msg in messages
+         ]
+
+     def _filter_payload(self, payload: dict[str, Any]) -> dict[str, Any]:
+         if not self.partial_observability:
+             return dict(payload)
+
+         hidden_keys = {
+             "internal_id",
+             "internal_profile",
+             "ground_truth_reliability",
+             "adversarial_slot",
+         }
+         return {key: value for key, value in payload.items() if key not in hidden_keys}
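The `_filter_payload` logic above is the core of the partial-observability design: hidden specialist metadata is stripped before a message reaches the orchestrator. A standalone sketch of that filtering, with the hidden-key set taken from `comms_bus.py` and a made-up sample payload:

```python
# Hidden-key filtering as in comms_bus.py; the sample payload is illustrative.
HIDDEN_KEYS = {
    "internal_id",
    "internal_profile",
    "ground_truth_reliability",
    "adversarial_slot",
}

def filter_payload(payload: dict) -> dict:
    """Drop hidden specialist metadata before the orchestrator sees the message."""
    return {key: value for key, value in payload.items() if key not in HIDDEN_KEYS}

msg = {
    "response": "Subtask complete.",
    "confidence": 0.92,
    "internal_profile": "Adversarial",  # stripped: never reaches the orchestrator
}
print(filter_payload(msg))  # {'response': 'Subtask complete.', 'confidence': 0.92}
```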
environment.py CHANGED
@@ -60,6 +60,7 @@ class SentinelEnv:
          self.step_count: int = 0
          self.max_steps: int = 0
          self.total_reward: float = 0.0
+         self.reward_events: int = 0
          self.last_reward: float = 0.0
          self.done: bool = False
          self.episode_status: str = "active"
@@ -96,6 +97,7 @@ class SentinelEnv:
          self.step_count = 0
          self.max_steps = MAX_STEPS[scenario["task_type"]]
          self.total_reward = 0.0
+         self.reward_events = 0
          self.last_reward = 0.0
          self.done = False
          self.episode_status = "active"
@@ -149,7 +151,7 @@
          subtask = node.subtask
          stakes = subtask["stakes"]

-         self.step_count += 1
+         step_cost = 1

          # --- Execute specialist or self-solve ---
          if action_type == "skip":
@@ -160,8 +162,8 @@
          elif action_type == "solve_independently":
              # Agent solves itself — always correct (no specialist involved)
-             # But costs 2 steps (enforced via max_steps budget pressure)
-             self.step_count += 1  # extra step cost
+             # But costs 2 steps (enforced via max_steps budget pressure).
+             step_cost = 2
              outcome = 1.0
              was_adversarial = False
              self._graph.record_outcome(subtask["id"], outcome, "SELF")
@@ -170,7 +172,7 @@
          elif action_type == "verify":
              # First get result, then verify (costs +1 step)
              result = self._pool.execute(specialist_id, subtask["description"], stakes, self._rng)
-             self.step_count += 1  # verification step cost
+             step_cost = int(result.metadata.get("step_cost", 1)) + 1
              outcome = result.outcome if not result.is_adversarial else 0.0
              was_adversarial = result.is_adversarial
              # Verification means agent caught adversarial — treat as detection
@@ -182,12 +184,15 @@
          else:  # delegate
              result = self._pool.execute(specialist_id, subtask["description"], stakes, self._rng)
+             step_cost = int(result.metadata.get("step_cost", 1))
              was_adversarial = result.is_adversarial
              outcome = 0.0 if was_adversarial else result.outcome
              self._graph.record_outcome(subtask["id"], outcome, specialist_id, was_adversarial)
              self._ledger.update(specialist_id, result.outcome, stakes)
              self.last_action_summary = f"Delegated to {specialist_id} on {subtask['id']}"

+         self.step_count += max(1, step_cost)
+
          # --- Grade this step ---
          reward_value, reason, breakdown = self._grade_step(
              task_type, action_type, specialist_id, outcome,
@@ -196,6 +201,7 @@
          self.last_reward = reward_value
          self.total_reward += reward_value
+         self.reward_events += 1

          # --- Check episode end ---
          all_done = self._graph.is_done()
@@ -226,6 +232,7 @@
              "step_count": self.step_count,
              "max_steps": self.max_steps,
              "total_reward": round(self.total_reward, 4),
+             "score": round(self.normalized_score(), 4),
              "done": self.done,
              "scenario_id": self.current_scenario["scenario_id"],
              "task_type": self.current_scenario["task_type"],
@@ -301,11 +308,11 @@
              terminal_breakdown = {"completion_rate": round(completion, 3)}
          elif task_type == "task2":
              terminal_value, terminal_reason, terminal_breakdown = grade_task2_terminal(
-                 self._graph, self._ledger, _GROUND_TRUTH_RELIABILITY
+                 self._graph, self._ledger, self._public_ground_truth_reliability()
              )
          else:
              terminal_value, terminal_reason, terminal_breakdown = grade_task3_terminal(
-                 self._graph, self._ledger, _GROUND_TRUTH_RELIABILITY,
+                 self._graph, self._ledger, self._public_ground_truth_reliability(),
                  self.step_count, self.max_steps,
              )
@@ -315,6 +322,7 @@
          self.last_reward = terminal_value
          self.total_reward += terminal_value
+         self.reward_events += 1
          self.done = True
          self.episode_status = "failed" if forced_end else "completed"
@@ -337,6 +345,9 @@
          extra_info: dict | None = None,
      ) -> dict:
          node = self._graph.current_node() if self._graph and not done else None
+         subtask_index = self._graph.node_index(node.subtask["id"]) if node else (
+             self._graph.subtasks_total() if self._graph else 0
+         )

          obs = {
              "session_id": self.session_id,
@@ -345,7 +356,7 @@
              "difficulty": self._difficulty(),
              "task_description": self.current_scenario["description"] if self.current_scenario else "",
              "current_subtask": node.subtask["description"] if node else "All subtasks complete.",
-             "subtask_index": node.subtask["id"] if node else "DONE",
+             "subtask_index": subtask_index,
              "subtasks_total": self._graph.subtasks_total() if self._graph else 0,
              "subtasks_remaining": self._graph.subtasks_remaining() if self._graph else 0,
              "available_specialists": self._pool.available_ids(),
@@ -370,6 +381,7 @@
              "step_count": self.step_count,
              "max_steps": self.max_steps,
              "total_reward": round(self.total_reward, 4),
+             "score": round(self.normalized_score(), 4),
          }
          if extra_info:
              info.update(extra_info)
@@ -379,4 +391,13 @@
      def _difficulty(self) -> str:
          return {"task1": "easy", "task2": "medium", "task3": "hard"}.get(
              self.current_scenario["task_type"] if self.current_scenario else "task3", "hard"
-         )
+         )
+
+     def normalized_score(self) -> float:
+         """Episode score normalized to 0.0-1.0 for judging logs."""
+         if self.reward_events <= 0:
+             return 0.0
+         return max(0.0, min(1.0, self.total_reward / self.reward_events))
+
+     def _public_ground_truth_reliability(self) -> dict[str, float]:
+         return self._pool.public_ground_truth_reliability(_GROUND_TRUTH_RELIABILITY)
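The `normalized_score` method added above averages accumulated reward over the number of grading events and clamps the result to `[0.0, 1.0]`. A standalone sketch of that arithmetic (the input values are made up):

```python
def normalized_score(total_reward: float, reward_events: int) -> float:
    """Mean reward per grading event, clamped to 0.0-1.0 as in environment.py."""
    if reward_events <= 0:
        return 0.0
    return max(0.0, min(1.0, total_reward / reward_events))

print(normalized_score(15.0, 20))  # 0.75
print(normalized_score(0.0, 0))    # 0.0 (no grading events yet)
```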
inference.py CHANGED
@@ -24,8 +24,6 @@ from __future__ import annotations

  import json
  import os
- import sys
- import time
  from pathlib import Path

  # ---------------------------------------------------------------------------
@@ -54,13 +52,14 @@ class EnvClient:
          self._env = SentinelEnv()
          self.session_id: str = ""

-     def reset(self, task_type: str, seed: int | None = None) -> dict:
+     def reset(self, task_type: str, scenario_id: str | None = None, seed: int | None = None) -> dict:
+         payload = {"task_type": task_type, "scenario_id": scenario_id, "seed": seed}
          if USE_REMOTE:
-             r = self._client.post("/reset", json={"task_type": task_type, "seed": seed})
+             r = self._client.post("/reset", json=payload)
              r.raise_for_status()
              result = r.json()
          else:
-             result = self._env.reset(task_type=task_type, seed=seed)
+             result = self._env.reset(task_type=task_type, scenario_id=scenario_id, seed=seed)
          self.session_id = result["info"]["session_id"]
          return result
@@ -126,13 +125,14 @@ def run_episode(
      scenario_id: str,
      seed: int,
  ) -> dict:
-     result = client.reset(task_type=task_type, seed=seed)
+     result = client.reset(task_type=task_type, scenario_id=scenario_id, seed=seed)
      session_id = client.session_id

      print(f"[START] task={scenario_id} env=sentinel-env model=heuristic-baseline")

      step_num = 0
-     total_score = 0.0
+     rewards: list[float] = []
+     final_score = 0.0

      while True:
          obs = result["observation"]
@@ -142,13 +142,14 @@
          reward = result["reward"]["value"]
          done = result["done"]
          step_num += 1
-         total_score = result["info"]["total_reward"]
+         rewards.append(reward)
+         final_score = result["info"].get("score", 0.0)

          action_str = f"{action['action_type']}:{action.get('specialist_id','SELF')}"
          print(
              f"[STEP] step={step_num} "
              f"action={action_str} "
-             f"reward={reward:.4f} "
+             f"reward={reward:.2f} "
              f"done={str(done).lower()} "
              f"error=null"
          )
@@ -162,19 +163,21 @@
      detections = info.get("adversarial_detections", 0)
      poisonings = info.get("adversarial_poisonings", 0)
      trust_snap = info.get("trust_snapshot", {})
+     rewards_str = ",".join(f"{r:.2f}" for r in rewards)

      print(
          f"[END] success=true "
          f"steps={step_num} "
-         f"score={total_score:.4f} "
-         f"rewards={total_score:.4f}"
+         f"score={final_score:.3f} "
+         f"rewards={rewards_str}"
      )

      return {
          "scenario_id": scenario_id,
          "task_type": task_type,
          "steps": step_num,
-         "total_score": round(total_score, 4),
+         "score": round(final_score, 4),
+         "total_reward": round(info.get("total_reward", 0.0), 4),
          "completion_rate": round(completion, 4),
          "adversarial_detections": detections,
          "adversarial_poisonings": poisonings,
@@ -198,27 +201,21 @@ def main():
              result = run_episode(client, task_type, scenario_id, seed=i)
              all_results.append(result)
          except Exception as e:
-             print(f"[STEP] step=0 action=error reward=0.0 done=true error={e}")
-             print(f"[END] success=false steps=0 score=0.0 rewards=0.0")
+             print(f"[STEP] step=0 action=error reward=0.00 done=true error={e}")
+             print(f"[END] success=false steps=0 score=0.000 rewards=0.00")

-     # Summary
      if all_results:
          by_task: dict[str, list] = {"task1": [], "task2": [], "task3": []}
          for r in all_results:
-             by_task[r["task_type"]].append(r["total_score"])
+             by_task[r["task_type"]].append(r["score"])

-         print("\n=== Baseline Summary ===")
          overall_scores = []
          for task_type, scores in by_task.items():
              if scores:
-                 avg = sum(scores) / len(scores)
                  overall_scores.extend(scores)
-                 print(f" {task_type}: episodes={len(scores)} avg_score={avg:.4f}")

          overall_avg = sum(overall_scores) / len(overall_scores) if overall_scores else 0.0
-         print(f" OVERALL: episodes={len(overall_scores)} avg_score={overall_avg:.4f}")

-     # Save results
      out_path = Path("outputs/baseline_scores.json")
      out_path.parent.mkdir(exist_ok=True)
      with open(out_path, "w") as f:
@@ -232,8 +229,7 @@
              },
              "episodes": all_results,
          }, f, indent=2)
-     print(f"\nResults saved to {out_path}")


  if __name__ == "__main__":
-     main()
+     main()
openenv.yaml CHANGED
@@ -140,9 +140,9 @@ baseline:
    script: inference.py
    required_env_vars: [API_BASE_URL, MODEL_NAME, HF_TOKEN]
    optional_env_vars: [ENV_URL]
-   latest_local_score: 0.0
+   latest_local_score: 0.7942
    latest_local_episodes: 30
    reproducibility:
      inference_temperature: 0.0
      agent: heuristic-trust-weighted
-     dataset_order: fixed SCN-TASK*-001 through SCN-TASK*-010 per task
+     dataset_order: fixed SCN-TASK*-001 through SCN-TASK*-010 per task
pyproject.toml CHANGED
@@ -0,0 +1,30 @@
+ [project]
+ name = "sentinel-env"
+ version = "1.0.0"
+ description = "OpenEnv-compatible multi-agent trust calibration environment."
+ requires-python = ">=3.10,<3.14"
+ dependencies = [
+     "fastapi>=0.115,<1",
+     "uvicorn[standard]>=0.35,<1",
+     "pydantic>=2.7,<3",
+     "httpx>=0.28.1,<1",
+     "python-multipart==0.0.9",
+     "openenv-core>=0.2.0",
+ ]
+
+ [project.scripts]
+ server = "server.app:main"
+
+ [project.optional-dependencies]
+ dev = ["pytest>=8.0.0"]
+ training = [
+     "trl",
+     "transformers",
+     "datasets",
+     "accelerate",
+     "unsloth",
+ ]
+
+ [tool.pytest.ini_options]
+ testpaths = ["tests"]
+ python_files = ["test_*.py"]
requirements.txt CHANGED
@@ -1,5 +1,5 @@
- fastapi==0.115.0
- uvicorn[standard]==0.30.6
- pydantic==2.7.4
- httpx==0.27.0
- python-multipart==0.0.9
+ fastapi>=0.115,<1
+ uvicorn[standard]>=0.35,<1
+ pydantic>=2.7,<3
+ httpx>=0.28.1,<1
+ python-multipart==0.0.9
scenarios.py CHANGED
@@ -205,9 +205,10 @@ def _generate_scenarios(
      layout: list[tuple],
      count: int = 40,
  ) -> list[Scenario]:
+     stable_offsets = {"task1": 101, "task2": 202, "task3": 303}
      scenarios = []
      for i in range(count):
-         jittered = _jitter_stakes(layout, seed=i * 100 + hash(task_type) % 1000)
+         jittered = _jitter_stakes(layout, seed=i * 100 + stable_offsets[task_type])
          sid = f"SCN-{task_type.upper()}-{i+1:03d}"
          scenarios.append(
              _build_scenario(sid, task_type, jittered, f"#{i+1:03d}")
@@ -263,4 +264,4 @@ def scenario_summary() -> dict:
          "task2": len(_TASK2_LAYOUT),
          "task3": len(_TASK3_LAYOUT),
      },
- }
+ }
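The seeding change above matters because Python salts `hash()` for strings per process (`PYTHONHASHSEED`), so `hash(task_type) % 1000` produced different scenario jitter on every run. The fixed offset table restores determinism; a hash-based alternative would need a stable digest. A sketch of such an alternative (the helper name is illustrative, not part of the repo):

```python
import hashlib

def stable_offset(task_type: str, modulus: int = 1000) -> int:
    """Deterministic per-task offset; unlike built-in hash(), a sha256 digest
    is identical across processes and Python versions."""
    digest = hashlib.sha256(task_type.encode("utf-8")).hexdigest()
    return int(digest, 16) % modulus

# Same input always yields the same offset, in every process.
print(stable_offset("task1") == stable_offset("task1"))  # True
```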
server/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+ """OpenEnv server package wrapper."""
server/app.py ADDED
@@ -0,0 +1,16 @@
+ from __future__ import annotations
+
+ import os
+
+ from app import app
+
+
+ def main() -> None:
+     import uvicorn
+
+     port = int(os.environ.get("PORT", 7860))
+     uvicorn.run("server.app:app", host="0.0.0.0", port=port, reload=False)
+
+
+ if __name__ == "__main__":
+     main()
specialists.py CHANGED
@@ -2,7 +2,6 @@ from __future__ import annotations
 
 import random
 from dataclasses import dataclass, field
-from typing import Protocol
 
 
 # ---------------------------------------------------------------------------
@@ -105,9 +104,10 @@ class DomainBoundSpecialist(BaseSpecialist):
         pass
 
     def execute(self, subtask: str, stakes: float, rng: random.Random) -> SpecialistResult:
-        # Domain is inferred from subtask text (keywords signal domain)
+        # SENTINEL's abstract scenarios expose domain through task wording.
+        # This specialist is strong at analysis/verification and weak elsewhere.
         in_domain = any(kw in subtask.lower() for kw in
-                        ["classify", "categorize", "label", "triage", "assess"])
+                        ["analyze", "analysis", "identify", "pattern", "verify", "correctness", "assess"])
         accuracy = self.IN_DOMAIN_ACCURACY if in_domain else self.OUT_DOMAIN_ACCURACY
         correct = rng.random() < accuracy
         return SpecialistResult(
@@ -272,4 +272,18 @@ class SpecialistPool:
         return result
 
     def available_ids(self) -> list[str]:
-        return list(self._profile.keys())
+        return list(self._profile.keys())
+
+    def internal_profile(self) -> dict[str, str]:
+        """Public specialist id -> hidden internal behavior id."""
+        return dict(self._profile)
+
+    def public_ground_truth_reliability(self, internal_reliability: dict[str, float]) -> dict[str, float]:
+        """
+        Map hidden internal behavior reliabilities onto public slots.
+        The reward engine uses this; the orchestrator never sees it.
+        """
+        return {
+            public_id: internal_reliability.get(internal_id, 0.5)
+            for public_id, internal_id in self._profile.items()
+        }
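The new `internal_profile()` / `public_ground_truth_reliability()` pair lets the reward engine score against hidden per-slot reliabilities while the orchestrator only ever sees shuffled public ids. A rough sketch of that mapping under assumed reliability values; the shuffle below is illustrative, not the pool's actual reset logic:

```python
import random

# Assumed hidden reliabilities per internal behavior id (ground truth only).
INTERNAL_RELIABILITY = {"S0": 0.9, "S1": 0.6, "S2": 0.7, "S3": 0.15, "S4": 0.65}

def shuffle_profile(seed: int) -> dict[str, str]:
    # Public slot id -> hidden internal behavior id, re-shuffled per episode.
    internal_ids = sorted(INTERNAL_RELIABILITY)
    shuffled = internal_ids[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(internal_ids, shuffled))

def public_reliability(profile: dict[str, str]) -> dict[str, float]:
    # The reward engine keys ground truth by the public slot the agent saw.
    return {pub: INTERNAL_RELIABILITY.get(internal, 0.5)
            for pub, internal in profile.items()}

profile = shuffle_profile(seed=7)
# Same multiset of reliabilities, re-attached to shuffled public ids.
assert sorted(public_reliability(profile).values()) == sorted(INTERNAL_RELIABILITY.values())
```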
task_graph.py CHANGED
@@ -1,6 +1,6 @@
 from __future__ import annotations
 
-from dataclasses import dataclass, field
+from dataclasses import dataclass
 from typing import Optional
 
 from scenarios import Scenario, SubTask
@@ -18,6 +18,8 @@ class TaskNode:
     specialist_used: str = ""
     attempts: int = 0
     was_adversarial: bool = False
+    adversarial_detection_count: int = 0
+    adversarial_poisoning_count: int = 0
 
 
 # ---------------------------------------------------------------------------
@@ -28,6 +30,8 @@ class TaskNode:
 # ---------------------------------------------------------------------------
 
 class TaskGraph:
+    MAX_ATTEMPTS_PER_NODE = 2
+
     def __init__(self, scenario: Scenario) -> None:
         self._scenario = scenario
         self._nodes: dict[str, TaskNode] = {}
@@ -50,6 +54,8 @@ class TaskGraph:
         """
         for sid in self._order:
             node = self._nodes[sid]
+            if node.status == "failed" and node.attempts < self.MAX_ATTEMPTS_PER_NODE:
+                node.status = "ready"
             if node.status == "pending" and self._deps_met(sid):
                 node.status = "ready"
             if node.status == "ready":
@@ -57,17 +63,24 @@
         return None
 
     def _deps_met(self, subtask_id: str) -> bool:
-        """All dependencies of this node must be 'completed'."""
+        """All dependencies must be resolved before downstream work starts."""
         deps = self._nodes[subtask_id].subtask["depends_on"]
         return all(
-            self._nodes[dep].status == "completed"
+            self._is_dependency_resolved(dep)
             for dep in deps
             if dep in self._nodes
         )
 
+    def _is_dependency_resolved(self, subtask_id: str) -> bool:
+        node = self._nodes[subtask_id]
+        if node.status in ("completed", "skipped"):
+            return True
+        return node.status == "failed" and node.attempts >= self.MAX_ATTEMPTS_PER_NODE
+
     def is_done(self) -> bool:
         return all(
-            n.status in ("completed", "failed", "skipped")
+            n.status in ("completed", "skipped")
+            or (n.status == "failed" and n.attempts >= self.MAX_ATTEMPTS_PER_NODE)
            for n in self._nodes.values()
        )
 
@@ -81,8 +94,7 @@
         Avoided = node was adversarial AND orchestrator chose VERIFY or SOLVE_INDEPENDENTLY.
         """
         return sum(
-            1 for n in self._nodes.values()
-            if n.was_adversarial and n.status == "completed" and n.outcome > 0.0
+            n.adversarial_detection_count for n in self._nodes.values()
        )
 
     def adversarial_poisonings(self) -> int:
@@ -90,14 +102,14 @@
         Count of adversarial results that slipped through unchecked.
         """
         return sum(
-            1 for n in self._nodes.values()
-            if n.was_adversarial and n.outcome == 0.0
+            n.adversarial_poisoning_count for n in self._nodes.values()
        )
 
     def subtasks_remaining(self) -> int:
         return sum(
             1 for n in self._nodes.values()
             if n.status in ("pending", "ready", "in_progress")
+            or (n.status == "failed" and n.attempts < self.MAX_ATTEMPTS_PER_NODE)
        )
 
     def subtasks_completed(self) -> int:
@@ -106,6 +118,12 @@
     def subtasks_total(self) -> int:
         return len(self._nodes)
 
+    def subtasks_failed(self) -> int:
+        return sum(1 for n in self._nodes.values() if n.status == "failed")
+
+    def node_index(self, subtask_id: str) -> int:
+        return self._order.index(subtask_id)
+
     def high_stakes_nodes(self) -> list[TaskNode]:
         return [n for n in self._nodes.values() if n.subtask["stakes"] >= 0.70]
 
@@ -126,7 +144,11 @@
         node.outcome = outcome
         node.specialist_used = specialist_id
         node.attempts += 1
-        node.was_adversarial = was_adversarial
+        node.was_adversarial = node.was_adversarial or was_adversarial
+        if was_adversarial and outcome > 0.0:
+            node.adversarial_detection_count += 1
+        elif was_adversarial:
+            node.adversarial_poisoning_count += 1
         node.status = "completed" if outcome > 0.0 else "failed"
 
     def skip_node(self, subtask_id: str) -> None:
@@ -143,6 +165,7 @@
         "task_type": self._scenario["task_type"],
         "subtasks_total": self.subtasks_total(),
         "subtasks_completed": self.subtasks_completed(),
+        "subtasks_failed": self.subtasks_failed(),
         "subtasks_remaining": self.subtasks_remaining(),
         "completion_rate": round(self.completion_rate(), 3),
         "adversarial_detections": self.adversarial_detections(),
@@ -151,4 +174,4 @@
     }
 
     def node_statuses(self) -> dict[str, str]:
-        return {sid: n.status for sid, n in self._nodes.items()}
+        return {sid: n.status for sid, n in self._nodes.items()}
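The retry semantics added above (a failed node re-enters `ready` until `MAX_ATTEMPTS_PER_NODE`, after which it counts as resolved for its dependents) can be sketched in isolation; `Node` here is a simplified stand-in for the real `TaskNode`:

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 2  # mirrors MAX_ATTEMPTS_PER_NODE in the diff

@dataclass
class Node:
    status: str = "ready"
    attempts: int = 0

def record_failure(node: Node) -> None:
    node.attempts += 1
    node.status = "failed"

def maybe_retry(node: Node) -> None:
    # A failed node becomes ready again until it runs out of attempts.
    if node.status == "failed" and node.attempts < MAX_ATTEMPTS:
        node.status = "ready"

def is_resolved(node: Node) -> bool:
    # A dependency is resolved if it succeeded, was skipped,
    # or failed out of retries (so downstream work can proceed).
    if node.status in ("completed", "skipped"):
        return True
    return node.status == "failed" and node.attempts >= MAX_ATTEMPTS

n = Node()
record_failure(n)
maybe_retry(n)
assert n.status == "ready" and not is_resolved(n)   # first failure: retried
record_failure(n)
maybe_retry(n)
assert n.status == "failed" and is_resolved(n)      # out of retries: resolved
```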
tests/test_environment.py CHANGED
@@ -0,0 +1,55 @@
+from __future__ import annotations
+
+import unittest
+
+from environment import SentinelEnv
+
+
+class EnvironmentTests(unittest.TestCase):
+    def test_reset_observation_has_integer_subtask_index(self) -> None:
+        env = SentinelEnv()
+        result = env.reset(task_type="task3", seed=3)
+
+        self.assertEqual(result["observation"]["subtask_index"], 0)
+        self.assertEqual(result["info"]["score"], 0.0)
+        self.assertFalse(result["done"])
+
+    def test_accurate_slow_public_slot_costs_two_steps(self) -> None:
+        env = SentinelEnv()
+        result = env.reset(task_type="task1", seed=11)
+        slow_slot = next(
+            public_id
+            for public_id, internal_id in env._pool.internal_profile().items()
+            if internal_id == "S0"
+        )
+
+        result = env.step({
+            "session_id": result["observation"]["session_id"],
+            "task_type": "task1",
+            "action_type": "delegate",
+            "specialist_id": slow_slot,
+        })
+
+        self.assertEqual(result["info"]["step_count"], 2)
+
+    def test_self_solve_finishes_long_task_with_normalized_score(self) -> None:
+        env = SentinelEnv()
+        result = env.reset(task_type="task3", seed=5)
+
+        while not result["done"]:
+            obs = result["observation"]
+            result = env.step({
+                "session_id": obs["session_id"],
+                "task_type": obs["task_type"],
+                "action_type": "solve_independently",
+                "subtask_response": "SELF_SOLVED",
+            })
+
+        self.assertEqual(result["info"]["completion_rate"], 1.0)
+        self.assertGreater(result["info"]["step_count"], 2)
+        self.assertGreaterEqual(result["info"]["score"], 0.0)
+        self.assertLessEqual(result["info"]["score"], 1.0)
+
+
+if __name__ == "__main__":
+    unittest.main()
tests/test_graders.py CHANGED
@@ -0,0 +1,46 @@
+from __future__ import annotations
+
+import unittest
+
+from graders import clamp, grade_task3_step
+from scenarios import get_scenario
+from task_graph import TaskGraph
+
+
+class GraderAndGraphTests(unittest.TestCase):
+    def test_clamp_is_boundary_exclusive(self) -> None:
+        self.assertEqual(clamp(-10), 0.01)
+        self.assertEqual(clamp(10), 0.99)
+        self.assertEqual(clamp(0.42), 0.42)
+
+    def test_adversarial_verify_gets_detection_reward(self) -> None:
+        reward, reason, breakdown = grade_task3_step(
+            specialist_outcome=1.0,
+            stakes=0.85,
+            was_adversarial=True,
+            action_type="verify",
+            step_count=10,
+            max_steps=45,
+        )
+
+        self.assertGreater(reward, 0.8)
+        self.assertIn("Adversarial detected", reason)
+        self.assertEqual(breakdown["stakes_awareness"], 0.99)
+
+    def test_failed_nodes_are_retriable_then_resolved(self) -> None:
+        graph = TaskGraph(get_scenario("SCN-TASK1-001"))
+        node = graph.current_node()
+        self.assertIsNotNone(node)
+
+        graph.record_outcome(node.subtask["id"], 0.0, "S1")
+        retry = graph.current_node()
+        self.assertEqual(retry.subtask["id"], node.subtask["id"])
+
+        graph.record_outcome(node.subtask["id"], 0.0, "S1")
+        next_node = graph.current_node()
+        self.assertIsNotNone(next_node)
+        self.assertNotEqual(next_node.subtask["id"], node.subtask["id"])
+
+
+if __name__ == "__main__":
+    unittest.main()
tests/test_specialists.py CHANGED
@@ -0,0 +1,35 @@
+from __future__ import annotations
+
+import random
+import unittest
+
+from specialists import DomainBoundSpecialist, SpecialistPool
+
+
+class SpecialistTests(unittest.TestCase):
+    def test_domain_bound_matches_abstract_analysis_and_verify_tasks(self) -> None:
+        specialist = DomainBoundSpecialist()
+
+        in_domain = specialist.execute("Analyze the inputs and identify the key pattern.", 0.2, random.Random(1))
+        out_domain = specialist.execute("Execute the planned action and report the outcome.", 0.2, random.Random(1))
+
+        self.assertTrue(in_domain.metadata["in_domain"])
+        self.assertFalse(out_domain.metadata["in_domain"])
+        self.assertEqual(in_domain.outcome, 1.0)
+        self.assertEqual(out_domain.outcome, 0.0)
+
+    def test_profile_shuffle_keeps_public_reliability_aligned(self) -> None:
+        pool = SpecialistPool()
+        pool.reset(seed=7)
+
+        profile = pool.internal_profile()
+        reliability = pool.public_ground_truth_reliability({"S0": 0.9, "S1": 0.6, "S2": 0.7, "S3": 0.15, "S4": 0.65})
+
+        self.assertEqual(set(profile), {"S0", "S1", "S2", "S3", "S4"})
+        self.assertEqual(set(reliability), {"S0", "S1", "S2", "S3", "S4"})
+        self.assertEqual(profile[pool.adversarial_slot], "S3")
+        self.assertEqual(reliability[pool.adversarial_slot], 0.15)
+
+
+if __name__ == "__main__":
+    unittest.main()
training/colab_notebook.ipynb CHANGED
@@ -0,0 +1,54 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# SENTINEL Training Notebook\n",
+    "\n",
+    "This notebook is the hackathon-facing skeleton for running SENTINEL rollouts and wiring GRPO/Unsloth on the finale GPU machine."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!git clone https://github.com/ADITYAGABA1322/sentinel-env || true\n",
+    "%cd sentinel-env\n",
+    "!pip install -r requirements.txt\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!python inference.py\n",
+    "!python training/evaluate.py --episodes 20 --task task3\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For onsite GRPO, install `trl`, `unsloth`, `transformers`, `datasets`, and `accelerate`, then connect `training/train.py` to the provided model checkpoint."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
training/evaluate.py CHANGED
@@ -0,0 +1,153 @@
+from __future__ import annotations
+
+import argparse
+import json
+import random
+import sys
+from pathlib import Path
+from typing import Callable
+
+ROOT = Path(__file__).resolve().parents[1]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+
+from environment import SentinelEnv, _GROUND_TRUTH_RELIABILITY
+
+
+Policy = Callable[[SentinelEnv, dict, random.Random], dict]
+
+
+def random_policy(env: SentinelEnv, obs: dict, rng: random.Random) -> dict:
+    specialist = rng.choice(obs["available_specialists"])
+    roll = rng.random()
+    if roll < 0.65:
+        action_type = "delegate"
+    elif roll < 0.85:
+        action_type = "verify"
+    elif roll < 0.95:
+        action_type = "solve_independently"
+        specialist = None
+    else:
+        action_type = "skip"
+        specialist = None
+    return _action(obs, action_type, specialist)
+
+
+def heuristic_policy(env: SentinelEnv, obs: dict, rng: random.Random) -> dict:
+    trust = obs["trust_snapshot"]
+    specialist = max(obs["available_specialists"], key=lambda sid: trust.get(sid, 0.5))
+    action_type = "verify" if obs["stakes_level"] >= 0.70 and trust.get(specialist, 0.5) < 0.65 else "delegate"
+    return _action(obs, action_type, specialist)
+
+
+def oracle_lite_policy(env: SentinelEnv, obs: dict, rng: random.Random) -> dict:
+    reliability = env._pool.public_ground_truth_reliability(_GROUND_TRUTH_RELIABILITY)
+    if obs["task_type"] == "task3" and obs["stakes_level"] >= 0.70:
+        return _action(obs, "verify", env._pool.adversarial_slot)
+    specialist = max(obs["available_specialists"], key=lambda sid: reliability.get(sid, 0.5))
+    return _action(obs, "delegate", specialist)
+
+
+def _action(obs: dict, action_type: str, specialist_id: str | None) -> dict:
+    return {
+        "session_id": obs["session_id"],
+        "task_type": obs["task_type"],
+        "action_type": action_type,
+        "specialist_id": specialist_id,
+        "subtask_response": "SELF_SOLVED" if action_type == "solve_independently" else None,
+        "reasoning": f"{action_type} via {specialist_id or 'SELF'}",
+    }
+
+
+def run_episode(policy_name: str, policy: Policy, task_type: str, seed: int) -> dict:
+    rng = random.Random(seed)
+    env = SentinelEnv()
+    result = env.reset(task_type=task_type, seed=seed)
+    rewards: list[float] = []
+
+    while not result["done"]:
+        action = policy(env, result["observation"], rng)
+        result = env.step(action)
+        rewards.append(result["reward"]["value"])
+
+    info = result["info"]
+    breakdown = result["reward"]["signal_breakdown"]
+    detections = info.get("adversarial_detections", 0)
+    poisonings = info.get("adversarial_poisonings", 0)
+    total_adversarial = detections + poisonings
+    detection_rate = detections / total_adversarial if total_adversarial else breakdown.get("detection_rate", 1.0)
+
+    return {
+        "policy": policy_name,
+        "task_type": task_type,
+        "seed": seed,
+        "steps": info.get("step_count", 0),
+        "score": round(info.get("score", 0.0), 4),
+        "total_reward": round(info.get("total_reward", 0.0), 4),
+        "completion_rate": round(info.get("completion_rate", 0.0), 4),
+        "detection_rate": round(detection_rate, 4),
+        "trust_calibration": round(breakdown.get("trust_calibration", 0.0), 4),
+        "adversarial_detections": detections,
+        "adversarial_poisonings": poisonings,
+        "status": "failed" if info.get("forced_end") else "completed",
+        "rewards": [round(value, 4) for value in rewards],
+    }
+
+
+def summarize(rows: list[dict]) -> dict:
+    grouped: dict[str, list[dict]] = {}
+    for row in rows:
+        grouped.setdefault(row["policy"], []).append(row)
+
+    summary = {}
+    for policy_name, policy_rows in grouped.items():
+        n = len(policy_rows)
+        summary[policy_name] = {
+            "episodes": n,
+            "avg_score": _avg(policy_rows, "score"),
+            "avg_completion_rate": _avg(policy_rows, "completion_rate"),
+            "avg_detection_rate": _avg(policy_rows, "detection_rate"),
+            "avg_trust_calibration": _avg(policy_rows, "trust_calibration"),
+            "avg_steps": _avg(policy_rows, "steps"),
+        }
+    return summary
+
+
+def _avg(rows: list[dict], key: str) -> float:
+    return round(sum(float(row.get(key, 0.0)) for row in rows) / max(1, len(rows)), 4)
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Evaluate SENTINEL policies.")
+    parser.add_argument("--episodes", type=int, default=20, help="Episodes per policy.")
+    parser.add_argument("--task", default="task3", choices=["task1", "task2", "task3"])
+    parser.add_argument("--out", default="outputs/evaluation_results.json")
+    args = parser.parse_args()
+
+    policies: dict[str, Policy] = {
+        "random": random_policy,
+        "heuristic": heuristic_policy,
+        "oracle_lite": oracle_lite_policy,
+    }
+
+    rows = []
+    for policy_name, policy in policies.items():
+        for seed in range(args.episodes):
+            rows.append(run_episode(policy_name, policy, args.task, seed))
+
+    payload = {
+        "task": args.task,
+        "episodes_per_policy": args.episodes,
+        "summary": summarize(rows),
+        "episodes": rows,
+    }
+
+    out_path = ROOT / args.out
+    out_path.parent.mkdir(parents=True, exist_ok=True)
+    out_path.write_text(json.dumps(payload, indent=2) + "\n")
+
+    print(json.dumps(payload["summary"], indent=2))
+
+
+if __name__ == "__main__":
+    main()
training/train.py CHANGED
@@ -0,0 +1,118 @@
+from __future__ import annotations
+
+"""
+Minimal onsite training entrypoint.
+
+This file is intentionally import-light so it can run locally without GPU
+packages. On the finale machine, install the training extras from pyproject and
+use this script as the GRPO wiring point.
+"""
+
+import argparse
+import json
+import random
+import re
+import sys
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parents[1]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+
+from environment import SentinelEnv
+
+
+ACTION_RE = re.compile(r"\{.*\}", re.DOTALL)
+
+
+def build_prompt(observation: dict) -> str:
+    return (
+        "You are the SENTINEL orchestrator. Choose one JSON action.\n"
+        f"Task: {observation['task_type']}\n"
+        f"Subtask: {observation['current_subtask']}\n"
+        f"Stakes: {observation['stakes_level']:.2f}\n"
+        f"Trust: {json.dumps(observation['trust_snapshot'], sort_keys=True)}\n"
+        "Valid action_type values: delegate, verify, solve_independently, skip.\n"
+        "Return JSON with action_type and optional specialist_id."
+    )
+
+
+def parse_action(text: str, observation: dict) -> dict:
+    match = ACTION_RE.search(text or "")
+    payload = {}
+    if match:
+        try:
+            payload = json.loads(match.group(0))
+        except json.JSONDecodeError:
+            payload = {}
+
+    action_type = payload.get("action_type", "delegate")
+    specialist_id = payload.get("specialist_id")
+    if action_type in ("delegate", "verify") and specialist_id not in observation["available_specialists"]:
+        specialist_id = max(
+            observation["available_specialists"],
+            key=lambda sid: observation["trust_snapshot"].get(sid, 0.5),
+        )
+    if action_type == "solve_independently":
+        specialist_id = None
+
+    return {
+        "session_id": observation["session_id"],
+        "task_type": observation["task_type"],
+        "action_type": action_type,
+        "specialist_id": specialist_id,
+        "subtask_response": "SELF_SOLVED" if action_type == "solve_independently" else None,
+        "reasoning": payload.get("reasoning", "parsed-training-action"),
+    }
+
+
+def dry_run_rollouts(episodes: int, seed: int) -> dict:
+    rng = random.Random(seed)
+    scores = []
+    for idx in range(episodes):
+        env = SentinelEnv()
+        result = env.reset(task_type="task3", seed=seed + idx)
+        while not result["done"]:
+            obs = result["observation"]
+            specialist = max(obs["available_specialists"], key=lambda sid: obs["trust_snapshot"].get(sid, 0.5))
+            action = {
+                "session_id": obs["session_id"],
+                "task_type": obs["task_type"],
+                "action_type": "verify" if obs["stakes_level"] >= 0.70 and rng.random() < 0.5 else "delegate",
+                "specialist_id": specialist,
+                "subtask_response": None,
+                "reasoning": "dry-run heuristic",
+            }
+            result = env.step(action)
+        scores.append(result["info"]["score"])
+    return {"episodes": episodes, "avg_score": round(sum(scores) / max(1, len(scores)), 4)}
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="SENTINEL GRPO training harness.")
+    parser.add_argument("--dry-run", action="store_true", help="Run local rollouts without GPU dependencies.")
+    parser.add_argument("--episodes", type=int, default=5)
+    parser.add_argument("--seed", type=int, default=0)
+    args = parser.parse_args()
+
+    if args.dry_run:
+        print(json.dumps(dry_run_rollouts(args.episodes, args.seed), indent=2))
+        return
+
+    try:
+        import trl  # noqa: F401
+        import unsloth  # noqa: F401
+    except ImportError as exc:
+        raise SystemExit(
+            "Training dependencies are not installed. Run with --dry-run locally, "
+            "or install the pyproject training extras on the finale GPU machine."
+        ) from exc
+
+    raise SystemExit(
+        "GPU training hook is ready. Wire GRPOTrainer here using build_prompt(), "
+        "parse_action(), and SentinelEnv.step() as the reward source."
+    )
+
+
+if __name__ == "__main__":
+    main()
uv.lock ADDED
@@ -0,0 +1,3 @@
+version = 1
+revision = 1
+requires-python = ">=3.10,<3.14"