AEGIS: A Backup Reflex for Physical AI

AEGIS: the gate fires early and a stronger policy takes over

Interactive demo · Paper (arXiv:2606.06660) · Rollout logs

AEGIS (Activation-probe Early-warning, Gated Inference Switching) is a runtime escalation layer for robot manipulation policies. A cheap probe reads the deployed policy's frozen internal activations as a per-step early-warning signal; when a calibrated gate fires, control switches mid-trajectory to a stronger separate policy, but only for the steps that need it. Both policies stay frozen — the ten-kilobyte probe head in this repo is the only trained component, and it decides when a 4.14B policy wakes up.

The thesis is one sentence: a robot policy can read its own activations as an early-warning signal and call a stronger policy before failure compounds, recovering roughly twice as many failures as matched-budget escalation (10.1% vs 4.6% blind and 5.1% random-trigger).

What it is

Long-horizon failures are slow spirals: one bad step degrades the state, the next compounds it, and the episode is lost long before it ends. Detect-only methods (SAFE, FIPER, Sentinel) see this coming and raise an alarm but never act; recover-within-policy methods (HELM, Pre-VLA, FailSafe) act, but only by asking the same failing policy to try again. AEGIS does what a human supervisor would do: call in someone stronger, at the moment it matters.

Figure: AEGIS architecture (frozen weak policy, probe, gate, handoff to a frozen strong policy)

The runtime loop: the weak policy (SmolVLA, 450M) drives by default and a forward hook reads its action-expert layer-15 activations live (720-d, mean-pooled over each 10-step action chunk). The probe head scores each step; a split-conformal threshold (α=0.10), an early-harm guard (no escalation before 0.20 T), and a per-episode budget cap (⌈0.05 T⌉ fires) turn the score into a switching decision. On a fire, control hands to π₀.₅ (4.14B) at the next chunk boundary, holds for ≥3 chunks, and returns on hysteresis. Both policies sit warm in one process (~9.5 GB VRAM for the backup).

Key results (measured, from the paper)

All from the confirmatory factorial: LIBERO-Spatial, full 10×70 task×seed common-random-number grid, ≥700 episodes per arm, 646 in the weak-policy-failing conditional pool.

  • Recovered-task rate (RTR) 10.1% on the episodes the weak policy alone loses, vs 4.6% for budget-matched blind escalation and 5.1% for a random-trigger placebo at the same strong-policy budget and temporal spread. With B the AEGIS arm, C the blind control, and D the random placebo: B−C +5.4pp, exact McNemar p=8.5×10⁻⁶; B−D +5.0pp, p=1.0×10⁻⁴ (Holm-adjusted, one-sided; all whole-trajectory bootstrap CIs exclude zero).
  • Duty cycle 38%: the stronger policy is dormant most of the time; per-episode cost ≈44% of always-strong (parameter-count schematic). The always-strong ceiling recovers 31.9% at ≈4.6× the compute.
  • Selectivity cuts both ways: AEGIS recovers 65/646 failures while disrupting only 10/54 of the weak policy's successes (recover:disrupt 6.5, vs 1.8 blind, 3.3 random) — the intervention paradox is the failure mode the gate is built against.
  • Early-window AUROC 0.764 (95% cluster-bootstrap CI [0.70, 0.84], n=2,792 episodes) read over the first 30% of steps on the weak-policy path before any handoff — a precondition, not the headline, because accurate prediction does not imply effective prevention.
  • Sign-invariant under simulator non-determinism: across 2,000 replicate redraws of the 212 multi-host cells, no primary contrast ever reverses.
  • Cross-family generalization: swapping the escalation target to GR00T N1.7 recovers 15.5%, consistent with the effect being a property of escalating to a stronger separate policy, not of one lucky pair.
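
The counts quoted in the list above can be checked with a few lines of arithmetic. The sketch below recomputes the conditional recovered-task rate and the recover:disrupt ratio from the stated counts, and shows one parameter-count accounting that lands near the ≈44% cost figure. That accounting is an assumption (weak policy charged on every step in both arms, strong policy charged for the duty-cycle fraction), not necessarily the paper's schematic. The `mcnemar_exact_one_sided` helper is a generic exact McNemar test; the discordant-pair counts behind the reported p-values are not given here, so none are filled in.

```python
from math import comb

# Counts quoted in the Key results list.
weak_failures = 646   # episodes the weak policy alone loses
recovered = 65        # of those, episodes AEGIS turns into successes
disrupted = 10        # weak-policy successes (out of 54) that AEGIS breaks

recovered_rate = recovered / weak_failures
recover_to_disrupt = recovered / disrupted

# ASSUMPTION: parameter-count cost model. The weak policy (450M) runs on
# every step in both arms; the strong policy (4.14B) runs on a `duty` fraction.
weak_params, strong_params, duty = 0.45e9, 4.14e9, 0.38
relative_cost = (weak_params + duty * strong_params) / (weak_params + strong_params)


def mcnemar_exact_one_sided(b: int, c: int) -> float:
    """Exact one-sided McNemar p-value: P(X >= b) for X ~ Binomial(b + c, 1/2).

    b: paired episodes where only the first arm succeeds.
    c: paired episodes where only the second arm succeeds.
    """
    n = b + c
    return sum(comb(n, k) for k in range(b, n + 1)) / 2 ** n


print(f"recovered-task rate: {recovered_rate:.1%}")
print(f"recover:disrupt:     {recover_to_disrupt:.1f}")
print(f"relative cost:       {relative_cost:.1%}")
```

Because the grid uses common random numbers, each episode is paired across arms, which is why a paired test such as McNemar applies to the B−C and B−D contrasts.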

Figure: headline result, timing doubles recovery at matched compute

How to use

The released probe is plain numpy — no framework needed to score risk:

import numpy as np
from huggingface_hub import hf_hub_download

art = np.load(hf_hub_download("kaikaku/aegis", "probe_artifact.npz"))
mu, sd, w, b = art["mu"], art["sd"], art["w"], float(art["b"])
tau = float(art["conformal_threshold"])   # split-conformal, alpha = 0.10

def risk(h):  # h: (720,) mean-pooled layer-15 action-expert activations
    z = (h - mu) / sd
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))
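
The `tau` loaded above is described as a split-conformal threshold at α=0.10. For readers who want to recalibrate on their own data, here is a minimal sketch of the standard split-conformal quantile. It assumes one calibration score per held-out episode (for example, the maximum per-step risk of a successful episode); that choice of score and calibration set is an assumption, and the paper's exact recipe may differ. The function name is hypothetical.

```python
import math

import numpy as np


def split_conformal_threshold(cal_scores, alpha=0.10):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If a new score is exchangeable with the calibration scores, it exceeds
    this threshold with probability at most alpha.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1.0 - alpha))  # 1-indexed rank
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return float(scores[k - 1])


# Synthetic stand-in for per-episode calibration scores (NOT real AEGIS data).
rng = np.random.default_rng(0)
demo_scores = rng.beta(2.0, 5.0, size=200)
demo_tau = split_conformal_threshold(demo_scores, alpha=0.10)
```

The guarantee is marginal over episodes: it bounds the average false-alarm rate, not the rate within any particular task or difficulty stratum.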

Hook the frozen weak policy (SmolVLA via LeRobot) and run the reflex:

import math

feats = []
layer = weak_policy.model.vlm_with_expert.lm_expert.layers[15].self_attn.o_proj
hook = layer.register_forward_hook(
    lambda m, i, o: feats.append(o.detach().float().mean(dim=(0, 1)).cpu().numpy())
)
# sanity check: live activations vary step to step (std > 0.05).
# a frozen cached feature here is exactly the bug that gives AUROC 0.50.

T, H  = 520, 10                  # horizon, native action-chunk length
t_min = max(int(0.20 * T), 2)    # early-harm guard
k_max = math.ceil(0.05 * T)      # per-episode budget cap on gate fires
fires, driver = 0, weak_policy

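for t in range(T):
    a_weak = weak_policy.select_action(obs)   # always runs, so feats[-1] stays live
    a_t = a_weak if driver is weak_policy else strong_policy.select_action(obs)
    s_t = risk(feats[-1])        # score the latest weak-policy activations
    if driver is weak_policy and s_t >= tau and t >= t_min and fires < k_max:
        fires += 1
        driver = strong_policy   # switch at the next chunk boundary,
                                 # hold >= 3 chunks, return on hysteresis
    obs = env.step(a_t)          # obs and env come from your rollout harness

hook.remove()                    # detach the hook once the episode ends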

The full gate semantics (chunk-boundary handoff, hold, hysteretic de-escalation) are frozen in gate_config.json; the rollout, calibration, and analysis code that reproduces the paper's tables from the logged traces ships with the dataset repo.
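
The loop above leaves the hold and hysteresis steps as comments. As a rough illustration of how those pieces could fit together, the sketch below implements a chunk-level gate with the early-harm guard, budget cap, minimum hold, and a hysteretic release. The class name `ReflexGate` and the release threshold `tau_release` are hypothetical; the deployed values and exact semantics live in gate_config.json and may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class ReflexGate:
    tau: float                # fire threshold (e.g. the conformal threshold)
    tau_release: float        # HYPOTHETICAL lower threshold for de-escalation
    horizon: int              # T, episode length in steps
    min_hold_chunks: int = 3  # strong policy drives at least this many chunks
    fires: int = 0
    held: int = 0
    strong: bool = False

    def __post_init__(self):
        self.t_min = max(int(0.20 * self.horizon), 2)  # early-harm guard
        self.k_max = math.ceil(0.05 * self.horizon)    # per-episode fire budget

    def update(self, t: int, score: float) -> str:
        """Call once per chunk boundary; returns who drives the next chunk."""
        if self.strong:
            self.held += 1
            # Release only after the minimum hold AND once risk has dropped
            # below the lower threshold (hysteresis avoids rapid flapping).
            if self.held >= self.min_hold_chunks and score < self.tau_release:
                self.strong = False
        elif score >= self.tau and t >= self.t_min and self.fires < self.k_max:
            self.strong, self.held = True, 0
            self.fires += 1
        return "strong" if self.strong else "weak"


gate = ReflexGate(tau=0.5, tau_release=0.3, horizon=100)
scores = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.4, 0.6]  # one score per 10-step chunk
trace = [gate.update(t, s) for t, s in zip(range(0, 80, 10), scores)]
```

In this trace the first two high scores are ignored by the early-harm guard, the third fires, the strong policy holds for three chunks, and control returns to the weak policy only once the score has fallen below the release threshold.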

Files in this repo

  • probe_artifact.npz — the frozen probe head: feature standardization (mu, sd), logistic weights (w, b), and the split-conformal trigger threshold calibrated at α=0.10. Reads SmolVLA action-expert layer-15 o_proj (720-d, mean-pooled over the 10-token chunk).
  • gate_config.json — the deployed gate configuration (early-harm guard, budget cap, hold, hysteresis) and the measured headline numbers, frozen before the confirmatory run.
  • banner.png, fig_architecture.png, fig_headline_bars.png — card art and paper figures.

The probe is policy-specific: it was trained on SmolVLA's layer-15 action-expert activations on LIBERO and will not transfer to a different backbone without refitting (the paper's OFT-7B supporting study refits a [4096→256→1] head).
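
For a different backbone, the head has to be refit on that backbone's own activations. The sketch below fits a linear logistic head in plain numpy and returns the same keys as the released artifact (`mu`, `sd`, `w`, `b`). It is an illustrative stand-in, not the paper's training code: the features here are synthetic, the function name and hyperparameters are hypothetical, and the OFT-7B supporting study uses a small MLP head rather than this linear one.

```python
import numpy as np


def fit_probe(H, y, lr=0.1, steps=500, l2=1e-3):
    """Fit feature standardization plus a logistic head by gradient descent.

    H: (n, d) pooled activations, y: (n,) labels with 1 = episode failed.
    """
    mu, sd = H.mean(axis=0), H.std(axis=0) + 1e-8
    Z = (H - mu) / sd
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted failure risk
        grad = p - y                            # d(log-loss)/d(logit)
        w -= lr * (Z.T @ grad / len(y) + l2 * w)
        b -= lr * grad.mean()
    return {"mu": mu, "sd": sd, "w": w, "b": b}


# Synthetic features standing in for a new backbone's activations.
rng = np.random.default_rng(0)
d = 32
direction = rng.normal(size=d)
H = rng.normal(size=(400, d))
y = (H @ direction + rng.normal(size=400) > 0).astype(float)

art = fit_probe(H, y)
train_acc = (((H - art["mu"]) / art["sd"]) @ art["w"] + art["b"] > 0) == (y > 0.5)
print(f"training accuracy: {train_acc.mean():.2f}")
```

A refit head also needs its own trigger threshold, since a threshold calibrated on one backbone's score distribution says nothing about another's.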

Limitations

  • The gains over the matched controls are real but modest (+5.4pp / +5.0pp conditional RTR), and the B−C interval touches zero in the HARD difficulty tercile; the within-stratum claim rests on EASY and MEDIUM.
  • AEGIS does not out-recover bigger spends: HELM-style rollback (15.5%), GR00T escalation (15.5%) and always-strong (31.9%) all recover more at higher compute. The claim is selectivity at a fixed budget.
  • Simulation only (LIBERO-Spatial), one weak/strong headline pair; real-hardware transfer and broader pair coverage are named as the next tests, not assumed.
  • Conformal coverage is marginal rather than conditional where difficulty strata were too small to calibrate their own threshold; realized trigger rates are reported empirically.
  • Escalation overhead scales with the escalated fraction times the relative cost of the stronger policy; the measured overhead does not transfer to a different pair without re-profiling.

Relation to AURA

AEGIS is the outward counterpart of AURA, the author's companion memory gate: AURA gates memory writes inward to save bandwidth at fixed success; AEGIS gates compute outward to raise success at fixed memory. The trigger semantics differ (should I write vs. will this trajectory fail and should I escalate), and the failure-trained probe dominates the closest surprise proxy (early AUROC 0.764 vs 0.63).

Citation

@misc{chen2026aegis,
  title         = {AEGIS: A Backup Reflex for Physical AI: Calling a Stronger
                   Policy Before Long-Horizon Failures Compound},
  author        = {Chen, Josef},
  year          = {2026},
  eprint        = {2606.06660},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2606.06660}
}

Not affiliated with Hugging Face, Physical Intelligence, or NVIDIA; model names are trademarks of their respective owners.
