vla-sae-libero — LIBERO smoke-test tooling & run artifacts for nvidia/GR00T-N1.7-LIBERO

This repo holds only the smoke-test tooling and the artifacts of one completed run. In that run, nvidia/GR00T-N1.7-LIBERO (the libero_10 checkpoint) was deployed and driven through 10 short, simulation-only LIBERO rollouts to verify the deploy/eval plumbing. It is a stepping stone toward activation-capture / SAE work on the GR00T VLA.

Built on NVIDIA Isaac-GR00T — not redistributed here

The tooling in this repo runs on top of NVIDIA Isaac-GR00T (Apache-2.0, © NVIDIA CORPORATION & AFFILIATES). This repo does NOT contain the upstream Isaac-GR00T source tree, the model weights, or the gated VLM backbone. To use the tooling: clone Isaac-GR00T, install its gr00t package, drop examples/LIBERO/smoke_tests/ (and tests/test_libero_smoke_tests.py) into the checkout, download the checkpoint, and follow the steps below.

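Specifically, not included (get them from upstream): the Isaac-GR00T source tree (https://github.com/NVIDIA/Isaac-GR00T), the nvidia/GR00T-N1.7-LIBERO model weights, and the gated nvidia/Cosmos-Reason2-2B VLM backbone.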

Contents

examples/LIBERO/smoke_tests/                 # the smoke-test tooling
├── scenarios_10.yaml                        #   10-scenario manifest (8 normal + 2 abnormal_probe)
├── run_10_smoke_tests.py                    #   runner: validates model/server, runs each scenario via the official eval path, writes summaries
├── _libero_rollout_worker.py                #   single-episode worker (runs in the LIBERO venv): records action chunks, optional sim-only probes
├── visualize_smoke_run.py                   #   builds visual_report.html + per-scenario action plots + visual_summary.{csv,json}
├── make_video_montage.py                    #   builds review_montage.mp4 + per-scenario captioned clips + review_playlist.html (needs ffmpeg)
├── review_smoke_tests.py                    #   CLI: list/play the rollouts of a run
├── ACTIVATION_HOOK_NOTES.md                 #   where to insert future SAE activation capture (module names, hook points)
└── README.md                                #   setup & run instructions

tests/test_libero_smoke_tests.py             # lightweight tests for the manifest + summary writer

outputs/libero_smoke_tests/20260512_122756/  # the completed run (10/10 rollouts OK, 0 errors; success=false everywhere — these are 50-step smoke rollouts, not a benchmark)
├── summary.json, summary.md                 #   run-level results + the printed comparison table
├── visual_report.html                       #   self-contained report: embedded videos + plots + comparison table
├── visual_summary.csv, visual_summary.json  #   per-scenario action-stat table
├── review_montage.mp4                        #   ~22 s: all 10 scenarios back-to-back with captions
├── review_clips/<scenario_id>.mp4           #   10 short captioned clips
├── review_playlist.html                     #   the montage + each clip + metadata + links
├── plots/<scenario_id>/                     #   action_norm / action_mean_per_dof / gripper_over_time / action_delta_norm  (.png)
└── <scenario_id>/                           #   ×10
    ├── video.mp4                            #     rendered rollout (agentview | wrist, 512×256)
    ├── frames/                              #     decoded PNG frames (25 each; 8 for the short-timeout probe)
    ├── actions.npy                          #     recorded action chunks, shape (n_calls, 1, 16, 7) float32   [DoF column order: gripper, pitch, roll, x, y, yaw, z]
    ├── metadata.json, rollout_summary.json  #     scenario config + results
    └── stdout.log, stderr.log               #     the worker subprocess logs

outputs/_setup_logs/                         # deployment record: uv install / LIBERO setup / server / smoke-run logs (no secrets)

LICENSE                                       # Apache-2.0 (same license as upstream Isaac-GR00T)
NOTICE                                        # attribution
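
As a rough sketch of how a per-scenario actions.npy could be inspected: the snippet below flattens an array of the documented shape (n_calls, 1, 16, 7) into one row per predicted action step and computes a mean per DoF, using the column order listed above. The helper names flatten_chunks and mean_per_dof are hypothetical, and a synthetic array stands in for a real file; in practice the array would come from np.load on <scenario_id>/actions.npy.

```python
import numpy as np

# Column order as documented for actions.npy in the listing above.
DOF_NAMES = ["gripper", "pitch", "roll", "x", "y", "yaw", "z"]


def flatten_chunks(actions: np.ndarray) -> np.ndarray:
    """Reshape (n_calls, 1, 16, 7) into (n_calls * 16, 7): one row per predicted step."""
    if actions.ndim != 4 or actions.shape[1] != 1 or actions.shape[-1] != len(DOF_NAMES):
        raise ValueError(f"unexpected actions shape: {actions.shape}")
    return actions.reshape(-1, actions.shape[-1])


def mean_per_dof(actions: np.ndarray) -> dict:
    """Mean value of each DoF over all predicted steps of all chunks."""
    flat = flatten_chunks(actions)
    return {name: float(flat[:, i].mean()) for i, name in enumerate(DOF_NAMES)}


# Synthetic stand-in for np.load(".../<scenario_id>/actions.npy").
rng = np.random.default_rng(0)
demo = rng.normal(size=(7, 1, 16, 7)).astype(np.float32)

flat = flatten_chunks(demo)
stats = mean_per_dof(demo)
print(flat.shape)
print(stats)
```

Each chunk holds 16 predicted steps while the run used n_action_steps = 8, so presumably only part of each chunk was executed in simulation; this sketch treats all predicted steps alike.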

The run, in one line

10 LIBERO simulation rollouts of nvidia/GR00T-N1.7-LIBERO (libero_10), max_episode_steps = 50 (16 for the short-timeout probe), n_action_steps = 8, n_envs = 1, no physical hardware. 8 normal long-horizon tasks + 2 abnormal_probe (mild Gaussian observation noise; deliberately shortened timeout). All 10 ran cleanly with actions + video saved; success = false everywhere — a 50-step rollout cannot complete these long-horizon tasks, and that's the point: this verifies the deploy/serve/client/sim/render/save plumbing, not task performance.
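
The step budget gives a rough idea of how many policy queries each rollout makes. A minimal sketch, assuming the client executes n_action_steps actions from each returned chunk before querying the server again (an assumption about the eval loop, not something stated here); the function name expected_policy_calls is hypothetical.

```python
import math


def expected_policy_calls(max_episode_steps: int, n_action_steps: int) -> int:
    """Policy queries needed if each query covers n_action_steps environment steps."""
    if max_episode_steps <= 0 or n_action_steps <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(max_episode_steps / n_action_steps)


normal_calls = expected_policy_calls(50, 8)  # the 50-step scenarios
probe_calls = expected_policy_calls(16, 8)   # the short-timeout probe
print(normal_calls, probe_calls)  # prints "7 2"
```

If that assumption holds, the n_calls dimension of actions.npy would be about 7 for a 50-step rollout and 2 for the short-timeout probe; the actual array shape is the authority.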

Reproduce

Full instructions: examples/LIBERO/smoke_tests/README.md. Summary:

# 0. start from a clone of NVIDIA Isaac-GR00T and copy this repo's files into it
git clone https://github.com/NVIDIA/Isaac-GR00T.git && cd Isaac-GR00T
git submodule update --init external_dependencies/LIBERO
# ... then place examples/LIBERO/smoke_tests/ and tests/test_libero_smoke_tests.py from THIS repo here ...

# install gr00t (uv sync, or the minimal recipe in examples/LIBERO/smoke_tests/README.md), then:
sudo apt install libegl1-mesa-dev libglu1-mesa cmake && bash gr00t/eval/sim/LIBERO/setup_libero.sh
uv run hf download nvidia/GR00T-N1.7-LIBERO --include "libero_10/*" --local-dir checkpoints/GR00T-N1.7-LIBERO
# request access to https://huggingface.co/nvidia/Cosmos-Reason2-2B  and  export HF_TOKEN=hf_...

# 1. server (terminal 1) — this pulls the gated nvidia/Cosmos-Reason2-2B backbone
uv run python gr00t/eval/run_gr00t_server.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --embodiment-tag LIBERO_PANDA --use-sim-policy-wrapper

# 2. smoke tests (terminal 2)
uv run python examples/LIBERO/smoke_tests/run_10_smoke_tests.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 --host 127.0.0.1 --port 5555 \
  --manifest examples/LIBERO/smoke_tests/scenarios_10.yaml --output-dir outputs/libero_smoke_tests \
  --max-episode-steps 50 --save-video --render

# 3. review artifacts (no GPU, no rerun)
python examples/LIBERO/smoke_tests/visualize_smoke_run.py  --run-dir outputs/libero_smoke_tests/<ts> --open
python examples/LIBERO/smoke_tests/make_video_montage.py   --run-dir outputs/libero_smoke_tests/<ts> --mode sequential --open
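
The <ts> placeholder in step 3 is the timestamped run directory (for example 20260512_122756). One way to resolve it to the newest run is sketched below in Python; latest_run_dir is a hypothetical helper, not part of the tooling, and it assumes run directories are always named YYYYMMDD_HHMMSS.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

TS_PATTERN = re.compile(r"^\d{8}_\d{6}$")  # e.g. 20260512_122756


def latest_run_dir(output_root: Path) -> Optional[Path]:
    """Return the newest timestamped run directory under output_root, or None."""
    runs = [p for p in output_root.iterdir() if p.is_dir() and TS_PATTERN.match(p.name)]
    # Zero-padded timestamps sort chronologically as plain strings.
    return max(runs, key=lambda p: p.name, default=None)


# Demo on a throwaway directory instead of outputs/libero_smoke_tests.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("20260511_090000", "20260512_122756", "_setup_logs"):
        (root / name).mkdir()
    newest_name = latest_run_dir(root).name

print(newest_name)  # prints "20260512_122756"
```

Against a real checkout, latest_run_dir(Path("outputs/libero_smoke_tests")) would give the path to pass as --run-dir.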

Next: activation capture / SAE

See examples/LIBERO/smoke_tests/ACTIVATION_HOOK_NOTES.md. Suggested first step: register a forward hook (PyTorch's register_forward_hook) on policy.model.backbone, the Qwen3Backbone. Its forward returns backbone_features, the last Qwen3-VL hidden state, which is the fused image+instruction representation that conditions action generation. Dump that tensor once per get_action call alongside the already-saved actions.npy and frames, then train an SAE on the collected activations.
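
A minimal sketch of the hook mechanics, using PyTorch's nn.Module.register_forward_hook on a toy stand-in module rather than the real backbone. ActivationRecorder and ToyBackbone are hypothetical names, and the sketch assumes the backbone's output is either a tensor or a mapping with a "backbone_features" entry; check the actual return type in the upstream gr00t package before relying on it.

```python
from collections.abc import Mapping

import torch
from torch import nn


class ActivationRecorder:
    """Forward hook that stores one CPU copy of the features per forward call."""

    def __init__(self, key: str = "backbone_features"):
        self.key = key
        self.records = []

    def __call__(self, module, inputs, output):
        # Assumption: output is a tensor, or a mapping holding the tensor under self.key.
        feats = output[self.key] if isinstance(output, Mapping) else output
        self.records.append(feats.detach().to("cpu", torch.float32))


class ToyBackbone(nn.Module):
    """Stand-in for policy.model.backbone; NOT the real Qwen3Backbone."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 8)

    def forward(self, x):
        return {"backbone_features": self.proj(x)}


backbone = ToyBackbone()
recorder = ActivationRecorder()
handle = backbone.register_forward_hook(recorder)

with torch.no_grad():
    for _ in range(3):  # stands in for three get_action calls
        backbone(torch.randn(1, 5, 4))

handle.remove()  # detach the hook once capture is done

# Concatenation works here because every call has the same sequence length;
# real token counts may vary per call, in which case keep the list as-is.
stacked = torch.cat(recorder.records, dim=0)
print(stacked.shape)
```

Keeping the recorder as a separate callable means the same object could be attached to another module later, and handle.remove() leaves the model untouched after capture.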

License & attribution

This repo is released under Apache-2.0, matching upstream NVIDIA Isaac-GR00T (https://github.com/NVIDIA/Isaac-GR00T, © NVIDIA CORPORATION & AFFILIATES) — see LICENSE and NOTICE. The added tooling (examples/LIBERO/smoke_tests/, tests/test_libero_smoke_tests.py) is Apache-2.0 and imports/wraps the upstream gr00t package. The artifacts under outputs/ are outputs of the nvidia/GR00T-N1.7-LIBERO model run in LIBERO simulation; no model weights or upstream source files (other than this LICENSE) are redistributed here.
