GR00T-N1.6 - Clutter (joint, 3-cam)
NVIDIA Isaac GR00T-N1.6-3B fine-tuned on the ManiGuard clutter base task (sim Franka Panda). Part of the ManiGuard VLA benchmark - GR00T vs pi0.5 on the same task families with identical data, cameras, and controller.
Model
- Base: nvidia/GR00T-N1.6-3B - Cosmos-Reason VLM + flow-matching DiT action head
- Embodiment: NEW_EMBODIMENT - Franka Panda, 8-D joint state/action (7 arm joints + 1 gripper)
- Cameras (3): image_left, image_right, wrist (256x256)
- Action: arm = state-relative chunks, gripper = absolute; 16-step horizon; NON_EEF (joint space)
- Tuning: GR00T-N1.6 default - VLM (LLM + visual) frozen, train projector + diffusion action head (no LoRA)
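The action layout listed above can be illustrated with a short sketch. It assumes that "state-relative" means each of the 16 arm steps is an offset from the joint state observed when the chunk was predicted (not from the previous step), and that the gripper is the last of the 8 dimensions. Neither assumption is confirmed by this card. The function name `chunk_to_absolute_targets` is hypothetical and not part of Isaac-GR00T, which may already perform an equivalent conversion internally.

```python
from typing import List, Sequence

ARM_DOF = 7    # 7 arm joints (from the card)
HORIZON = 16   # 16-step action horizon (from the card)


def chunk_to_absolute_targets(
    state: Sequence[float],
    chunk: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Convert one predicted chunk into absolute 8-D joint targets.

    Assumption (not confirmed by the card): arm entries are offsets from
    `state`, the joint state at prediction time; the gripper entry is
    already absolute and is passed through unchanged.
    """
    if len(state) != ARM_DOF + 1:
        raise ValueError("expected an 8-D state (7 arm joints + 1 gripper)")
    if len(chunk) != HORIZON:
        raise ValueError("expected a 16-step chunk")

    targets: List[List[float]] = []
    for step in chunk:
        if len(step) != ARM_DOF + 1:
            raise ValueError("expected 8-D actions in every step")
        # Arm: add the relative offset to the reference joint state.
        arm = [state[j] + step[j] for j in range(ARM_DOF)]
        # Gripper: absolute command, copied as-is.
        gripper = step[ARM_DOF]
        targets.append(arm + [gripper])
    return targets
```

Under these assumptions the output is a list of 16 absolute targets that a joint-space controller could consume one step at a time.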
Training
- 1x H100, bf16, global batch 64, 53100 steps (~2 epochs over 1,699,175 frames), cosine LR (peak 1e-4)
- Data: IDEAS-Lab-Northwestern/sentinel-pnp-clutter-joint; videos encoded as H.264 so that GR00T's torchcodec loader can decode them
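The "~2 epochs" figure follows from the numbers above, as this quick check shows. It assumes that each training sample corresponds to one dataset frame, which the card does not state explicitly.

```python
# Values taken from the training summary above.
GLOBAL_BATCH = 64
STEPS = 53_100
FRAMES = 1_699_175

# Assumption: one training sample is drawn per dataset frame.
samples_seen = GLOBAL_BATCH * STEPS
epochs = samples_seen / FRAMES

print(f"{samples_seen:,} samples seen, {epochs:.2f} epochs")
# prints: 3,398,400 samples seen, 2.00 epochs
```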
Usage
Load the checkpoint with Gr00tPolicy from Isaac-GR00T (n1.6-release) and pass --embodiment-tag NEW_EMBODIMENT. The included experiment_cfg/ directory carries the modality config and the normalization stats.
WARNING - Convention (must match at eval): joint-space JointController (absolute joint targets, NON_EEF) + 3 cameras (image_left, image_right, wrist). A mismatched controller or camera set raises no error; it feeds the policy out-of-distribution input.
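Because a mismatch of this kind produces no error, an explicit check before evaluation can help. The sketch below is a minimal example in plain Python. The camera names, the 256x256 resolution, the 8-D state, and the JointController name come from this card. The function `check_convention` and its argument layout are hypothetical, not part of Isaac-GR00T or ManiGuard.

```python
from typing import Dict, List, Tuple

# Expected evaluation convention, taken from this card.
EXPECTED_CAMERAS = ("image_left", "image_right", "wrist")
EXPECTED_IMAGE_HW = (256, 256)
EXPECTED_STATE_DIM = 8
EXPECTED_CONTROLLER = "JointController"


def check_convention(
    camera_shapes: Dict[str, Tuple[int, int]],
    state_dim: int,
    controller: str,
) -> List[str]:
    """Return a list of mismatches; an empty list means the setup matches.

    `camera_shapes` maps each camera name to its (height, width).
    This argument layout is a hypothetical choice for illustration.
    """
    problems: List[str] = []

    missing = [c for c in EXPECTED_CAMERAS if c not in camera_shapes]
    extra = [c for c in camera_shapes if c not in EXPECTED_CAMERAS]
    if missing:
        problems.append(f"missing cameras: {missing}")
    if extra:
        problems.append(f"unexpected cameras: {extra}")

    for name, hw in camera_shapes.items():
        if name in EXPECTED_CAMERAS and tuple(hw) != EXPECTED_IMAGE_HW:
            problems.append(f"{name} is {hw}, expected {EXPECTED_IMAGE_HW}")

    if state_dim != EXPECTED_STATE_DIM:
        problems.append(f"state is {state_dim}-D, expected {EXPECTED_STATE_DIM}-D")
    if controller != EXPECTED_CONTROLLER:
        problems.append(f"controller is {controller!r}, expected {EXPECTED_CONTROLLER!r}")

    return problems
```

A caller could run this once at the start of an evaluation and abort if the returned list is not empty, turning a silent mismatch into a visible failure.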
Model tree for IDEAS-Lab-Northwestern/gr00t-n16-base-clutter-joint-3cam
Base model
nvidia/GR00T-N1.6-3B