GR00T-N1.6 - Clutter (joint, 3-cam)

NVIDIA Isaac GR00T-N1.6-3B fine-tuned on the ManiGuard clutter base task (simulated Franka Panda). Part of the ManiGuard VLA benchmark, which compares GR00T and pi0.5 on the same task families with identical data, cameras, and controller.

Model

Base: nvidia/GR00T-N1.6-3B - Cosmos-Reason VLM + flow-matching DiT action head
Embodiment: NEW_EMBODIMENT - Franka Panda, 8-D joint state/action (7 arm joints + 1 gripper)
Cameras (3): image_left, image_right, wrist (256x256)
Action: 16-step chunks in joint space (NON_EEF); arm joints = state-relative, gripper = absolute
Tuning: GR00T-N1.6 default - VLM (LLM + visual encoder) frozen; projector and DiT action head trained (no LoRA)
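
The action convention above (state-relative arm joints, absolute gripper, 16-step chunks) can be illustrated with a minimal NumPy sketch. It assumes that every arm entry in a chunk is an offset from the joint state observed when the chunk was predicted, and that the gripper sits at index 7; neither detail is confirmed by this card. The function name chunk_to_absolute_targets is hypothetical and is not part of Isaac-GR00T, which may already perform this conversion inside the policy, so check before applying it yourself.

```python
import numpy as np

ARM_DOF = 7  # 7 arm joints; index 7 is assumed to be the gripper


def chunk_to_absolute_targets(chunk, joint_state):
    """Convert a (horizon, 8) action chunk into absolute joint targets.

    Assumption (not confirmed by the model card): each arm entry is an
    offset from the joint state observed when the chunk was predicted.
    The gripper column is passed through unchanged because it is absolute.
    """
    chunk = np.asarray(chunk, dtype=np.float64)
    joint_state = np.asarray(joint_state, dtype=np.float64)
    if chunk.ndim != 2 or chunk.shape[1] != ARM_DOF + 1:
        raise ValueError(f"expected chunk of shape (horizon, 8), got {chunk.shape}")
    if joint_state.shape != (ARM_DOF + 1,):
        raise ValueError(f"expected joint state of shape (8,), got {joint_state.shape}")

    targets = chunk.copy()
    targets[:, :ARM_DOF] += joint_state[:ARM_DOF]  # relative arm -> absolute
    return targets


if __name__ == "__main__":
    state = np.zeros(8)
    chunk = np.full((16, 8), 0.01)  # dummy 16-step chunk
    print(chunk_to_absolute_targets(chunk, state).shape)
```

The resulting array has one row per step of the horizon and can be handed to a controller that expects absolute joint targets.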

Training

1x H100, bf16, global batch 64, 53,100 steps (~2 epochs over 1,699,175 frames), cosine LR (peak 1e-4)
Data: IDEAS-Lab-Northwestern/sentinel-pnp-clutter-joint; videos encoded as H.264 for GR00T's torchcodec loader
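
The "~2 epochs" figure follows from the batch size, step count, and frame count above, as this short check shows. It assumes one training sample per frame. The cosine_lr helper is hypothetical: the card states only a cosine schedule with a 1e-4 peak, so the lack of warmup and the decay to zero are assumptions, not the actual training configuration.

```python
import math

GLOBAL_BATCH = 64
STEPS = 53_100
FRAMES = 1_699_175
PEAK_LR = 1e-4

# Assumption: each frame counts as one training sample.
samples_seen = GLOBAL_BATCH * STEPS
epochs = samples_seen / FRAMES


def cosine_lr(step, total_steps=STEPS, peak=PEAK_LR):
    """Plain cosine decay from `peak` to 0 (assumed: no warmup, zero floor)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"samples seen: {samples_seen:,}")
    print(f"epochs: {epochs:.3f}")
    print(f"LR at the midpoint: {cosine_lr(STEPS // 2):.2e}")
```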

Usage

Load the checkpoint with Gr00tPolicy from Isaac-GR00T (n1.6-release) and pass --embodiment-tag NEW_EMBODIMENT. The included experiment_cfg/ directory carries the modality config and normalization stats.

WARNING - Convention (must match at eval): joint-space JointController (absolute joint targets, NON_EEF) + 3 cameras (image_left, image_right, wrist). A mismatched controller or camera set raises no error; it silently feeds the policy out-of-distribution inputs.
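
Because a mismatch raises no error, a pre-flight check on the evaluation inputs can help catch it early. The sketch below is a hypothetical helper, not part of Isaac-GR00T. It assumes images arrive as height x width x channel arrays keyed by the camera names listed above, and that the joint state is a flat 8-vector. The observation keys the policy actually expects are defined by the modality config in experiment_cfg/ and may differ.

```python
import numpy as np

EXPECTED_CAMERAS = ("image_left", "image_right", "wrist")
IMAGE_HW = (256, 256)
STATE_DIM = 8  # 7 arm joints + 1 gripper


def check_observation(cameras, joint_state):
    """Return a list of convention violations (empty if the input matches).

    `cameras` maps camera name -> image array, assumed to be (H, W, C).
    """
    problems = []
    missing = [name for name in EXPECTED_CAMERAS if name not in cameras]
    extra = [name for name in cameras if name not in EXPECTED_CAMERAS]
    if missing:
        problems.append(f"missing cameras: {missing}")
    if extra:
        problems.append(f"unexpected cameras: {extra}")

    for name in EXPECTED_CAMERAS:
        if name in cameras:
            shape = np.asarray(cameras[name]).shape
            if len(shape) != 3 or shape[:2] != IMAGE_HW:
                problems.append(f"{name}: expected 256x256 image, got shape {shape}")

    state_shape = np.asarray(joint_state).shape
    if state_shape != (STATE_DIM,):
        problems.append(f"joint state: expected shape (8,), got {state_shape}")
    return problems


if __name__ == "__main__":
    frame = np.zeros((256, 256, 3), dtype=np.uint8)
    obs = {name: frame for name in EXPECTED_CAMERAS}
    print(check_observation(obs, np.zeros(8)))
```

Running the check once before an evaluation turns a silent distribution shift into an explicit list of problems.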

Model tree: IDEAS-Lab-Northwestern/gr00t-n16-base-clutter-joint-3cam (this model), fine-tuned from nvidia/GR00T-N1.6-3B (base model).