GR00T N1.5 Bimanual SO-101 LoRA Adapter
This repository contains a LoRA adapter for NVIDIA's GR00T N1.5 model, fine-tuned for bimanual SO-101 robot arms performing pick-and-place tasks.
Model Description
- Base Model: nvidia/GR00T-N1.5-3B
- Adapter Size: 13 MB (LoRA rank 16)
- Task: Bimanual red block pick-and-place
- Training Data: 100 teleoperation episodes (~84K frames)
- Action Space: 12D (2 arms × (5 joints + 1 gripper))
- Camera Setup: 3 RGB cameras (left_gripper, right_gripper, top)
Training Details
Configuration:
- Action Horizon: 64 steps
- Training Steps: 20,000
- Final Loss: ~0.04
- LoRA Rank: 16, Alpha: 32
- Frozen: Vision encoder, LLM, Diffusion model
- Trained: Projector + LoRA adapters
Hardware:
- GPU: NVIDIA RTX 5090 (32GB)
- Training Time: 4.5 hours
- Framework: Isaac-GR00T + PyTorch 2.7.0
Installation
# Clone Isaac-GR00T repository
git clone https://github.com/NVIDIA-Omniverse/Isaac-GR00T
cd Isaac-GR00T
# Install dependencies
conda create -n groot python=3.10
conda activate groot
pip install -e ".[base]" # Quoted so shells such as zsh do not expand the brackets
pip install flash-attn==2.8.2 # Required for GR00T
# Download this adapter
huggingface-cli download Hrishnugg/groot-recode-bimanual-v2-lora \
    --local-dir ./adapters/recode-bimanual-v2
Usage
Option 1: Inference with Isaac-GR00T
from gr00t.model.policy import Gr00tPolicy
import numpy as np
# Load base model + LoRA adapter
policy = Gr00tPolicy.from_checkpoint(
    checkpoint_path="./adapters/recode-bimanual-v2",  # Your LoRA adapter
    embodiment_tag="new_embodiment",
    data_config="recode_data_config:RecodeBimanualDataConfig",
)
# Prepare observations
observations = {
    "video.left_gripper": left_camera_image,        # Shape: (1, 480, 640, 3)
    "video.right_gripper": right_camera_image,      # Shape: (1, 480, 640, 3)
    "video.top": top_camera_image,                  # Shape: (1, 480, 640, 3)
    "state.left_arm": left_arm_joint_positions,     # Shape: (1, 5)
    "state.left_gripper": left_gripper_position,    # Shape: (1, 1)
    "state.right_arm": right_arm_joint_positions,   # Shape: (1, 5)
    "state.right_gripper": right_gripper_position,  # Shape: (1, 1)
    "annotation.human.task_description": ["Grab the red cube and put it in a red basket"],
}
# Get action prediction (returns 64-step horizon, use first step)
actions = policy.get_action(observations)
action_t0 = actions["action"][0] # Shape: (12,) - first timestep
# Extract per-arm commands
left_arm_cmd = action_t0[0:5] # 5 joint angles
left_gripper_cmd = action_t0[5] # Gripper position
right_arm_cmd = action_t0[6:11] # 5 joint angles
right_gripper_cmd = action_t0[11] # Gripper position
# Send to robot
robot.set_left_arm_position(left_arm_cmd)
robot.set_left_gripper(left_gripper_cmd)
robot.set_right_arm_position(right_arm_cmd)
robot.set_right_gripper(right_gripper_cmd)
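The index slicing above can be wrapped in a small helper so that a wrongly sized action fails loudly before anything is sent to the robot. This is a minimal sketch, not part of Isaac-GR00T: the function name `split_bimanual_action` and the returned dictionary keys are hypothetical, and the layout (indices 0-4, 5, 6-10, 11) is taken from the example above.

```python
def split_bimanual_action(action):
    """Split one 12D action into per-arm joint and gripper commands.

    `action` may be any sequence of 12 numbers (list, tuple, or a 1D NumPy array).
    The key names below are illustrative, not an Isaac-GR00T convention.
    """
    values = [float(x) for x in action]
    if len(values) != 12:
        raise ValueError(f"expected 12 action values, got {len(values)}")
    return {
        "left_arm": values[0:5],       # 5 joint angles
        "left_gripper": values[5],     # gripper position
        "right_arm": values[6:11],     # 5 joint angles
        "right_gripper": values[11],   # gripper position
    }
```

With this helper, the per-arm commands become `cmd = split_bimanual_action(action_t0)` followed by lookups such as `cmd["left_arm"]`.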
Option 2: Using Inference Server
Start server:
python scripts/inference_service.py \
    --server \
    --model_path ./adapters/recode-bimanual-v2 \
    --embodiment_tag new_embodiment \
    --data_config recode_data_config:RecodeBimanualDataConfig \
    --denoising_steps 4 \
    --port 5555
Connect client:
from gr00t.eval.service import ExternalRobotInferenceClient
client = ExternalRobotInferenceClient(host="localhost", port=5555)
actions = client.get_action(observations)
Data Configuration Required
This adapter expects a data configuration that matches the training setup. Create recode_data_config.py:
from gr00t.experiment.data_config import BaseDataConfig
from gr00t.data.transform.base import ComposedModalityTransform
# ... (full config from training)
Alternatively, download the data configuration file from this repository.
Important Notes
Action Smoothing Recommended
Inference uses only 4 denoising steps to keep prediction real-time, so predicted actions may contain high-frequency noise. We strongly recommend temporal smoothing during deployment:
# Exponential moving average
alpha = 0.3 # 70% smoothing
smoothed_action = alpha * new_action + (1 - alpha) * previous_action
See eval_bimanual_lerobot.py in this repository for full implementation.
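For readers who want to see the update rule in context, the sketch below keeps the previous smoothed action between calls. It is an illustration only and is not the code from eval_bimanual_lerobot.py: the class name `ActionSmoother` and its methods are hypothetical, and passing the first action through unchanged is an assumption about how to initialise the filter.

```python
class ActionSmoother:
    """Exponential moving average over successive action vectors."""

    def __init__(self, alpha=0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha      # weight of the new action (0.3 keeps 70% of the history)
        self.previous = None    # last smoothed action, or None before the first update

    def update(self, new_action):
        new_action = [float(x) for x in new_action]
        if self.previous is None:
            # Assumption: the first action is passed through unchanged.
            self.previous = new_action
        else:
            self.previous = [
                self.alpha * n + (1.0 - self.alpha) * p
                for n, p in zip(new_action, self.previous)
            ]
        return list(self.previous)

    def reset(self):
        """Forget the history, e.g. at the start of a new episode."""
        self.previous = None
```

A lower `alpha` smooths more but makes the arms respond more slowly to changes in the predicted action, so the value is worth tuning on the real robot.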
Camera Setup
Cameras must match training configuration:
- left_gripper: Wrist camera on left arm (640x480 @ 30fps)
- right_gripper: Wrist camera on right arm (640x480 @ 30fps)
- top: Overhead camera (640x480 @ 30fps)
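A quick pre-flight check can catch a camera that is missing or running at the wrong resolution before any inference is attempted. The sketch below is a hypothetical helper, not part of this repository: it assumes the deployment code can report each camera as a `(width, height, fps)` tuple, and the names `EXPECTED_CAMERAS` and `find_camera_mismatches` are illustrative.

```python
# Expected (width, height, fps) per camera, taken from the list above.
EXPECTED_CAMERAS = {
    "left_gripper": (640, 480, 30),
    "right_gripper": (640, 480, 30),
    "top": (640, 480, 30),
}

def find_camera_mismatches(actual):
    """Return a list of human-readable problems; an empty list means the setup matches."""
    problems = []
    for name, expected in EXPECTED_CAMERAS.items():
        got = actual.get(name)
        if got is None:
            problems.append(f"{name}: missing")
        elif tuple(got) != expected:
            problems.append(f"{name}: expected {expected}, got {tuple(got)}")
    return problems
```

This only checks stream settings. Camera placement and viewing angle still have to be matched to the training setup by hand.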
Action Space
12D continuous:
- Dimensions 0-4: Left arm joints (degrees)
- Dimension 5: Left gripper (0-47 range)
- Dimensions 6-10: Right arm joints (degrees)
- Dimension 11: Right gripper (0-47 range)
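Since the gripper dimensions have a stated range, it may be worth clamping them before sending commands to the hardware. The sketch below assumes that 0-47 is a hard limit for the gripper and leaves the joint angles untouched, because no joint limits are given here. The function name `clamp_grippers` is hypothetical.

```python
GRIPPER_MIN, GRIPPER_MAX = 0.0, 47.0   # range stated in the action space above
GRIPPER_INDICES = (5, 11)              # left gripper, right gripper

def clamp_grippers(action):
    """Return a copy of a 12D action with both gripper values clamped to [0, 47]."""
    values = [float(x) for x in action]
    if len(values) != 12:
        raise ValueError(f"expected 12 action values, got {len(values)}")
    for i in GRIPPER_INDICES:
        values[i] = min(max(values[i], GRIPPER_MIN), GRIPPER_MAX)
    return values
```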
Performance
Open-loop Evaluation (on training data):
- MSE: ~10-12 (varies by episode)
- Action horizon: 64 steps
- Denoising steps: 4
Deployment:
- Inference speed: ~200ms per prediction (4 denoising steps)
- Control frequency: 30Hz recommended
- Temporal smoothing: Strongly recommended for smooth execution
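The two deployment figures can be related with simple arithmetic: at 30 Hz a control tick lasts about 33 ms, so one 200 ms prediction spans roughly 6 ticks. The sketch below only computes that number. How the ticks between predictions are filled (for example by holding the last command or by running inference asynchronously) is not specified here, and the function name `ticks_per_prediction` is hypothetical.

```python
import math

def ticks_per_prediction(inference_ms=200, control_hz=30):
    """Number of control ticks that elapse while one prediction is computed."""
    return math.ceil(inference_ms * control_hz / 1000)
```

With the defaults above, `ticks_per_prediction()` returns 6, which is well within the 64-step action horizon.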
Limitations
- Trained on only 100 episodes (limited generalization)
- Single task: "Grab red cube, place in red basket"
- May show jittering without temporal smoothing
- Requires specific camera angles matching training setup
Citation
Built using NVIDIA Isaac GR00T:
@software{isaac_groot_2025,
title = {NVIDIA Isaac GR00T},
author = {NVIDIA Corporation},
year = {2025},
url = {https://github.com/NVIDIA-Omniverse/Isaac-GR00T}
}
License
Apache 2.0 (following GR00T base model)
Contact
For issues or questions, please contact the repository owner.