GR00T-N1.6-bridge-INT8-Edge

This model is an optimized and quantized version of GR00T-N1.6-bridge, designed for real-time robotic deployment on edge platforms such as Jetson AGX Orin.

It is based on GR00T-N1.6-3B, fine-tuned on the Bridge dataset, and further optimized using a combination of:

Post-Training Quantization (PTQ) to INT8
Quantization-Aware Training (QAT) for accuracy recovery
TensorRT deployment for hardware acceleration

Compression and deployment were designed together: reduced precision lowers latency only when the inference backend provides kernels for the quantized layers.

VRFAI Edge AI Optimization

This model is developed by the VRFAI Edge AI Optimization team, focusing on:

Deploying large-scale Vision-Language-Action (VLA) models on resource-constrained hardware
Achieving real-time robotic control under strict latency and power constraints
Bridging the gap between research models and production robotics systems

Core responsibilities of the team include:

Model compression (quantization, pruning, distillation)
Hardware-aware optimization (TensorRT, ONNX, CUDA kernels)
End-to-end pipeline optimization (vision → language → action)
Real-world validation on simulation and physical robots

The goal is to enable fully on-device, low-latency, and stable robotic intelligence systems.

Task Performance (Bridge Benchmark)

Task	Eager (FP16)	INT8 (PTQ+QAT)	Δ (INT8 - Eager, percentage points)
spoon_on_towel	59.83%	65.67 ± 3.01%	+5.84
carrot_on_plate	60.67%	59.00 ± 3.12%	-1.67
eggplant_in_basket	90.83%	90.50 ± 1.00%	-0.33
stack_cube	6.50%	4.00 ± 0.87%	-2.50
eggplant_in_sink	40.50%	41.33 ± 3.06%	+0.83
close_drawer	72.33%	70.83 ± 3.25%	-1.50
open_drawer	96.83%	96.17 ± 0.76%	-0.66
Overall Average	61.07%	61.07 ± 1.07%	~0.00
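
As a quick consistency check, the per-task deltas and the overall averages can be recomputed from the table. The sketch below copies the table values into a dictionary (the variable names are illustrative, not part of any released tooling) and treats each delta as a difference in percentage points.

```python
# Task scores (%) copied from the table above: (Eager FP16, INT8 PTQ+QAT mean).
results = {
    "spoon_on_towel": (59.83, 65.67),
    "carrot_on_plate": (60.67, 59.00),
    "eggplant_in_basket": (90.83, 90.50),
    "stack_cube": (6.50, 4.00),
    "eggplant_in_sink": (40.50, 41.33),
    "close_drawer": (72.33, 70.83),
    "open_drawer": (96.83, 96.17),
}

# Per-task delta in percentage points (INT8 minus Eager).
deltas = {task: round(int8 - eager, 2) for task, (eager, int8) in results.items()}

# Unweighted mean over the seven tasks, matching the "Overall Average" row.
eager_avg = round(sum(eager for eager, _ in results.values()) / len(results), 2)
int8_avg = round(sum(int8 for _, int8 in results.values()) / len(results), 2)

for task, delta in deltas.items():
    print(f"{task:20s} {delta:+.2f} pp")
print(f"overall: eager={eager_avg:.2f}%  int8={int8_avg:.2f}%")
```

The averages are unweighted, so a large gain on one task (spoon_on_towel) can offset several small regressions elsewhere; the per-task rows are the better guide for a specific deployment.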

Jetson AGX Orin Benchmark

Backend	FPS	Latency p50 (ms)	Latency p90 (ms)	Latency p99 (ms)	Power (W)	GPU Util (%)
PyTorch Eager	3.3	306.5	310.1	314.8	28.0	37.9%
torch.compile	5.7	176.6	178.0	179.9	32.7	48.5%
TensorRT BF16	8.7	115.6	117.1	117.8	42.0	79.6%
TensorRT INT8 (Full Pipeline)	12.3	81.3	82.9	83.6	38.2	70.1%
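
Two derived figures make the table easier to compare: speedup over PyTorch Eager and approximate energy per inference. The sketch below computes both from the rows above. It assumes the Power column is the average draw while inference runs back to back, which the table does not state, so the energy numbers are rough estimates rather than measurements.

```python
# Rows copied from the Jetson AGX Orin table: (FPS, p50 latency in ms, power in W).
benchmarks = {
    "PyTorch Eager": (3.3, 306.5, 28.0),
    "torch.compile": (5.7, 176.6, 32.7),
    "TensorRT BF16": (8.7, 115.6, 42.0),
    "TensorRT INT8 (Full Pipeline)": (12.3, 81.3, 38.2),
}

baseline_latency_ms = benchmarks["PyTorch Eager"][1]


def summarize(latency_ms, power_w):
    """Return (speedup vs. Eager, estimated joules per inference)."""
    speedup = baseline_latency_ms / latency_ms
    # Assumption: power is the average draw over one inference of p50 duration.
    energy_j = power_w * latency_ms / 1000.0
    return speedup, energy_j


summary = {
    name: summarize(latency_ms, power_w)
    for name, (_fps, latency_ms, power_w) in benchmarks.items()
}

for name, (speedup, energy_j) in summary.items():
    print(f"{name:32s} {speedup:4.2f}x  ~{energy_j:4.2f} J/inference")
```

Under that assumption, the INT8 engine draws more power than Eager but finishes each inference much sooner, so the estimated energy per inference still falls.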

Optimization Strategy

Quantization (PTQ + QAT)

Applied INT8 PTQ, calibrated on data from robot trajectories
Followed with QAT to recover the accuracy lost during PTQ
Quantized the full pipeline:
- Vision encoder (ViT)
- Language backbone (LLM)
- Action head (DiT)

QAT is critical in robotics scenarios where small numerical errors can significantly affect control behavior.
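
To make the two steps concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It assumes max calibration and a single scale per tensor; the calibrator and granularity actually used for this model are not stated here, and the activation values are made up for illustration. PTQ corresponds to choosing `scale` from calibration data. QAT inserts the quantize-dequantize round trip (`fake_quantize`) into the forward pass during fine-tuning so the weights adapt to the rounding error, usually with a straight-through estimator for the gradient.

```python
def calibrate_scale(samples, num_bits=8):
    """Max calibration: map the largest observed magnitude to the top of the integer range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(x) for x in samples)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(x, scale, num_bits=8):
    """Round to the nearest integer step and clamp to the symmetric range [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    q = round(x / scale)
    return max(-qmax, min(qmax, q))


def dequantize(q, scale):
    """Map an integer code back to a real value."""
    return q * scale


def fake_quantize(x, scale):
    """Quantize-dequantize round trip, as simulated in the forward pass during QAT."""
    return dequantize(quantize(x, scale), scale)


# Hypothetical activations standing in for values collected on robot trajectories.
calibration_activations = [-0.82, 0.10, 0.47, 1.27, -0.33]
scale = calibrate_scale(calibration_activations)

x = 0.4321
x_hat = fake_quantize(x, scale)
error = abs(x - x_hat)
print(f"scale={scale:.4f}  x={x}  x_hat={x_hat:.4f}  error={error:.4f}")
```

For values inside the calibrated range the round-trip error is at most half a quantization step (`scale / 2`); values outside the range are clamped and can incur a much larger error, which is one reason calibration data should resemble deployment inputs.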

TensorRT Deployment

Exported the model to ONNX, then built a TensorRT engine from the ONNX graph
Enabled:
- Kernel fusion
- INT8 Tensor Core acceleration
- Reduced memory bandwidth usage

Real-world speedup depends on matching the quantization format to the kernels the target hardware supports, not merely on reducing precision.
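
The ONNX-to-engine step can be sketched with the TensorRT Python API as below. This is an illustrative outline, not the build script used for this model: the function name and file paths are hypothetical, it assumes the ONNX graph already carries quantization (Q/DQ) nodes from QAT, and a multi-stage pipeline like this one may well be built as several engines with settings not shown here. Flag behavior also differs between TensorRT releases, so check the version shipped with your JetPack.

```python
def build_int8_engine(onnx_path, engine_path):
    """Parse an ONNX file and write a serialized TensorRT engine with INT8 enabled."""
    # Imported lazily so this module can be loaded on machines without TensorRT.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)

    # Needed by the ONNX parser on TensorRT 8.x; deprecated and ignored on 10.x.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    # parse_from_file lets the parser resolve external weight files next to the model.
    if not parser.parse_from_file(onnx_path):
        errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
        raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)  # allow INT8 kernels for quantized layers
    config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 for layers left unquantized

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("TensorRT engine build failed")

    with open(engine_path, "wb") as f:
        f.write(serialized)
    return engine_path
```

A call would look like `build_int8_engine("model.onnx", "model.engine")`, run on the target device, since TensorRT engines are specific to the GPU and TensorRT version they were built with.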

Deployment

.venv/bin/python gr00t/eval/run_gr00t_server.py \
    --model-path <your-org>/GR00T-N1.6-bridge-INT8 \
    --embodiment_tag OXE_WIDOWX \
    --use_sim_policy_wrapper

Citation

@misc{nvidia2025gr00tn1openfoundation,
  title        = {GR00T N1: An Open Foundation Model for Generalist Humanoid Robots},
  author       = {{NVIDIA et al.}},
  year         = {2025},
  eprint       = {2503.14734},
  archivePrefix= {arXiv},
  primaryClass = {cs.RO}
}