GR00T-N1.6-bridge-INT8-Edge

This model is an optimized and quantized version of GR00T-N1.6-bridge, designed for real-time robotic deployment on edge platforms such as Jetson AGX Orin.

It is based on GR00T-N1.6-3B, fine-tuned on the Bridge dataset, and further optimized using a combination of:

  • Post-Training Quantization (PTQ) to INT8
  • Quantization-Aware Training (QAT) for accuracy recovery
  • TensorRT deployment for hardware acceleration

This follows a common production pipeline in which model compression and deployment optimization are designed together, since quantization alone does not guarantee a speedup on the target hardware.


VRFAI Edge AI Optimization

This model is developed by the VRFAI Edge AI Optimization team, focusing on:

  • Deploying large-scale Vision-Language-Action (VLA) models on resource-constrained hardware
  • Achieving real-time robotic control under strict latency and power constraints
  • Bridging the gap between research models and production robotics systems

Core responsibilities of the team include:

  • Model compression (quantization, pruning, distillation)
  • Hardware-aware optimization (TensorRT, ONNX, CUDA kernels)
  • End-to-end pipeline optimization (vision → language → action)
  • Real-world validation on simulation and physical robots

The goal is to enable fully on-device, low-latency, and stable robotic intelligence systems.


Task Performance (Bridge Benchmark)

| Task | Eager (FP16) | INT8 (PTQ+QAT) | Δ (INT8 - Eager) |
|---|---|---|---|
| spoon_on_towel | 59.83% | 65.67 ± 3.01% | +5.84% |
| carrot_on_plate | 60.67% | 59.00 ± 3.12% | -1.67% |
| eggplant_in_basket | 90.83% | 90.50 ± 1.00% | -0.33% |
| stack_cube | 6.50% | 4.00 ± 0.87% | -2.50% |
| eggplant_in_sink | 40.50% | 41.33 ± 3.06% | +0.83% |
| close_drawer | 72.33% | 70.83 ± 3.25% | -1.50% |
| open_drawer | 96.83% | 96.17 ± 0.76% | -0.66% |
| Overall Average | 61.07% | 61.07 ± 1.07% | ~0.00% |
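
As a quick sanity check on the table above, the following sketch recomputes the per-task deltas and the overall averages from the reported numbers. The values are copied from the table; the dictionary and variable names are illustrative only.

```python
# Reported rates (%) copied from the table: task -> (Eager FP16, INT8 PTQ+QAT mean).
results = {
    "spoon_on_towel": (59.83, 65.67),
    "carrot_on_plate": (60.67, 59.00),
    "eggplant_in_basket": (90.83, 90.50),
    "stack_cube": (6.50, 4.00),
    "eggplant_in_sink": (40.50, 41.33),
    "close_drawer": (72.33, 70.83),
    "open_drawer": (96.83, 96.17),
}

# Delta is INT8 minus Eager, rounded to two decimals as in the table.
deltas = {task: round(int8 - eager, 2) for task, (eager, int8) in results.items()}

# Unweighted mean over the seven tasks.
eager_avg = sum(eager for eager, _ in results.values()) / len(results)
int8_avg = sum(int8 for _, int8 in results.values()) / len(results)

for task, delta in deltas.items():
    print(f"{task:20s} {delta:+.2f}")
print(f"overall: eager={eager_avg:.2f} int8={int8_avg:.2f} delta={int8_avg - eager_avg:+.2f}")
```

Both averages come out at 61.07 when rounded to two decimals, which matches the "Overall Average" row.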

Jetson AGX Orin Benchmark

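| Backend | FPS | Latency p50 (ms) | Latency p90 (ms) | Latency p99 (ms) | Power (W) | GPU Util (%) |
|---|---|---|---|---|---|---|
| PyTorch Eager | 3.3 | 306.5 | 310.1 | 314.8 | 28.0 | 37.9 |
| torch.compile | 5.7 | 176.6 | 178.0 | 179.9 | 32.7 | 48.5 |
| TensorRT BF16 | 8.7 | 115.6 | 117.1 | 117.8 | 42.0 | 79.6 |
| TensorRT INT8 (Full Pipeline) | 12.3 | 81.3 | 82.9 | 83.6 | 38.2 | 70.1 |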
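
A few derived figures make the table easier to compare across backends. The sketch below computes the throughput implied by the median latency, the p50 speedup over PyTorch Eager, and an approximate energy per inference. It assumes single-stream inference with no batching and treats the reported power as the average draw during the benchmark; neither assumption is stated in this card.

```python
# Rows copied from the table: backend -> (FPS, p50 latency in ms, power in W).
bench = {
    "PyTorch Eager": (3.3, 306.5, 28.0),
    "torch.compile": (5.7, 176.6, 32.7),
    "TensorRT BF16": (8.7, 115.6, 42.0),
    "TensorRT INT8 (Full Pipeline)": (12.3, 81.3, 38.2),
}

baseline_p50 = bench["PyTorch Eager"][1]

derived = {}
for backend, (fps, p50_ms, power_w) in bench.items():
    derived[backend] = {
        # Throughput implied by the median latency (assumes one request at a time).
        "fps_from_p50": 1000.0 / p50_ms,
        # Median-latency speedup relative to PyTorch Eager.
        "speedup": baseline_p50 / p50_ms,
        # Approximate energy per inference: watts divided by inferences per second.
        "joules_per_inference": power_w / fps,
    }

for backend, d in derived.items():
    print(
        f"{backend:30s} fps~{d['fps_from_p50']:.1f} "
        f"speedup={d['speedup']:.2f}x energy={d['joules_per_inference']:.2f} J"
    )
```

Under these assumptions the INT8 engine is roughly 3.8x faster than PyTorch Eager at p50 and uses about 3.1 J per inference compared with about 8.5 J, even though its instantaneous power draw is higher.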

Optimization Strategy

Quantization (PTQ + QAT)

  • Applied INT8 PTQ, calibrated on data from robot trajectories
  • Followed by QAT to recover the accuracy lost during PTQ
  • Quantized full pipeline:
    • Vision encoder (ViT)
    • Language backbone (LLM)
    • Action head (DiT)

QAT is critical in robotics scenarios where small numerical errors can significantly affect control behavior.
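
To make the two steps concrete, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization with max calibration, plus the quantize-dequantize ("fake quantization") op that QAT places in the forward pass. The actual scheme used for this model (per-tensor or per-channel, calibration method, toolkit) is not stated in this card, so all function names and the random stand-in data below are assumptions for illustration.

```python
import numpy as np

def calibrate_scale(samples: np.ndarray) -> float:
    """Max calibration: map the largest observed magnitude to 127."""
    amax = float(np.max(np.abs(samples)))
    return amax / 127.0 if amax > 0 else 1.0

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    """Round to the nearest INT8 step and clip to the symmetric range."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

def fake_quantize(x: np.ndarray, scale: float) -> np.ndarray:
    """Quantize then dequantize, so the model sees INT8 rounding error in float."""
    return dequantize(quantize(x, scale), scale)

rng = np.random.default_rng(0)
# Stand-in for activations collected from calibration trajectories (hypothetical data).
calib = rng.normal(0.0, 1.0, size=10_000).astype(np.float32)
scale = calibrate_scale(calib)

x = rng.normal(0.0, 1.0, size=1_000).astype(np.float32)
err = np.abs(fake_quantize(x, scale) - x)
in_range = np.abs(x) <= 127 * scale  # values outside the range are clipped instead
print(f"scale={scale:.5f} max in-range error={err[in_range].max():.5f}")
```

For values inside the calibrated range, the rounding error is bounded by half a quantization step (`scale / 2`). PTQ stops after choosing `scale`; QAT continues training with `fake_quantize` in the forward pass, typically passing gradients through the rounding with a straight-through estimator, so the weights adapt to that error.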


TensorRT Deployment

  • Exported the model to ONNX, then built a TensorRT engine
  • This enables:
    • Kernel fusion
    • INT8 Tensor Core acceleration
    • Reduced memory bandwidth usage

Real-world speedup depends on matching the quantization format to the kernels the hardware supports, not just on reducing precision.
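
One common way to build an engine from an exported ONNX file is NVIDIA's `trtexec` command-line tool. The sketch below only assembles such a command. Whether this model was built with `trtexec` or with the TensorRT builder API is not stated in this card, and the file names are hypothetical placeholders.

```python
import shlex

def trtexec_command(onnx_path: str, engine_path: str,
                    int8: bool = True, fp16: bool = True) -> list[str]:
    """Assemble a trtexec invocation that builds and saves a TensorRT engine."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if int8:
        cmd.append("--int8")  # allow INT8 kernels
    if fp16:
        cmd.append("--fp16")  # allow FP16 for layers that do not run in INT8
    return cmd

# Hypothetical file names, for illustration only.
cmd = trtexec_command("action_head.onnx", "action_head.engine")
print(shlex.join(cmd))
```

Enabling a 16-bit fallback alongside INT8 lets TensorRT choose a faster kernel for layers that have no efficient INT8 implementation, which is one practical form of the format-to-kernel matching described above.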


Deployment

.venv/bin/python gr00t/eval/run_gr00t_server.py \
    --model-path <your-org>/GR00T-N1.6-bridge-INT8 \
    --embodiment_tag OXE_WIDOWX \
    --use_sim_policy_wrapper

Citation

@misc{nvidia2025gr00tn1openfoundation,
  title        = {GR00T N1: An Open Foundation Model for Generalist Humanoid Robots},
  author       = {{NVIDIA et al.}},
  year         = {2025},
  eprint       = {2503.14734},
  archivePrefix= {arXiv},
  primaryClass = {cs.RO}
}