GR00T-N1.6-bridge-INT8-Edge
This model is an optimized and quantized version of GR00T-N1.6-bridge, designed for real-time robotic deployment on edge platforms such as Jetson AGX Orin.
It is based on GR00T-N1.6-3B, fine-tuned on the Bridge dataset, and further optimized using a combination of:
- Post-Training Quantization (PTQ) to INT8
- Quantization-Aware Training (QAT) for accuracy recovery
- TensorRT deployment for hardware acceleration
This follows the common production practice of co-designing model compression and deployment optimization: lower precision translates into real speedups only when the deployment runtime has kernels that exploit it.
VRFAI Edge AI Optimization
This model is developed by the VRFAI Edge AI Optimization team, focusing on:
- Deploying large-scale Vision-Language-Action (VLA) models on resource-constrained hardware
- Achieving real-time robotic control under strict latency and power constraints
- Bridging the gap between research models and production robotics systems
Core responsibilities of the team include:
- Model compression (quantization, pruning, distillation)
- Hardware-aware optimization (TensorRT, ONNX, CUDA kernels)
- End-to-end pipeline optimization (vision → language → action)
- Real-world validation on simulation and physical robots
The goal is to enable fully on-device, low-latency, and stable robotic intelligence systems.
Task Performance (Bridge Benchmark)
| Task | Eager (FP16) | INT8 (PTQ+QAT) | Δ (INT8 - Eager) |
|---|---|---|---|
| spoon_on_towel | 59.83% | 65.67 ± 3.01% | +5.84% |
| carrot_on_plate | 60.67% | 59.00 ± 3.12% | -1.67% |
| eggplant_in_basket | 90.83% | 90.50 ± 1.00% | -0.33% |
| stack_cube | 6.50% | 4.00 ± 0.87% | -2.50% |
| eggplant_in_sink | 40.50% | 41.33 ± 3.06% | +0.83% |
| close_drawer | 72.33% | 70.83 ± 3.25% | -1.50% |
| open_drawer | 96.83% | 96.17 ± 0.76% | -0.66% |
| Overall Average | 61.07% | 61.07 ± 1.07% | ~0.00% |
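The Δ column and the overall averages can be re-derived from the per-task numbers. The sketch below copies the values from the table and additionally flags tasks whose change exceeds the reported ± spread of the INT8 runs. Treating that spread as a run-to-run deviation is an assumption, and the comparison is an informal sanity check rather than a significance test (no spread is reported for the FP16 baseline).

```python
# Success rates (%) copied from the table above:
# task -> (FP16 eager, INT8 mean, INT8 reported +/- spread)
results = {
    "spoon_on_towel":     (59.83, 65.67, 3.01),
    "carrot_on_plate":    (60.67, 59.00, 3.12),
    "eggplant_in_basket": (90.83, 90.50, 1.00),
    "stack_cube":         (6.50, 4.00, 0.87),
    "eggplant_in_sink":   (40.50, 41.33, 3.06),
    "close_drawer":       (72.33, 70.83, 3.25),
    "open_drawer":        (96.83, 96.17, 0.76),
}


def summarize(results):
    """Return per-task deltas (percentage points) and both overall averages."""
    deltas = {
        task: round(int8 - fp16, 2) for task, (fp16, int8, _) in results.items()
    }
    n = len(results)
    fp16_avg = round(sum(fp16 for fp16, _, _ in results.values()) / n, 2)
    int8_avg = round(sum(int8 for _, int8, _ in results.values()) / n, 2)
    return deltas, fp16_avg, int8_avg


def outside_spread(results):
    """Tasks whose |delta| is larger than the reported INT8 spread."""
    return [
        task
        for task, (fp16, int8, spread) in results.items()
        if abs(int8 - fp16) > spread
    ]


deltas, fp16_avg, int8_avg = summarize(results)
for task, delta in deltas.items():
    print(f"{task:20s} {delta:+.2f} pp")
print(f"FP16 average: {fp16_avg:.2f}%  INT8 average: {int8_avg:.2f}%")
print("Outside reported spread:", outside_spread(results))
```

Under this reading, only spoon_on_towel (improved) and stack_cube (degraded) move by more than the reported spread; the other five tasks stay within it.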
Jetson AGX Orin Benchmark
| Backend | FPS | Latency p50 (ms) | Latency p90 (ms) | Latency p99 (ms) | Power (W) | GPU Util (%) |
|---|---|---|---|---|---|---|
| PyTorch Eager | 3.3 | 306.5 | 310.1 | 314.8 | 28.0 | 37.9% |
| torch.compile | 5.7 | 176.6 | 178.0 | 179.9 | 32.7 | 48.5% |
| TensorRT BF16 | 8.7 | 115.6 | 117.1 | 117.8 | 42.0 | 79.6% |
| TensorRT INT8 (Full Pipeline) | 12.3 | 81.3 | 82.9 | 83.6 | 38.2 | 70.1% |
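A few derived quantities make the table easier to compare: throughput implied by the median latency, speedup over PyTorch Eager, and approximate energy per inference. The sketch below uses only the p50 latency and power columns. The energy figure assumes the reported power is the average draw during inference and that p50 latency approximates the mean, neither of which is stated above.

```python
# Median latency (ms) and power (W) copied from the benchmark table above.
benchmarks = {
    "PyTorch Eager":                 {"p50_ms": 306.5, "power_w": 28.0},
    "torch.compile":                 {"p50_ms": 176.6, "power_w": 32.7},
    "TensorRT BF16":                 {"p50_ms": 115.6, "power_w": 42.0},
    "TensorRT INT8 (Full Pipeline)": {"p50_ms": 81.3,  "power_w": 38.2},
}


def derive(benchmarks, baseline="PyTorch Eager"):
    """Compute FPS, speedup over the baseline, and joules per inference."""
    base_ms = benchmarks[baseline]["p50_ms"]
    derived = {}
    for name, row in benchmarks.items():
        latency_s = row["p50_ms"] / 1000.0
        derived[name] = {
            "fps": 1.0 / latency_s,
            "speedup": base_ms / row["p50_ms"],
            # Assumption: power_w is the average draw while inferring.
            "energy_j": row["power_w"] * latency_s,
        }
    return derived


derived = derive(benchmarks)
for name, d in derived.items():
    print(
        f"{name:30s} {d['fps']:5.1f} FPS  "
        f"{d['speedup']:4.2f}x  {d['energy_j']:5.2f} J/inference"
    )
```

Under these assumptions, the INT8 pipeline is about 3.8x faster than PyTorch Eager and uses roughly 3.1 J per inference versus about 8.6 J, even though its instantaneous power draw is higher.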
Optimization Strategy
Quantization (PTQ + QAT)
- Applied INT8 PTQ using calibration data from robot trajectories
- Followed by QAT to recover the accuracy lost to PTQ
- Quantized the full pipeline:
  - Vision encoder (ViT)
  - Language backbone (LLM)
  - Action head (DiT)
QAT matters in robotics because small numerical errors in predicted actions can accumulate over a closed-loop rollout and noticeably change control behavior.
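To make the PTQ and QAT steps concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization with max-abs calibration. The scheme, the function names, and the sample values are illustrative assumptions; the exact quantization recipe used for this model is not described here. PTQ corresponds to choosing `scale` from calibration data, while QAT typically inserts an operation like `fake_quantize` into the forward pass during fine-tuning so the weights adapt to the rounding error.

```python
def calibrate_scale(samples, num_bits=8):
    """Max-abs calibration: map the largest observed magnitude to the int limit."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(x) for x in samples)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(x, scale, num_bits=8):
    """Round to the nearest integer step and clip to the symmetric range."""
    qmax = 2 ** (num_bits - 1) - 1
    q = round(x / scale)
    return max(-qmax, min(qmax, q))


def dequantize(q, scale):
    """Map an integer code back to a real value."""
    return q * scale


def fake_quantize(x, scale, num_bits=8):
    """Quantize then dequantize: the value a quantized layer effectively sees."""
    return dequantize(quantize(x, scale, num_bits), scale)


# Hypothetical calibration activations (stand-ins for robot-trajectory data).
calibration = [-0.8, 0.1, 0.35, 1.27]
scale = calibrate_scale(calibration)

for x in (0.35, -0.8, 5.0):
    q = quantize(x, scale)
    print(f"x={x:+.2f} -> q={q:+4d} -> x_hat={dequantize(q, scale):+.4f}")
```

Values inside the calibrated range are reproduced to within half a quantization step, while an out-of-range input such as 5.0 saturates at the largest code. That clipping is one reason calibration data should resemble what the robot sees at deployment time.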
TensorRT Deployment
- Exported model to ONNX → TensorRT engine
- Enabled:
  - Kernel fusion
  - INT8 Tensor Core acceleration
  - Reduced memory bandwidth usage
Real-world speedup depends on matching quantization format with hardware kernel support, not just reducing precision.
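As an illustration of the ONNX → TensorRT step, the sketch below assembles a `trtexec` command line for building an INT8 engine from an exported ONNX file. The file names are hypothetical, and how the pipeline is actually split into ONNX graphs and engines is not specified here. The flags used (`--onnx`, `--saveEngine`, `--int8`, `--fp16`) are standard `trtexec` options; the script only prints the command and does not run it.

```python
import shlex


def build_trtexec_command(onnx_path, engine_path, fp16_fallback=True,
                          trtexec="trtexec"):
    """Assemble a trtexec invocation that builds an INT8 TensorRT engine.

    fp16_fallback additionally permits FP16 kernels for layers that are
    not quantized (an assumption about a sensible default, not a
    documented setting of this model).
    """
    cmd = [
        trtexec,
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--int8",
    ]
    if fp16_fallback:
        cmd.append("--fp16")
    return cmd


# Hypothetical file names, used purely for illustration.
cmd = build_trtexec_command("policy_int8.onnx", "policy_int8.engine")
print(shlex.join(cmd))
```

TensorRT engines are built for a specific GPU and TensorRT version, so the build normally runs on the target device (for example the Jetson AGX Orin itself) rather than on a workstation.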
Deployment
.venv/bin/python gr00t/eval/run_gr00t_server.py \
--model-path <your-org>/GR00T-N1.6-bridge-INT8 \
--embodiment_tag OXE_WIDOWX \
--use_sim_policy_wrapper
Citation
@misc{nvidia2025gr00tn1openfoundation,
title = {GR00T N1: An Open Foundation Model for Generalist Humanoid Robots},
author = {{NVIDIA et al.}},
year = {2025},
eprint = {2503.14734},
archivePrefix= {arXiv},
primaryClass = {cs.RO}
}