Navigation Model Zoo

A collection of vision-based navigation policies exported to ONNX, each wrapped in a small, uniform Python inference API. Maintained by Honglin He @ UCLA-VAIL.

Every model takes a short history of RGB frames and predicts a local trajectory (and optionally a distance-to-goal / arrival signal); a built-in PD controller turns the trajectory into (v, ω) velocity commands. All models share the same wrapper interface so they can be swapped and benchmarked without per-model glue code.
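The controller shipped with the wrappers is not reproduced in this README. As a rough illustration of how a predicted trajectory can become a (v, ω) command, here is a minimal PD sketch. The class name, the gains kp / kd / kv, the time step dt, and the lookahead index are all hypothetical; only max_v / max_w correspond to options named in this document.

```python
import numpy as np

class SimplePDController:
    """Hypothetical PD tracker, NOT the controller used by the wrappers."""

    def __init__(self, kp=1.0, kd=0.1, kv=1.0, max_v=1.0, max_w=1.0, dt=0.2):
        self.kp, self.kd, self.kv = kp, kd, kv
        self.max_v, self.max_w = max_v, max_w
        self.dt = dt
        self.prev_err = 0.0

    def reset(self):
        # Forget the previous heading error between episodes.
        self.prev_err = 0.0

    def step(self, traj, lookahead=2):
        """traj: (W, 2) waypoints in the standard frame (x forward, y left), meters."""
        traj = np.asarray(traj, dtype=np.float64)
        idx = min(lookahead, len(traj) - 1)
        x, y = traj[idx]
        err = np.arctan2(y, x)                     # heading error to the target waypoint
        d_err = (err - self.prev_err) / self.dt    # finite-difference derivative term
        self.prev_err = err
        w = np.clip(self.kp * err + self.kd * d_err, -self.max_w, self.max_w)
        v = np.clip(self.kv * x, 0.0, self.max_v)  # forward component only, never reverse
        return float(v), float(w)
```

A waypoint straight ahead yields zero turn rate and a speed capped at max_v; a waypoint directly to the left yields zero forward speed and a positive (counter-clockwise) turn rate.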

Models

| Folder | Model / paper | Goal mode | Context | Input H×W | Waypoints | Weights |
|---|---|---|---|---|---|---|
| GNM_GL_Official | GNM · ICRA 2023 | goal-free | 6 | 64×85 | 5 | gnm_imagegoal.onnx (+.data) · 35 MB |
| Vint_GL_Official | ViNT · CoRL 2023 | goal-free | 6 | 64×85 | 5 | vint_imagegoal.onnx (+.data) · 97 MB |
| NoMaD_GL_Official | NoMaD · ICRA 2024 | goal-free (diffusion) | 4 | 96×96 | 8 (×8 samples) | 3× .onnx (+.data) · 111 MB |
| CityWalker_PG_Official | CityWalker · CVPR 2025 | point-goal | 5 | 350×630 | 5 | citywalker.onnx · 806 MB |
| MBRA_PG_Official | MBRA · RA-L 2025 | point-goal | 6 | 96×96 | 8 | mbra.onnx · 254 MB |
| S2E | S2E · ICLR 2026 | point-goal / goal-free | 11 | 256×256 | 10 | s2e.onnx · 382 MB |
| MIMIC | MIMIC · ICRA 2026 | goal-free | 16 | 288×512 | 13 | mimic.onnx · 318 MB |

Suffix legend: PG = point-goal, GL = goal-less (goal-free). Models with a .onnx.data companion (GNM, ViNT, NoMaD) use ONNX external weights; keep each .onnx and its .onnx.data together.

Common interface

Each folder is a self-contained module exposing one navigator class. They all follow the same contract:

import numpy as np
from MBRA_PG_Official.inference import MBRAPGNavigator   # run from the repo root

nav = MBRAPGNavigator(device="cuda")          # use device="cpu" if you have no GPU

# obs: (B, nav.context_size, 3, H, W) float32 in [0, 1]
#      the wrapper resizes & normalizes to the model's spec internally
obs = np.random.rand(1, nav.context_size, 3, 96, 96).astype(np.float32)

# Point-goal models take goal_xy (standard frame: x=forward, y=left, meters);
# goal-free models omit it.
traj, scores = nav.inference_trajectory(obs, goal_xy=np.array([5.0, 0.2]))  # (B, M, W, 2) meters
vw, best     = nav.inference_vw(obs,        goal_xy=np.array([5.0, 0.2]))   # vw: (B, 2) = [v, ω]

nav.reset()   # clears PD-controller velocity smoothing between episodes

Conventions shared by every model:

  • Coordinate frame: all user-facing inputs and outputs use the standard frame (x = forward, y = left, in meters). Models with a different internal convention (e.g. CityWalker) convert transparently.
  • Observations: (B, context_size, 3, H, W), float32, pixel values in [0, 1]. The wrapper handles resize and any ImageNet normalization. (Exception: MIMIC expects frames already at 288×512 and does not resize.)
  • inference_trajectory(obs[, goal_xy]) → (trajectory, scores). trajectory is (B, M, W, 2) in meters, where M is the number of modes (1 for unimodal, 8 for NoMaD) and W the waypoint count; scores is (B, M).
  • inference_vw(obs[, goal_xy]) → (vw, best_traj), where vw is a (B, 2) torch tensor of [linear_v, angular_w]. Tune limits with max_v / max_w at construction.
  • Goal-free models (ViNT, GNM, NoMaD, MIMIC) ignore goal_xy; call inference_trajectory(obs).
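The shape conventions above can be exercised without loading any model. The sketch below is illustrative only: frames_to_obs and select_best_mode are hypothetical helper names, the oldest-first frame order is an assumption, and so is the rule that a higher score marks the better mode (the README does not say which direction the scores run).

```python
import numpy as np

def frames_to_obs(frames):
    """Stack uint8 HxWx3 RGB frames into a (1, T, 3, H, W) float32 array in [0, 1].

    Assumes `frames` is ordered oldest first (not confirmed by the README).
    """
    stack = np.stack(frames).astype(np.float32) / 255.0   # (T, H, W, 3)
    return stack.transpose(0, 3, 1, 2)[None]              # (1, T, 3, H, W)

def select_best_mode(trajectory, scores):
    """Pick one (W, 2) trajectory per batch item from a (B, M, W, 2) array.

    Assumes a higher score is better; flip to argmin if the wrapper
    reports costs instead.
    """
    trajectory = np.asarray(trajectory)
    best = np.argmax(np.asarray(scores), axis=1)           # (B,)
    return trajectory[np.arange(trajectory.shape[0]), best]
```

For a unimodal model (M = 1) the selection is a no-op apart from dropping the mode axis.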

Installation

pip install onnxruntime-gpu numpy torch torchvision pyyaml pillow
# CPU-only: use onnxruntime instead of onnxruntime-gpu
pip install opencv-python   # required by S2E (frame resizing)

Optional, lab-internal dependency: ViNT, GNM, and NoMaD expose an extra inference_vw_pp() method that uses urbansim.custom.pp.PurePursuitController; it is imported lazily and only needed for that method. MIMIC imports urbansim at module load, so its inference.py will not import without the urbansim package on your path.
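For readers unfamiliar with the lazy-import pattern mentioned above, here is a generic sketch. It does not reproduce the wrappers' code: lazy_import and NavigatorSketch are hypothetical names, and only the module path urbansim.custom.pp and the class name PurePursuitController come from this README.

```python
import importlib

def lazy_import(module_name, attr=None):
    """Import `module_name` only when called; return the module or one attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is needed only for this optional method; "
            "install it or use the default controller path instead."
        ) from exc
    return getattr(module, attr) if attr else module

class NavigatorSketch:
    """Hypothetical stand-in for a wrapper class; only the import pattern matters."""

    def inference_vw_pp(self):
        # The optional dependency is resolved here, not at module load,
        # so importing this class never requires urbansim.
        return lazy_import("urbansim.custom.pp", "PurePursuitController")
```

With this pattern, a missing package surfaces as an ImportError only when the optional method is actually called.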

Model details

GNM_GL_Official: gnm_imagegoal.onnx (+ .onnx.data)

Paper: GNM: A General Navigation Model to Drive Any Robot (ICRA 2023) · arXiv:2210.03370 · code

Goal-free General Navigation Model with the same NavDP image-goal I/O contract as ViNT (obs_img (B,18,64,85) + goal_img (B,3,64,85) → dist_pred (B,1), action_pred (B,5,4)), but a lower top speed. Expects input downsampled to ≈ 3 Hz.

Vint_GL_Official: vint_imagegoal.onnx (+ .onnx.data)

Paper: ViNT: A Foundation Model for Visual Navigation (CoRL 2023) · arXiv:2306.14846 · project

Goal-free ViNT (NavDP image-goal backbone run with a random goal image). ONNX I/O: obs_img (B,18,64,85) (6 ImageNet-normalized frames × 3 ch) + goal_img (B,3,64,85) (random noise) → dist_pred (B,1), action_pred (B,5,4). Cumulative xy is already baked in; the wrapper scales by the 0.8 m metric spacing. Expects input downsampled to ≈ 3 Hz.
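The decode step described for GNM and ViNT amounts to a slice and a scale. A minimal sketch follows, assuming the first two of the four action_pred channels hold the cumulative x and y (the README does not spell out the channel layout); decode_actions is a hypothetical name.

```python
import numpy as np

METRIC_SPACING_M = 0.8  # metric waypoint spacing quoted in the text above

def decode_actions(action_pred, spacing=METRIC_SPACING_M):
    """Map a (B, 5, 4) model output to a (B, 1, 5, 2) trajectory in meters.

    Assumption: channels 0 and 1 are cumulative x, y in normalized units;
    the remaining two channels are ignored here.
    """
    xy = np.asarray(action_pred, dtype=np.float32)[..., :2]   # (B, 5, 2)
    return (xy * spacing)[:, None]                             # add the mode axis, M = 1
```

No cumulative sum is applied because the text states that cumulative xy is already baked into the model output.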

NoMaD_GL_Official: 3× ONNX (diffusion, + .onnx.data)

Paper: NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration (ICRA 2024) · arXiv:2310.07896 · project

Goal-free diffusion policy. Runs a 10-step DDPM loop (squaredcos_cap_v2) over 3 components: nomad_vision_encoder.onnx (obs_img (B,12,96,96) + goal_img (B,3,96,96) + goal_mask (B) → cond (B,256)), nomad_noise_pred.onnx (one denoising step), and nomad_dist_pred.onnx. Produces 8 trajectory samples → trajectory (B,8,8,2) meters (decode: unnormalize → cumsum → ×0.267 m spacing). This is the only multi-modal model and the slowest (diffusion + multiple samples).
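The decode chain (unnormalize → cumsum → ×0.267 m) can be sketched as follows. This assumes the denoised samples lie in [-1, 1] and are mapped back with per-axis min/max statistics; those statistics are not given in this README, so action_min and action_max are left as caller-supplied inputs, and decode_nomad_samples is a hypothetical name.

```python
import numpy as np

def decode_nomad_samples(naction, action_min, action_max, spacing=0.267):
    """Turn (B, M, W, 2) normalized per-step deltas into cumulative xy in meters.

    action_min / action_max: length-2 statistics for the delta range
    (assumed min-max normalization; values must come from the model's config).
    """
    naction = np.asarray(naction, dtype=np.float32)
    lo = np.asarray(action_min, dtype=np.float32)
    hi = np.asarray(action_max, dtype=np.float32)
    deltas = (naction + 1.0) / 2.0 * (hi - lo) + lo   # unnormalize [-1, 1] -> [lo, hi]
    return np.cumsum(deltas, axis=-2) * spacing        # integrate over waypoints, then scale
```

Unlike GNM and ViNT, the cumulative sum is applied here because the diffusion output is described as per-step deltas before decoding.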

CityWalker_PG_Official: citywalker.onnx

Paper: CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos (CVPR 2025) · arXiv:2411.17820 · project

Point-goal urban walker. ONNX I/O: obs_images (B,5,3,350,630) + trajectory (B,6,2) past waypoints → wp_pred (B,5,2), arrive_pred (B,1) (arrival probability). Images are ImageNet-normalized internally; the model's internal y=forward, x=right frame is converted to standard frame by the wrapper. Input rate ≈ 5 Hz.
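The frame conversion is a 90° axis swap with one sign flip. As a small sketch of the mapping between the internal frame (x = right, y = forward) and the standard frame (x = forward, y = left), with hypothetical function names:

```python
import numpy as np

def citywalker_to_standard(wp):
    """(..., 2) points in (x=right, y=forward) -> (x=forward, y=left)."""
    wp = np.asarray(wp, dtype=np.float32)
    return np.stack([wp[..., 1], -wp[..., 0]], axis=-1)

def standard_to_citywalker(xy):
    """(..., 2) points in (x=forward, y=left) -> (x=right, y=forward)."""
    xy = np.asarray(xy, dtype=np.float32)
    return np.stack([-xy[..., 1], xy[..., 0]], axis=-1)
```

For example, a point 1 m to the right and 2 m ahead in the internal frame becomes (2, -1) in the standard frame: 2 m forward, 1 m to the right of the robot (negative left).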

MBRA_PG_Official: mbra.onnx

Paper: Learning to Drive Anywhere with Model-Based Reannotation (RA-L 2025) · arXiv:2505.05592 · project

Point-goal policy. ONNX I/O: obs_images (B,6,3,96,96) ImageNet-normalized + goal_pose (B,4) = [x, y, sin(yaw), cos(yaw)] → waypoints (B,8,4). Goal is given as goal_xy (meters) and converted internally; waypoints are un-normalized by a 0.8 m metric spacing. Input rate ≈ 5 Hz.
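To make the goal_pose layout concrete, the sketch below packs a goal_xy into the [x, y, sin(yaw), cos(yaw)] order listed above. How the wrapper actually chooses the goal yaw, and whether it rescales x and y before feeding the model, is not stated here; yaw is therefore a caller-supplied assumption (default 0) and no rescaling is applied. make_goal_pose is a hypothetical name.

```python
import numpy as np

def make_goal_pose(goal_xy, yaw=0.0):
    """Pack goal_xy of shape (2,) or (B, 2) into a (B, 4) float32 goal_pose.

    Layout follows the README: [x, y, sin(yaw), cos(yaw)].
    Assumption: yaw defaults to 0 and x, y are passed through unscaled.
    """
    goal_xy = np.atleast_2d(np.asarray(goal_xy, dtype=np.float32))   # (B, 2)
    yaw = np.broadcast_to(np.float32(yaw), goal_xy.shape[:1])        # (B,)
    pose = np.stack([goal_xy[:, 0], goal_xy[:, 1], np.sin(yaw), np.cos(yaw)], axis=-1)
    return pose.astype(np.float32)
```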

S2E: s2e.onnx

Paper: From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning (ICLR 2026) · arXiv:2507.22028 · project

UCLA-VAIL navigation foundation model; this is the behavior-cloning, point-goal, web-pretrained variant (S2EBC-PG-Web100). ONNX I/O: obs_images (B,11,3,256,256) in [0,1] (no ImageNet norm) + goal (B,3) = [norm_dist, cos(θ), sin(θ)] → wp_pred (B,10,3) [x,y,yaw], wp_pred_score (B,63) mode scores. Frames are resized to 256×256 with OpenCV.
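The polar goal encoding [norm_dist, cos(θ), sin(θ)] can be sketched as below. The normalization constant is not given in this README, so max_dist is a hypothetical parameter with an arbitrary placeholder default, and clipping to [0, 1] is likewise an assumption; encode_point_goal is a hypothetical name.

```python
import numpy as np

def encode_point_goal(goal_xy, max_dist=20.0):
    """Encode goal_xy (standard frame, meters) as (B, 3) [norm_dist, cos θ, sin θ].

    max_dist is a placeholder, NOT the value used by the S2E wrapper.
    """
    goal_xy = np.atleast_2d(np.asarray(goal_xy, dtype=np.float32))   # (B, 2)
    dist = np.linalg.norm(goal_xy, axis=-1)
    theta = np.arctan2(goal_xy[:, 1], goal_xy[:, 0])                 # bearing, 0 = straight ahead
    norm_dist = np.clip(dist / max_dist, 0.0, 1.0)
    return np.stack([norm_dist, np.cos(theta), np.sin(theta)], axis=-1).astype(np.float32)
```

Encoding the bearing as (cos θ, sin θ) avoids the wrap-around discontinuity a raw angle would have at ±π.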

MIMIC: mimic.onnx

Paper: Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion (ICRA 2026) · arXiv:2603.22527 · project

UCLA-VAIL goal-free long-context sidewalk policy. ONNX I/O: input (1,16,3,288,512) in [0,1] → output (1,15,3) [x,y,yaw] at non-uniform timestamps (0.2 s–5.0 s @ 5 Hz). Batch is processed one sample at a time; the wrapper keeps the first 13 waypoints (~4 s) and scales to meters. Requires urbansim (see Installation).
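Because the exported graph has a fixed batch size of 1, a batched call has to loop. The sketch below shows that pattern with a stand-in callable in place of the ONNX session; run_single and run_batched are hypothetical names, and the scale-to-meters step is omitted because its factor is not given here.

```python
import numpy as np

KEEP_WAYPOINTS = 13  # number of waypoints kept, per the text above

def run_batched(run_single, obs, keep=KEEP_WAYPOINTS):
    """Run a batch-1 model over a (B, 16, 3, H, W) batch, one sample at a time.

    run_single: hypothetical callable mapping (1, 16, 3, H, W) -> (1, 15, 3).
    Returns (B, 1, keep, 2): xy of the first `keep` waypoints, with M = 1.
    """
    outputs = [run_single(obs[i:i + 1]) for i in range(obs.shape[0])]
    pred = np.concatenate(outputs, axis=0)      # (B, 15, 3)
    return pred[:, :keep, :2][:, None]          # drop yaw, add the mode axis
```

Throughput therefore scales linearly with batch size for this model, which is worth keeping in mind when benchmarking it against the batched models.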

Downloading

Full repo (includes the LFS-tracked ONNX weights):

hf download UCLA-VAIL/Navigation-Model-Zoo-Public --local-dir ./Navigation-Model-Zoo-Public

One model (fetch just its folder, e.g. MBRA):

hf download UCLA-VAIL/Navigation-Model-Zoo-Public \
  --include "MBRA_PG_Official/*" --local-dir .

Then run from the repo root: from MBRA_PG_Official.inference import MBRAPGNavigator.

External weights: GNM, ViNT, and NoMaD ship *.onnx.data files; keep each .onnx and its .onnx.data together in the same folder so ONNX Runtime can resolve the weights.
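After a partial download it can be useful to confirm that no external-weight file was left behind. A small sketch, assuming it is pointed only at folders whose models use external weights (GNM, ViNT, NoMaD); find_orphaned_onnx is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

def find_orphaned_onnx(folder):
    """Return names of .onnx files in `folder` lacking a sibling <name>.onnx.data."""
    folder = Path(folder)
    return sorted(
        p.name
        for p in folder.glob("*.onnx")
        if not p.with_name(p.name + ".data").exists()
    )
```

An empty list means every graph file in the folder has its weights file next to it. Single-file models such as mbra.onnx would be reported by this check, so do not run it on their folders.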

Intended use & limitations

These are research artifacts for navigation research, reproduction, and benchmarking; they are not safety-validated for deployment on real robots without additional testing. Each policy's behavior is bounded by its training distribution (camera intrinsics, embodiment, frame rate, environment). Several wrappers rectify/resize inputs to a specific training camera; mismatched cameras may degrade performance.

License

Released under Apache 2.0. Individual models carry the licenses and terms of their original sources (ViNT, GNM, NoMaD, CityWalker, MBRA); check upstream before commercial use.

Citation

If you use a model from this zoo, please cite its original paper.

GNM

@inproceedings{shah2023gnm,
  title={Gnm: A general navigation model to drive any robot},
  author={Shah, Dhruv and Sridhar, Ajay and Bhorkar, Arjun and Hirose, Noriaki and Levine, Sergey},
  booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={7226--7233},
  year={2023},
  organization={IEEE}
}

ViNT

@article{shah2023vint,
  title={ViNT: A foundation model for visual navigation},
  author={Shah, Dhruv and Sridhar, Ajay and Dashora, Nitish and Stachowicz, Kyle and Black, Kevin and Hirose, Noriaki and Levine, Sergey},
  journal={arXiv preprint arXiv:2306.14846},
  year={2023}
}

NoMaD

@inproceedings{sridhar2024nomad,
  title={Nomad: Goal masked diffusion policies for navigation and exploration},
  author={Sridhar, Ajay and Shah, Dhruv and Glossop, Catherine and Levine, Sergey},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={63--70},
  year={2024},
  organization={IEEE}
}

CityWalker

@inproceedings{liu2025citywalker,
  title={Citywalker: Learning embodied urban navigation from web-scale videos},
  author={Liu, Xinhao and Li, Jintong and Jiang, Yicheng and Sujay, Niranjan and Yang, Zhicheng and Zhang, Juexiao and Abanes, John and Zhang, Jing and Feng, Chen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={6875--6885},
  year={2025}
}

MBRA

@article{hirose2025learning,
  title={Learning to drive anywhere with model-based reannotation},
  author={Hirose, Noriaki and Ignatova, Lydia and Stachowicz, Kyle and Glossop, Catherine and Levine, Sergey and Shah, Dhruv},
  journal={IEEE Robotics and Automation Letters},
  volume={11},
  number={2},
  pages={1242--1249},
  year={2025},
  publisher={IEEE}
}

S2E

@article{he2025seeing,
  title={From seeing to experiencing: Scaling navigation foundation models with reinforcement learning},
  author={He, Honglin and Ma, Yukai and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
  journal={arXiv preprint arXiv:2507.22028},
  year={2025}
}

MIMIC

@article{he2026learning,
  title={Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion},
  author={He, Honglin and Ma, Yukai and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
  journal={arXiv preprint arXiv:2603.22527},
  year={2026}
}

Contact

Maintained by UCLA-VAIL. Open an issue/discussion on the repository page for questions or contributions.
