MoGe-2 ViT-B "normal": Monocular Geometry + Surface Normals (ONNX)
Heliosoph mirror of Ruicheng/moge-2-vitb-normal-onnx, the ViT-Base variant of MoGe-2's joint geometry + surface-normal model. A DINOv2 ViT-B backbone with geometry and normal heads predicts a per-pixel 3D point map, camera intrinsics, and per-pixel surface normals in a single forward pass.
The "normal" suffix marks this as the joint variant, distinct from the base MoGe-2 ladder that predicts geometry only. Predicting geometry and normals with the same network removes the need for a separate normal-estimation pass (for example DSINE or Omnidata) when feeding a Poisson surface reconstruction pipeline.
This is the recommended default of the three-variant ladder: best quality-per-byte for most GPU workloads. Reach for ViT-S only if you specifically need CPU/edge latency, and for ViT-L only if you specifically need peak reconstruction quality.
The ONNX file is unchanged from upstream. It is re-hosted for distribution stability (the upstream lives on the author's personal HF account) and to ship a proper LICENSE + README alongside the bytes.
Credit: Ruicheng Wang and collaborators, MoGe-2 (Microsoft Research, 2025). The author's personal repos at Ruicheng/moge-2-vits-normal-onnx, Ruicheng/moge-2-vitb-normal-onnx, and Ruicheng/moge-2-vitl-normal-onnx are the authoritative upstream; this is a byte-for-byte mirror of the ViT-B variant.
What this repo contains
model.onnx # ~419 MB; DINOv2 ViT-B backbone, geometry + normal heads, fp32
LICENSE # MIT
The ONNX file is self-contained (no external .onnx_data sidecar). The upstream repo ships only model.onnx + .gitattributes; this mirror adds the LICENSE + README.
Variant ladder
| Variant | Backbone | Size | Use when… |
|---|---|---|---|
| ViT-S | DINOv2 ViT-Small (~22M backbone params) | ~141 MB | CPU / edge / fast-iteration workflows |
| ViT-B (this) | DINOv2 ViT-Base (~86M) | ~419 MB | Recommended default: best quality-per-byte for GPU workloads |
| ViT-L | DINOv2 ViT-Large (~300M) | ~1.32 GB | Peak quality, GPU-comfortable, large enough to push consumer VRAM |
All three share the same I/O signature, so you can switch variants by swapping the file.
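Since the variants are described as sharing one I/O signature, switching can be reduced to picking a repo id. The sketch below is illustrative only: it uses the upstream repo names from the credit note, assumes every repo names its file `model.onnx` (stated here only for ViT-B), and relies on `huggingface_hub.hf_hub_download` and `onnxruntime.InferenceSession`. The helper names `repo_for` and `load_session` are hypothetical.

```python
# Upstream repo ids as listed in the credit note of this README.
UPSTREAM_REPOS = {
    "vits": "Ruicheng/moge-2-vits-normal-onnx",
    "vitb": "Ruicheng/moge-2-vitb-normal-onnx",
    "vitl": "Ruicheng/moge-2-vitl-normal-onnx",
}


def repo_for(variant):
    """Map a short variant key ("vits", "vitb", "vitl") to its upstream repo id."""
    try:
        return UPSTREAM_REPOS[variant]
    except KeyError:
        known = ", ".join(sorted(UPSTREAM_REPOS))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}") from None


def load_session(variant="vitb", providers=("CPUExecutionProvider",)):
    """Download model.onnx for the chosen variant and open an ONNX Runtime session.

    Imports are local so that repo_for() stays usable without these packages.
    Assumption: the file is called model.onnx in every variant's repo.
    """
    from huggingface_hub import hf_hub_download
    import onnxruntime as ort

    path = hf_hub_download(repo_id=repo_for(variant), filename="model.onnx")
    return ort.InferenceSession(path, providers=list(providers))
```

On a GPU machine, passing `providers=("CUDAExecutionProvider", "CPUExecutionProvider")` is the usual way to prefer CUDA with a CPU fallback, provided the GPU build of ONNX Runtime is installed.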
Input / output
| Spec | Value |
|---|---|
| Input | RGB image, NCHW float32, normalized per DINOv2 convention |
| Outputs | Per-pixel 3D point map (camera-frame), camera intrinsics, per-pixel surface normals |
| Dynamic axes | Batch + spatial; inspect with Netron for exact names and ranges |
The upstream repo ships only model.onnx + .gitattributes and does not document the exact input/output tensor names or the supported spatial-dimension multiples. Inspect the graph with Netron before integrating, or cross-reference the microsoft/MoGe PyTorch reference for the preprocessing convention.
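As a starting point for that inspection, the sketch below lists a session's tensor names and builds an NCHW float32 input. It rests on several assumptions that should be checked against the graph and the PyTorch reference: that "DINOv2 convention" means scaling to [0, 1] followed by ImageNet mean/std normalization, that the graph does not already normalize internally, and that spatial dimensions should be multiples of the DINOv2 patch size (14). The helper names `describe_io`, `snap_to_patch`, and `preprocess` are hypothetical.

```python
import numpy as np

# Assumption: ImageNet statistics, as commonly used with DINOv2 backbones.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)
PATCH = 14  # DINOv2 patch size; whether this export requires multiples is unverified


def describe_io(session):
    """Return (name, shape, type) tuples for a session's inputs and outputs.

    Works with an onnxruntime.InferenceSession, whose get_inputs() and
    get_outputs() return objects exposing .name, .shape and .type.
    """
    ins = [(a.name, a.shape, a.type) for a in session.get_inputs()]
    outs = [(a.name, a.shape, a.type) for a in session.get_outputs()]
    return ins, outs


def snap_to_patch(size, patch=PATCH):
    """Round a spatial size down to the nearest multiple of the patch size."""
    return (size // patch) * patch


def preprocess(rgb_u8):
    """Convert an HxWx3 uint8 RGB image to a 1x3xH'xW' float32 tensor.

    The image is center-cropped so H' and W' are multiples of PATCH, scaled
    to [0, 1], then normalized per channel.
    """
    h, w, _ = rgb_u8.shape
    th, tw = snap_to_patch(h), snap_to_patch(w)
    if th == 0 or tw == 0:
        raise ValueError(f"image {h}x{w} is smaller than one {PATCH}px patch")
    top, left = (h - th) // 2, (w - tw) // 2
    crop = rgb_u8[top:top + th, left:left + tw].astype(np.float32) / 255.0
    crop = (crop - IMAGENET_MEAN) / IMAGENET_STD
    # HWC -> CHW, then add the batch axis.
    return np.ascontiguousarray(crop.transpose(2, 0, 1)[None])
```

With a real session, the first input name returned by `describe_io` is the key to use in the feed dictionary passed to `session.run`. If the graph turns out to normalize internally, drop the mean/std step and feed the [0, 1] image instead.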
When to pick MoGe-2 normal vs alternatives
| Need | Pick |
|---|---|
| Geometry + normals from one forward pass | MoGe-2 normal (this family) |
| Relative depth only, broadest hardware support | Depth Anything V2/V3 |
| Metric depth in meters, outdoor scenes | Metric3D V2 |
| Surface normals only, smallest model | DSINE |
| Per-pixel point map only (no normals) | MoGe v1 ViT-L |
MoGe-2 normal is the right pick when you're feeding a Poisson surface reconstruction (which wants both positions AND normals at every point), or when downstream rendering needs per-pixel shading normals "for free" alongside depth.
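To illustrate the Poisson use case, the sketch below turns a point map and a normal map into an oriented point cloud in ASCII PLY form, which most reconstruction tools accept. It assumes both maps have already been arranged as HxWx3 arrays and that invalid pixels show up as non-finite points or zero-length normals; the actual output layout and any validity signal of this export are not documented here and need to be checked. The function names `to_oriented_points` and `ply_text` are hypothetical.

```python
import numpy as np


def to_oriented_points(points, normals):
    """Flatten HxWx3 point and normal maps into matching Nx3 arrays.

    Pixels with non-finite coordinates or (near-)zero normals are dropped,
    and the remaining normals are rescaled to unit length.
    """
    pts = np.asarray(points, dtype=np.float32).reshape(-1, 3)
    nrm = np.asarray(normals, dtype=np.float32).reshape(-1, 3)
    length = np.linalg.norm(nrm, axis=1)
    keep = np.isfinite(pts).all(axis=1) & np.isfinite(length) & (length > 1e-6)
    return pts[keep], nrm[keep] / length[keep, None]


def ply_text(pts, nrm):
    """Serialize positions and normals as an ASCII PLY string."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(pts)}",
        "property float x",
        "property float y",
        "property float z",
        "property float nx",
        "property float ny",
        "property float nz",
        "end_header",
    ]
    rows = [" ".join(f"{v:.6f}" for v in (*p, *n)) for p, n in zip(pts, nrm)]
    return "\n".join(header + rows) + "\n"
```

Writing the returned string to a `.ply` file gives something that, for example, Open3D can load with `open3d.io.read_point_cloud` and pass to `TriangleMesh.create_from_point_cloud_poisson`. ASCII PLY is chosen here for readability; a binary writer is preferable for full-resolution maps.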
License
MIT, assumed from the sibling Ruicheng/moge-2-vitl-normal-onnx, which ships an explicit LICENSE file, and from the upstream microsoft/MoGe code repo being MIT. The upstream ViT-B repo doesn't ship a LICENSE itself; this mirror adds a canonical MIT LICENSE with copyright attributed to Microsoft Research. If the upstream author confirms a different license later, this mirror will follow.