MoGe-2 ViT-B "normal" — Monocular Geometry + Surface Normals (ONNX)

Heliosoph mirror of Ruicheng/moge-2-vitb-normal-onnx, the ViT-Base variant of MoGe-2's joint geometry + surface-normal model. A DINOv2 ViT-B backbone predicts a per-pixel 3D point map, camera intrinsics, and per-pixel surface normals in a single forward pass.

The "normal" suffix marks this as the joint variant — distinct from the base MoGe-2 ladder that predicts geometry only. Pairing geometry + normals from the same network removes the need for a separate normal-estimation pass (DSINE, omnidata) when feeding a Poisson surface reconstruction pipeline.

ViT-B is the recommended default of the three-variant ladder: it offers the best quality-per-byte for most GPU workloads. Reach for ViT-S only if you need CPU/edge latency, and for ViT-L only if you need peak reconstruction quality.

The ONNX file is unchanged from upstream. It is re-hosted for distribution stability (the upstream lives on the author's personal HF account) and to ship a proper LICENSE + README alongside the bytes.

Credit: Ruicheng Wang and collaborators — MoGe-2 (Microsoft Research, 2025). The author's personal repos at Ruicheng/moge-2-vits-normal-onnx, Ruicheng/moge-2-vitb-normal-onnx, and Ruicheng/moge-2-vitl-normal-onnx are the authoritative upstream — this is a byte-for-byte mirror of the ViT-B variant.

What this repo contains

model.onnx        # ~419 MB — DINOv2 ViT-B backbone, geometry + normal heads, fp32
LICENSE           # MIT

The ONNX file is self-contained (no external .onnx_data sidecar). The upstream repo ships only model.onnx + .gitattributes; this mirror adds the LICENSE + README.

Variant ladder

Variant	Backbone	Size	Use when…
ViT-S	DINOv2 ViT-Small (~22M backbone params)	~141 MB	CPU / edge / fast-iteration workflows
ViT-B (this)	DINOv2 ViT-Base (~86M)	~419 MB	Recommended default — best quality-per-byte for GPU workloads
ViT-L	DINOv2 ViT-Large (~300M)	~1.32 GB	Peak quality on GPU; large enough to strain consumer VRAM

All three share the same I/O signature — switch by swapping the file.
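Because the variants differ only in file and size, choosing one can be reduced to a lookup. The sketch below is illustrative only: the repo IDs and approximate sizes are copied from this card, while `Variant`, `VARIANTS`, and `largest_variant_within` are hypothetical names introduced here, and the megabyte figures are rounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    repo_id: str    # upstream repo named in the credit line of this card
    approx_mb: int  # rounded file size taken from the variant table


# Hypothetical lookup table; sizes are approximations, not exact byte counts.
VARIANTS = {
    "vits": Variant("Ruicheng/moge-2-vits-normal-onnx", 141),
    "vitb": Variant("Ruicheng/moge-2-vitb-normal-onnx", 419),
    "vitl": Variant("Ruicheng/moge-2-vitl-normal-onnx", 1320),
}


def largest_variant_within(budget_mb: float) -> str:
    """Return the key of the largest variant whose file fits the size budget."""
    fitting = [(v.approx_mb, key) for key, v in VARIANTS.items()
               if v.approx_mb <= budget_mb]
    if not fitting:
        raise ValueError(f"no variant fits within {budget_mb} MB")
    return max(fitting)[1]
```

For example, `largest_variant_within(500)` selects `"vitb"`; the surrounding inference code stays the same regardless of which file is loaded.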

Input / output

Field	Spec
Input	RGB image, NCHW float32, normalized per DINOv2 convention
Outputs	Per-pixel 3D point map (camera-frame), camera intrinsics, per-pixel surface normals
Dynamic axes	Batch + spatial — inspect with Netron for exact names and ranges
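The input row above can be sketched as a small NumPy preprocessing step. This assumes that "DINOv2 convention" means the standard ImageNet mean and standard deviation applied to RGB values scaled to [0, 1]; whether the exported graph already applies that normalization internally is not confirmed here, so the hypothetical `to_nchw` helper exposes a `normalize` flag to turn it off.

```python
import numpy as np

# ImageNet statistics used by DINOv2 preprocessing (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_nchw(image_hwc_uint8, normalize=True):
    """Convert an HxWx3 uint8 RGB image to a 1x3xHxW float32 tensor.

    Assumption: set normalize=False if the ONNX graph turns out to
    normalize its input internally.
    """
    img = np.asarray(image_hwc_uint8)
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB image")
    x = img.astype(np.float32) / 255.0           # scale to [0, 1]
    if normalize:
        x = (x - IMAGENET_MEAN) / IMAGENET_STD   # per-channel normalization
    # HWC -> CHW, then add the batch axis.
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])
```

Resizing is deliberately left out, since the supported spatial sizes are not documented.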

The exact input/output tensor names and the supported spatial-dimension multiples are not documented in the upstream repo, which ships only model.onnx + .gitattributes. Inspect the graph with Netron before integrating, or cross-reference the microsoft/MoGe PyTorch reference for the preprocessing convention.
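As a scriptable alternative to Netron, the tensor names and shapes can also be read through ONNX Runtime. This is a minimal sketch that assumes `onnxruntime` is installed and that `model.onnx` sits in the working directory; `describe_io` and `print_model_io` are hypothetical helper names, and no particular tensor names are assumed.

```python
def describe_io(session):
    """Collect (name, shape, type) for every graph input and output.

    Accepts any object exposing get_inputs()/get_outputs(), as
    onnxruntime.InferenceSession does. Dynamic axes appear as strings
    or None inside the shape list.
    """
    def rows(nodes):
        return [(n.name, list(n.shape), n.type) for n in nodes]

    return {"inputs": rows(session.get_inputs()),
            "outputs": rows(session.get_outputs())}


def print_model_io(model_path="model.onnx"):
    """Load the model on CPU and print its I/O signature."""
    import onnxruntime as ort  # imported lazily; only needed here

    session = ort.InferenceSession(model_path,
                                   providers=["CPUExecutionProvider"])
    for kind, entries in describe_io(session).items():
        for name, shape, dtype in entries:
            print(f"{kind}: {name} {shape} {dtype}")
```

Calling `print_model_io("model.onnx")` lists the names to use as keys in the feed dictionary and in the output list.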

When to pick MoGe-2 normal vs alternatives

Need	Pick
Geometry + normals from one forward pass	MoGe-2 normal (this family)
Relative depth only, broadest hardware support	Depth Anything V2/V3
Metric depth in meters, outdoor scenes	Metric3D V2
Surface normals only, smallest model	DSINE
Per-pixel point map only (no normals)	MoGe v1 ViT-L

MoGe-2 normal is the right pick when you're feeding a Poisson surface reconstruction (which wants both positions AND normals at every point), or when downstream rendering needs per-pixel shading normals "for free" alongside depth.
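To make the Poisson use case concrete, the sketch below flattens a point map and a normal map into the oriented point cloud that such a solver consumes. It assumes both outputs have already been brought to HxWx3 arrays for a single image, which is an assumption about layout rather than a documented property of this graph; `oriented_points` is a hypothetical helper.

```python
import numpy as np


def oriented_points(points_hw3, normals_hw3, eps=1e-6):
    """Flatten HxWx3 point and normal maps into (N, 3) arrays.

    Drops pixels with non-finite values or near-zero normals, and
    rescales the remaining normals to unit length.
    """
    pts = np.asarray(points_hw3, dtype=np.float32).reshape(-1, 3)
    nrm = np.asarray(normals_hw3, dtype=np.float32).reshape(-1, 3)
    if pts.shape != nrm.shape:
        raise ValueError("point map and normal map must have the same size")
    length = np.linalg.norm(nrm, axis=1)
    keep = (np.isfinite(pts).all(axis=1)
            & np.isfinite(nrm).all(axis=1)
            & (length > eps))
    return pts[keep], nrm[keep] / length[keep, None]
```

The two returned arrays can then be handed to a Poisson implementation of your choice, for example as the points and normals of an Open3D point cloud.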

License

MIT, assumed on two grounds: the sibling repo Ruicheng/moge-2-vitl-normal-onnx ships an explicit LICENSE file, and the upstream microsoft/MoGe code repo is MIT. The upstream ViT-B repo doesn't ship a LICENSE itself; this mirror adds a canonical MIT LICENSE with copyright attributed to Microsoft Research. If the upstream author confirms a different license later, this mirror will follow.
