DA3-LARGE — CoreML (.mlpackage) for monocular depth

A pre-converted Core ML build of Depth Anything 3's DA3-LARGE (the full ViT-L model). It exposes a single-image relative-depth output for use in macOS/iOS apps (e.g. the Sbs SBS-3D viewer).

Input: image, RGB, 504×504, [0,1] (CoreML ImageType; ImageNet norm baked in).
Output: depth, shape (1, 504, 504), single-channel relative depth.
Weights: FP16. Only the backbone → depth head is converted; DA3's camera / sky / Gaussian-splat post-processing is bypassed (not needed for monocular depth).
Conversion notes: the full DA3-LARGE backbone uses RoPE + multi-view camera tokens, so four ops were rewritten for coremltools compatibility (RoPE cartesian_prod, in-place camera-token insert, single-view empty source-token concat, and the head's meshgrid). Behaviour is unchanged for single-image input.
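
A minimal Python sketch of running the package from a script, assuming coremltools, Pillow and NumPy are installed and that prediction runs on macOS. The input name `image`, the output name `depth` and the 504×504 size come from the spec above. Everything else is an assumption made for illustration: the helper names, the center-crop strategy for non-square images, and the min-max scaling used to view the relative depth as an 8-bit grayscale image. Whether larger values mean nearer or farther is not stated here, so check it on a known scene.

```python
MODEL_SIDE = 504  # input and output side length from the spec above


def center_square_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square crop.

    Center-cropping is an assumption; any resize to 504x504 would also work.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def to_gray8(depth_values):
    """Min-max scale a flat list of relative depth values to 0..255."""
    lo, hi = min(depth_values), max(depth_values)
    span = hi - lo
    if span == 0:
        # A constant map has no depth contrast to show.
        return [0 for _ in depth_values]
    return [round(255 * (v - lo) / span) for v in depth_values]


def predict_depth(model_path, image_path):
    """Run the model on one image and return a grayscale PIL image.

    Hypothetical helper: imports are local because coremltools prediction
    is only available on macOS.
    """
    import coremltools as ct
    import numpy as np
    from PIL import Image

    img = Image.open(image_path).convert("RGB")
    img = img.crop(center_square_box(*img.size))
    img = img.resize((MODEL_SIDE, MODEL_SIDE))

    model = ct.models.MLModel(model_path)
    out = model.predict({"image": img})  # names taken from the spec

    depth = np.asarray(out["depth"]).reshape(MODEL_SIDE, MODEL_SIDE)
    gray = np.array(to_gray8(depth.ravel().tolist()), dtype=np.uint8)
    return Image.fromarray(gray.reshape(MODEL_SIDE, MODEL_SIDE))
```

Calling `predict_depth("DA3-LARGE.mlpackage", "photo.jpg").save("depth.png")` would then write a viewable depth map; both file names are placeholders. Because the output is relative depth, the min-max scaling is per image and values are not comparable across images.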

License & attribution

Derived from depth-anything/DA3-LARGE (Depth Anything 3, arXiv:2511.10647), which is licensed CC-BY-NC-4.0. This conversion is released under the same CC-BY-NC-4.0 license: attribution is required and commercial use is not permitted.

For commercial use, see the Apache-2.0 monocular distillation depth-anything/DA3MONO-LARGE (CoreML build).

Paper: Depth Anything 3: Recovering the Visual Space from Any Views (arXiv:2511.10647, published Nov 13, 2025).