Depth Anything 3: Recovering the Visual Space from Any Views
Paper: arXiv:2511.10647
A precompiled Core ML conversion of Depth Anything 3 (DA3-GIANT, the full ViT-g / 1.15B model), exposing a single-image relative-depth output for macOS/iOS.
Input: image, RGB, 504×504, [0,1] (Core ML ImageType; ImageNet normalization baked in).
Output: depth, shape (1, 504, 504), single-channel relative depth.
… int() on size-1 non-0-dim arrays, which breaks the const-cast of H//patch_size). Single-image behaviour is unchanged.
This is the highest-capacity DA3 depth variant, and correspondingly the slowest at inference; for real-time monocular depth the smaller DA3MONO-LARGE / DA3-LARGE are usually the better trade-off.
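As a rough illustration of the input and output described above, the following Python sketch loads the package with coremltools and turns the relative-depth map into an 8-bit image for viewing. The input name `image` and output name `depth` are taken from this card. The file name `DA3-GIANT.mlpackage` and the helpers `predict_depth` and `depth_to_uint8` are hypothetical, and prediction through coremltools only runs on macOS. Treat it as a minimal sketch under those assumptions, not the reference pipeline.

```python
import numpy as np

# Hypothetical file name; substitute the model file actually shipped with this repo.
MODEL_PATH = "DA3-GIANT.mlpackage"
INPUT_SIZE = 504  # fixed input resolution stated in this card


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a (1, H, W) relative-depth map to an (H, W) uint8 image."""
    d = np.asarray(depth, dtype=np.float32).squeeze(0)
    lo, hi = float(d.min()), float(d.max())
    if hi - lo < 1e-12:
        # Constant map: nothing to stretch, return all zeros.
        return np.zeros(d.shape, dtype=np.uint8)
    return np.round((d - lo) / (hi - lo) * 255.0).astype(np.uint8)


def predict_depth(image_path: str, model_path: str = MODEL_PATH) -> np.ndarray:
    """Run the Core ML model on one image (macOS only) and return the raw depth."""
    # Imported lazily so the helper above stays usable without these packages.
    import coremltools as ct
    from PIL import Image

    model = ct.models.MLModel(model_path)
    # The model expects a 504x504 RGB image; normalization is baked into the model,
    # so the PIL image is passed as-is. Plain resizing ignores the aspect ratio.
    img = Image.open(image_path).convert("RGB").resize((INPUT_SIZE, INPUT_SIZE))
    out = model.predict({"image": img})
    return np.asarray(out["depth"])
```

Because the output is relative depth, its absolute scale carries no metric meaning; the min-max stretch in `depth_to_uint8` is only a visualization choice. A typical call would be `depth_to_uint8(predict_depth("photo.jpg"))`, where `photo.jpg` is any local image.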
Derived from depth-anything/DA3-GIANT (Depth Anything 3, arXiv:2511.10647), CC-BY-NC-4.0. Released under the same license: attribution required, non-commercial use only. For commercial use, see the Apache-2.0 depth-anything/DA3MONO-LARGE.