amael-apple committed on
Commit 41f26b3 · verified · 1 parent: 19ad76e

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +135 -3
  2. config.json +7 -0
  3. model.safetensors +3 -0
README.md CHANGED
@@ -1,3 +1,135 @@
- ---
- license: apple-ascl
- ---
+ ---
+ license: apple-ascl
+ pipeline_tag: depth-estimation
+ tags:
+ - model_hub_mixin
+ - pytorch_model_hub_mixin
+ ---
+
+ # Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
+
+ ![Depth Pro Demo Image](https://github.com/apple/ml-depth-pro/raw/main/data/depth-pro-teaser.jpg)
+
+ We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image.
+
+ Depth Pro was introduced in **[Depth Pro: Sharp Monocular Metric Depth in Less Than a Second](https://arxiv.org/abs/2410.02073)**, by *Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun*.
+
+ The checkpoint in this repository is a reference implementation that has been retrained. Its performance is close to that of the model reported in the paper but does not match it exactly.
+
+ ## How to Use
+
+ Please follow the steps in the [code repository](https://github.com/apple/ml-depth-pro) to set up your environment. Then you can:
+
+ ### Running from Python
+
+ ```python
+ from huggingface_hub import PyTorchModelHubMixin
+ from depth_pro import create_model_and_transforms, load_rgb
+ from depth_pro.depth_pro import (create_backbone_model, load_monodepth_weights,
+                                  DepthPro, DepthProEncoder, MultiresConvDecoder)
+ import depth_pro
+ from torchvision.transforms import Compose, Normalize, ToTensor
+
+
+ class DepthProWrapper(DepthPro, PyTorchModelHubMixin):
+     """Depth Pro network."""
+
+     def __init__(
+         self,
+         patch_encoder_preset: str,
+         image_encoder_preset: str,
+         decoder_features: int,  # Integer channel count (256 in config.json).
+         fov_encoder_preset: str,
+         use_fov_head: bool = True,
+         **kwargs,
+     ):
+         """Initialize Depth Pro."""
+
+         # Build the ViT backbones from their preset names.
+         patch_encoder, patch_encoder_config = create_backbone_model(
+             preset=patch_encoder_preset
+         )
+         image_encoder, _ = create_backbone_model(
+             preset=image_encoder_preset
+         )
+
+         # The field-of-view encoder is only needed when the FOV head is enabled.
+         fov_encoder = None
+         if use_fov_head and fov_encoder_preset is not None:
+             fov_encoder, _ = create_backbone_model(preset=fov_encoder_preset)
+
+         dims_encoder = patch_encoder_config.encoder_feature_dims
+         hook_block_ids = patch_encoder_config.encoder_feature_layer_ids
+         encoder = DepthProEncoder(
+             dims_encoder=dims_encoder,
+             patch_encoder=patch_encoder,
+             image_encoder=image_encoder,
+             hook_block_ids=hook_block_ids,
+             decoder_features=decoder_features,
+         )
+         decoder = MultiresConvDecoder(
+             dims_encoder=[encoder.dims_encoder[0]] + list(encoder.dims_encoder),
+             dim_decoder=decoder_features,
+         )
+
+         super().__init__(
+             encoder=encoder,
+             decoder=decoder,
+             last_dims=(32, 1),
+             use_fov_head=use_fov_head,
+             fov_encoder=fov_encoder,
+         )
+
+
+ # Load model and preprocessing transform.
+ model = DepthProWrapper.from_pretrained("DepthPro-L")
+ transform = Compose(
+     [
+         ToTensor(),
+         Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
+     ]
+ )
+
+
+ model.eval()
+
+ # Load and preprocess an image.
+ image_path = "example.jpg"  # Placeholder: replace with the path to your image.
+ image, _, f_px = depth_pro.load_rgb(image_path)
+ image = transform(image)
+
+ # Run inference.
+ prediction = model.infer(image, f_px=f_px)
+ depth = prediction["depth"]  # Depth in [m].
+ focallength_px = prediction["focallength_px"]  # Focal length in pixels.
+ ```
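As an illustrative aside (not part of the committed README), the `focallength_px` value returned above can be related to a field of view through the standard pinhole-camera relation `f_px = 0.5 * width / tan(0.5 * fov)`. The sketch below assumes an ideal pinhole camera with the principal point at the image center; the helper names `horizontal_fov_deg` and `focal_length_from_fov` are hypothetical and do not come from the `depth_pro` package.

```python
import math


def horizontal_fov_deg(focal_length_px: float, image_width_px: int) -> float:
    """Horizontal field of view, in degrees, of an ideal pinhole camera."""
    if focal_length_px <= 0 or image_width_px <= 0:
        raise ValueError("focal length and image width must be positive")
    # Half the image width subtends half the field of view.
    return math.degrees(2.0 * math.atan(image_width_px / (2.0 * focal_length_px)))


def focal_length_from_fov(fov_deg: float, image_width_px: int) -> float:
    """Inverse relation: focal length in pixels from a horizontal field of view."""
    if not 0.0 < fov_deg < 180.0:
        raise ValueError("field of view must lie strictly between 0 and 180 degrees")
    return 0.5 * image_width_px / math.tan(0.5 * math.radians(fov_deg))


# Hypothetical numbers: a 1536-pixel-wide image with a 768-pixel focal length.
fov = horizontal_fov_deg(768.0, 1536)
f_back = focal_length_from_fov(fov, 1536)
```

A focal length equal to half the image width corresponds to a 90-degree horizontal field of view, which gives a quick sanity check for a predicted focal length.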
+
+ ### Evaluation (boundary metrics)
+
+ Boundary metrics are implemented in `eval/boundary_metrics.py` and can be used as follows:
+
+ ```python
+ # for a depth-based dataset
+ boundary_f1 = SI_boundary_F1(predicted_depth, target_depth)
+
+ # for a mask-based dataset (image matting / segmentation)
+ boundary_recall = SI_boundary_Recall(predicted_depth, target_mask)
+ ```
+
+
+ ## Citation
+
+ If you find our work useful, please cite the following paper:
+
+ ```bibtex
+ @article{Bochkovskii2024:arxiv,
+   author     = {Aleksei Bochkovskii and Ama\"{e}l Delaunoy and Hugo Germain and Marcel Santos and
+                 Yichao Zhou and Stephan R. Richter and Vladlen Koltun},
+   title      = {Depth Pro: Sharp Monocular Metric Depth in Less Than a Second},
+   journal    = {arXiv},
+   year       = {2024},
+ }
+ ```
+
+ ## Acknowledgements
+
+ Our codebase is built using multiple open-source contributions; please see [Acknowledgements](https://github.com/apple/ml-depth-pro/blob/main/ACKNOWLEDGEMENTS.md) for more details.
+
+ Please check the paper for a complete list of references and datasets used in this work.
config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "decoder_features": 256,
+   "fov_encoder_preset": "dinov2l16_384",
+   "image_encoder_preset": "dinov2l16_384",
+   "patch_encoder_preset": "dinov2l16_384",
+   "use_fov_head": true
+ }
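The keys in this file match the constructor arguments of the `DepthProWrapper` class in the README. A minimal sketch of the likely mechanism, assuming the usual `PyTorchModelHubMixin` behavior of storing the constructor's keyword arguments in `config.json` and passing them back to `__init__` on `from_pretrained`: the stand-in class `WrapperStub` is hypothetical, only records its arguments, and builds no network.

```python
import json

# The contents of config.json as added in this commit.
CONFIG_JSON = """
{
  "decoder_features": 256,
  "fov_encoder_preset": "dinov2l16_384",
  "image_encoder_preset": "dinov2l16_384",
  "patch_encoder_preset": "dinov2l16_384",
  "use_fov_head": true
}
"""


class WrapperStub:
    """Hypothetical stand-in mirroring the wrapper's constructor signature."""

    def __init__(
        self,
        patch_encoder_preset: str,
        image_encoder_preset: str,
        decoder_features: int,
        fov_encoder_preset: str,
        use_fov_head: bool = True,
        **kwargs,
    ):
        # Record the arguments instead of building encoders and a decoder.
        self.patch_encoder_preset = patch_encoder_preset
        self.image_encoder_preset = image_encoder_preset
        self.decoder_features = decoder_features
        self.fov_encoder_preset = fov_encoder_preset
        self.use_fov_head = use_fov_head


# Each JSON key becomes a keyword argument of the constructor.
config = json.loads(CONFIG_JSON)
stub = WrapperStub(**config)
```

Note that `decoder_features` arrives as the integer 256, so an `int` annotation is the natural fit for that argument.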
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8cf414ab41135c007626ebde7013252279628de1de2bc9579cce5bc49127d33f
+ size 1904109940
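The three lines above are a Git LFS pointer, not the weights themselves: the repository stores the spec version, the SHA-256 object ID, and the size in bytes, while the actual file lives in LFS storage. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` is hypothetical, and the parser handles only the three keys shown here.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The object ID is "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8cf414ab41135c007626ebde7013252279628de1de2bc9579cce5bc49127d33f
size 1904109940
"""

info = parse_lfs_pointer(POINTER)
size_gb = info["size_bytes"] / 1e9  # Roughly 1.9 GB of weights.
```

After downloading `model.safetensors`, comparing its SHA-256 digest and byte size against these values is a simple integrity check.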