zach committed Β· Commit 321c20c Β· 1 Parent(s): aa177c6

update metadata

Files changed (1): README.md (+22 -133)

README.md CHANGED
@@ -1,17 +1,23 @@
  # πŸš€ Metric3D Project πŸš€

- **Official PyTorch implementation of Metric3Dv1 and Metric3Dv2:**

  [1] [Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image](https://arxiv.org/abs/2307.10984)

  [2] Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

- <a href='https://jugghm.github.io/Metric3Dv2'><img src='https://img.shields.io/badge/project%20page-@Metric3D-yellow.svg'></a>
- <a href='https://arxiv.org/abs/2307.10984'><img src='https://img.shields.io/badge/arxiv-@Metric3Dv1-green'></a>
- <a href='https:'><img src='https://img.shields.io/badge/arxiv (on hold)-@Metric3Dv2-red'></a>
- <a href='https://huggingface.co/spaces/JUGGHM/Metric3D'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
-
- [//]: # (### [Project Page]&#40;https://arxiv.org/abs/2307.08695&#41; | [v2 Paper]&#40;https://arxiv.org/abs/2307.10984&#41; | [v1 Arxiv]&#40;https://arxiv.org/abs/2307.10984&#41; | [Video]&#40;https://www.youtube.com/playlist?list=PLEuyXJsWqUNd04nwfm9gFBw5FVbcaQPl3&#41; | [Hugging Face πŸ€—]&#40;https://huggingface.co/spaces/JUGGHM/Metric3D&#41; )

  ## News and TO DO LIST

@@ -20,12 +26,12 @@
  - [ ] Focal length free mode
  - [ ] Floating noise removing mode
  - [ ] Improving HuggingFace Demo and Visualization
- - [x] Release training codes
-
  - `[2024/3/18]` HuggingFace GPU version updated!
  - `[2024/3/18]` [Project page](https://jugghm.github.io/Metric3Dv2/) released!
  - `[2024/3/18]` Metric3D V2 models released, supporting metric depth and surface normal now!
- - `[2023/8/10]` Inference codes, pretrained weights, and demo released.
  - `[2023/7]` Metric3D accepted by ICCV 2023!
  - `[2023/4]` The Champion of [2nd Monocular Depth Estimation Challenge](https://jspenmar.github.io/MDEC) in CVPR 2023

@@ -40,29 +46,6 @@ Metric3D is a versatile geometric foundation model for high-quality and zero-sho

  ### Metric Depth

- [//]: # (#### Zero-shot Testing)
-
- [//]: # (Our models work well on both indoor and outdoor scenarios, compared with other zero-shot metric depth estimation methods.)
-
- [//]: # ()
- [//]: # (| | Backbone | KITTI $\delta 1$ ↑ | KITTI $\delta 2$ ↑ | KITTI $\delta 3$ ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU $\delta 1$ ↑ | NYU $\delta 2$ ↑ | NYU $\delta 3$ ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓ |)
-
- [//]: # (|-----------------|------------|--------------------|---------------------|--------------------|-----------------|---------------|------------------|------------------|------------------|------------------|---------------|-------------|--------------|)
-
- [//]: # (| ZeroDepth | ResNet-18 | 0.910 | 0.980 | 0.996 | 0.057 | 4.044 | 0.083 | 0.901 | 0.961 | - | 0.100 | 0.380 | - |)
-
- [//]: # (| PolyMax | ConvNeXt-L | - | - | - | - | - | - | 0.969 | 0.996 | 0.999 | 0.067 | 0.250 | 0.033 |)
-
- [//]: # (| Ours | ViT-L | 0.985 | 0.995 | 0.999 | 0.052 | 2.511 | 0.074 | 0.975 | 0.994 | 0.998 | 0.063 | 0.251 | 0.028 |)
-
- [//]: # (| Ours | ViT-g2 | 0.989 | 0.996 | 0.999 | 0.051 | 2.403 | 0.080 | 0.980 | 0.997 | 0.999 | 0.067 | 0.260 | 0.030 |)
-
- [//]: # ()
- [//]: # ([//]: # &#40;| Adabins | Efficient-B5 | 0.964 | 0.995 | 0.999 | 0.058 | 2.360 | 0.088 | 0.903 | 0.984 | 0.997 | 0.103 | 0.0444 | 0.364 |&#41;)
- [//]: # ([//]: # &#40;| NewCRFs | SwinT-L | 0.974 | 0.997 | 0.999 | 0.052 | 2.129 | 0.079 | 0.922 | 0.983 | 0.994 | 0.095 | 0.041 | 0.334 |&#41;)
- [//]: # ([//]: # &#40;| Ours &#40;CSTM_label&#41; | ConvNeXt-L | 0.964 | 0.993 | 0.998 | 0.058 | 2.770 | 0.092 | 0.944 | 0.986 | 0.995 | 0.083 | 0.035 | 0.310 |&#41;)
-
- [//]: # (#### Finetuned)
  Our models rank 1st on the routing KITTI and NYU benchmarks.

  | | Backbone | KITTI Ξ΄1 ↑ | KITTI Ξ΄2 ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU Ξ΄1 ↑ | NYU Ξ΄2 ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓ |
@@ -111,103 +94,6 @@ Our models also show powerful performance on normal benchmarks.
  ### Improving monocular SLAM
  <img src="media/gifs/demo_22.gif" width="600" height="337">

- [//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/f95815ef-2506-4193-a6d9-1163ea821268)
-
- [//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/ed00706c-41cc-49ea-accb-ad0532633cc2)
-
- [//]: # (### Zero-shot metric 3D recovery)
-
- [//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/26cd7ae1-dd5a-4446-b275-54c5ca7ef945)
-
- [//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/21e5484b-c304-4fe3-b1d3-8eebc4e26e42)
- [//]: # (### Monocular reconstruction for a Sequence)
-
- [//]: # ()
- [//]: # (### In-the-wild 3D reconstruction)
-
- [//]: # ()
- [//]: # (| | Image | Reconstruction | Pointcloud File |)
-
- [//]: # (|:---------:|:------------------:|:------------------:|:--------:|)
-
- [//]: # (| room | <img src="data/wild_demo/jonathan-borba-CnthDZXCdoY-unsplash.jpg" width="300" height="335"> | <img src="media/gifs/room.gif" width="300" height="335"> | [Download]&#40;https://drive.google.com/file/d/1P1izSegH2c4LUrXGiUksw037PVb0hjZr/view?usp=drive_link&#41; |)
-
- [//]: # (| Colosseum | <img src="data/wild_demo/david-kohler-VFRTXGw1VjU-unsplash.jpg" width="300" height="169"> | <img src="media/gifs/colo.gif" width="300" height="169"> | [Download]&#40;https://drive.google.com/file/d/1jJCXe5IpxBhHDr0TZtNZhjxKTRUz56Hg/view?usp=drive_link&#41; |)
-
- [//]: # (| chess | <img src="data/wild_demo/randy-fath-G1yhU1Ej-9A-unsplash.jpg" width="300" height="169" align=center> | <img src="media/gifs/chess.gif" width="300" height="169"> | [Download]&#40;https://drive.google.com/file/d/1oV_Foq25_p-tTDRTcyO2AzXEdFJQz-Wm/view?usp=drive_link&#41; |)
-
- [//]: # ()
- [//]: # (All three images are downloaded from [unplash]&#40;https://unsplash.com/&#41; and put in the data/wild_demo directory.)
-
- [//]: # ()
- [//]: # (### 3D metric reconstruction, Metric3D Γ— DroidSLAM)
-
- [//]: # (Metric3D can also provide scale information for DroidSLAM, help to solve the scale drift problem for better trajectories. )
-
- [//]: # ()
- [//]: # (#### Bird Eyes' View &#40;Left: Droid-SLAM &#40;mono&#41;. Right: Droid-SLAM with Metric-3D&#41;)
-
- [//]: # ()
- [//]: # (<div align=center>)
-
- [//]: # (<img src="media/gifs/0028.gif"> )
-
- [//]: # (</div>)
-
- [//]: # ()
- [//]: # (### Front View)
-
- [//]: # ()
- [//]: # (<div align=center>)
-
- [//]: # (<img src="media/gifs/0028_fv.gif"> )
-
- [//]: # (</div>)
-
- [//]: # ()
- [//]: # (#### KITTI odemetry evaluation &#40;Translational RMS drift &#40;t_rel, ↓&#41; / Rotational RMS drift &#40;r_rel, ↓&#41;&#41;)
-
- [//]: # (| | Modality | seq 00 | seq 02 | seq 05 | seq 06 | seq 08 | seq 09 | seq 10 |)
-
- [//]: # (|:----------:|:--------:|:----------:|:----------:|:---------:|:----------:|:----------:|:---------:|:---------:|)
-
- [//]: # (| ORB-SLAM2 | Mono | 11.43/0.58 | 10.34/0.26 | 9.04/0.26 | 14.56/0.26 | 11.46/0.28 | 9.3/0.26 | 2.57/0.32 |)
-
- [//]: # (| Droid-SLAM | Mono | 33.9/0.29 | 34.88/0.27 | 23.4/0.27 | 17.2/0.26 | 39.6/0.31 | 21.7/0.23 | 7/0.25 |)
-
- [//]: # (| Droid+Ours | Mono | 1.44/0.37 | 2.64/0.29 | 1.44/0.25 | 0.6/0.2 | 2.2/0.3 | 1.63/0.22 | 2.73/0.23 |)
-
- [//]: # (| ORB-SLAM2 | Stereo | 0.88/0.31 | 0.77/0.28 | 0.62/0.26 | 0.89/0.27 | 1.03/0.31 | 0.86/0.25 | 0.62/0.29 |)
-
- [//]: # ()
- [//]: # (Metric3D makes the mono-SLAM scale-aware, like stereo systems.)
-
- [//]: # ()
- [//]: # (#### KITTI sequence videos - Youtube)
-
- [//]: # ([2011_09_30_drive_0028]&#40;https://youtu.be/gcTB4MgVCLQ&#41; /)
-
- [//]: # ([2011_09_30_drive_0033]&#40;https://youtu.be/He581fmoPP4&#41; /)
-
- [//]: # ([2011_09_30_drive_0034]&#40;https://youtu.be/I3PkukQ3_F8&#41;)
-
- [//]: # ()
- [//]: # (#### Estimated pose)
-
- [//]: # ([2011_09_30_drive_0033]&#40;https://drive.google.com/file/d/1SMXWzLYrEdmBe6uYMR9ShtDXeFDewChv/view?usp=drive_link&#41; / )
-
- [//]: # ([2011_09_30_drive_0034]&#40;https://drive.google.com/file/d/1ONU4GxpvTlgW0TjReF1R2i-WFxbbjQPG/view?usp=drive_link&#41; /)
-
- [//]: # ([2011_10_03_drive_0042]&#40;https://drive.google.com/file/d/19fweg6p1Q6TjJD2KlD7EMA_aV4FIeQUD/view?usp=drive_link&#41;)
-
- [//]: # ()
- [//]: # (#### Pointcloud files)
-
- [//]: # ([2011_09_30_drive_0033]&#40;https://drive.google.com/file/d/1K0o8DpUmLf-f_rue0OX1VaHlldpHBAfw/view?usp=drive_link&#41; /)
-
- [//]: # ([2011_09_30_drive_0034]&#40;https://drive.google.com/file/d/1bvZ6JwMRyvi07H7Z2VD_0NX1Im8qraZo/view?usp=drive_link&#41; /)
-
- [//]: # ([2011_10_03_drive_0042]&#40;https://drive.google.com/file/d/1Vw59F8nN5ApWdLeGKXvYgyS9SNKHKy4x/view?usp=drive_link&#41;)

  ## πŸ”¨ Installation
  ### One-line Installation
@@ -263,14 +149,17 @@ Inference settings are defined as
  ```
  where the images will be first resized as the ```crop_size``` and then fed into the model.

  ## ✈️ Inference
  ### Download Checkpoint
  | | Encoder | Decoder | Link |
  |:----:|:-------------------:|:-----------------:|:-------------------------------------------------------------------------------------------------:|
  | v1-T | ConvNeXt-Tiny | Hourglass-Decoder | Coming soon |
- | v1-L | ConvNeXt-Large | Hourglass-Decoder | [Download](https://drive.google.com/file/d/1KVINiBkVpJylx_6z1lAC7CQ4kmn-RJRN/view?usp=drive_link) |
- | v2-S | DINO2reg-ViT-Small | RAFT-4iter | [Download](https://drive.google.com/file/d/1YfmvXwpWmhLg3jSxnhT7LvY0yawlXcr_/view?usp=drive_link) |
- | v2-L | DINO2reg-ViT-Large | RAFT-8iter | [Download](https://drive.google.com/file/d/1eT2gG-kwsVzNy5nJrbm4KC-9DbNKyLnr/view?usp=drive_link) |
  | v2-g | DINO2reg-ViT-giant2 | RAFT-8iter | Coming soon |

  ### Dataset Mode
 
+ ---
+ license: bsd-2-clause
+ pipeline_tag: depth-estimation
+ tags:
+ - Metric Depth
+ - Surface Normal
+ ---
  # πŸš€ Metric3D Project πŸš€

+ **Official Model card of Metric3Dv1 and Metric3Dv2:**

  [1] [Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image](https://arxiv.org/abs/2307.10984)

  [2] Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

+ <a href='https://jugghm.github.io/Metric3Dv2' style='display: inline-block;'><img src='https://img.shields.io/badge/project%20page-@Metric3D-yellow.svg'></a>
+ <a href='https://arxiv.org/abs/2307.10984' style='display: inline-block;'><img src='https://img.shields.io/badge/arxiv-@Metric3Dv1-green'></a>
+ <a href='https:' style='display: inline-block;'><img src='https://img.shields.io/badge/arxiv (on hold)-@Metric3Dv2-red'></a>
+ <a href='https://huggingface.co/spaces/JUGGHM/Metric3D' style='display: inline-block;'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
+ <a href='https://huggingface.co/zachL1/Metric3D' style='display: inline-block;'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20card-E0FFFF'></a>

  ## News and TO DO LIST

 
  - [ ] Focal length free mode
  - [ ] Floating noise removing mode
  - [ ] Improving HuggingFace Demo and Visualization
+
+ - `[2024/4/11]` Training codes are released!
  - `[2024/3/18]` HuggingFace GPU version updated!
  - `[2024/3/18]` [Project page](https://jugghm.github.io/Metric3Dv2/) released!
  - `[2024/3/18]` Metric3D V2 models released, supporting metric depth and surface normal now!
+ - `[2023/8/10]` Inference codes, pre-trained weights, and demo released.
  - `[2023/7]` Metric3D accepted by ICCV 2023!
  - `[2023/4]` The Champion of [2nd Monocular Depth Estimation Challenge](https://jspenmar.github.io/MDEC) in CVPR 2023

  ### Metric Depth

  Our models rank 1st on the routing KITTI and NYU benchmarks.

  | | Backbone | KITTI Ξ΄1 ↑ | KITTI Ξ΄2 ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU Ξ΄1 ↑ | NYU Ξ΄2 ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓ |

  ### Improving monocular SLAM
  <img src="media/gifs/demo_22.gif" width="600" height="337">
 

  ## πŸ”¨ Installation
  ### One-line Installation

  ```
  where the images will be first resized as the ```crop_size``` and then fed into the model.
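
The resize step described above can be sketched in plain Python. This is only an illustration of the idea, not the repository's actual preprocessing code: the helper name and the resize-then-pad policy are assumptions.

```python
def fit_to_crop_size(h, w, crop_h, crop_w):
    """Aspect-preserving fit of an (h, w) image into (crop_h, crop_w).

    Hypothetical helper: the real pipeline may pad, crop, or rescale
    differently; this only illustrates resizing an input toward
    crop_size before it enters the model.
    """
    scale = min(crop_h / h, crop_w / w)            # single scale factor
    new_h, new_w = round(h * scale), round(w * scale)
    pad_h, pad_w = crop_h - new_h, crop_w - new_w  # filled to reach crop_size
    return (new_h, new_w), (pad_h, pad_w), scale
```

For metric (as opposed to relative) depth, the camera intrinsics would need to be rescaled by the same `scale` factor, since metric predictions depend on focal length.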

+ ## ✈️ Training
+ Please refer to [training/README.md](training/README.md)
+
  ## ✈️ Inference
  ### Download Checkpoint
  | | Encoder | Decoder | Link |
  |:----:|:-------------------:|:-----------------:|:-------------------------------------------------------------------------------------------------:|
  | v1-T | ConvNeXt-Tiny | Hourglass-Decoder | Coming soon |
+ | v1-L | ConvNeXt-Large | Hourglass-Decoder | [Download](weight/convlarge_hourglass_0.3_150_step750k_v1.1.pth) |
+ | v2-S | DINO2reg-ViT-Small | RAFT-4iter | [Download](weight/metric_depth_vit_small_800k.pth) |
+ | v2-L | DINO2reg-ViT-Large | RAFT-8iter | [Download](weight/metric_depth_vit_large_800k.pth) |
  | v2-g | DINO2reg-ViT-giant2 | RAFT-8iter | Coming soon |
 
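For scripting, the checkpoint table above can be mirrored in a small lookup. This is a hedged sketch: only the relative weight paths come from the table itself; the helper name is hypothetical, and v1-T / v2-g have no released files yet.

```python
# Relative checkpoint paths from the table above (released models only).
CHECKPOINTS = {
    "v1-L": "weight/convlarge_hourglass_0.3_150_step750k_v1.1.pth",
    "v2-S": "weight/metric_depth_vit_small_800k.pth",
    "v2-L": "weight/metric_depth_vit_large_800k.pth",
}

def checkpoint_path(version: str) -> str:
    """Resolve a released checkpoint path; fail loudly for unreleased ones."""
    try:
        return CHECKPOINTS[version]
    except KeyError:
        raise ValueError(f"no released checkpoint for {version!r}") from None
```

The resolved file would then typically be loaded with something like `torch.load(path, map_location="cpu")` before restoring the model's state dict.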
  ### Dataset Mode