zach committed Β· Commit 321c20c Β· 1 Parent(s): aa177c6

update metadata

Files changed (1): README.md (+22 -133)

README.md CHANGED
@@ -1,17 +1,23 @@
  # πŸš€ Metric3D Project πŸš€

- **Official PyTorch implementation of Metric3Dv1 and Metric3Dv2:**

  [1] [Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image](https://arxiv.org/abs/2307.10984)

  [2] Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

- <a href='https://jugghm.github.io/Metric3Dv2'><img src='https://img.shields.io/badge/project%20page-@Metric3D-yellow.svg'></a>
- <a href='https://arxiv.org/abs/2307.10984'><img src='https://img.shields.io/badge/arxiv-@Metric3Dv1-green'></a>
- <a href='https:'><img src='https://img.shields.io/badge/arxiv (on hold)-@Metric3Dv2-red'></a>
- <a href='https://huggingface.co/spaces/JUGGHM/Metric3D'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
-
- [//]: # (### [Project Page]&#40;https://arxiv.org/abs/2307.08695&#41; | [v2 Paper]&#40;https://arxiv.org/abs/2307.10984&#41; | [v1 Arxiv]&#40;https://arxiv.org/abs/2307.10984&#41; | [Video]&#40;https://www.youtube.com/playlist?list=PLEuyXJsWqUNd04nwfm9gFBw5FVbcaQPl3&#41; | [Hugging Face πŸ€—]&#40;https://huggingface.co/spaces/JUGGHM/Metric3D&#41; )

  ## News and TO DO LIST

@@ -20,12 +26,12 @@
  - [ ] Focal length free mode
  - [ ] Floating noise removing mode
  - [ ] Improving HuggingFace Demo and Visualization
- - [x] Release training codes
-
  - `[2024/3/18]` HuggingFace GPU version updated!
  - `[2024/3/18]` [Project page](https://jugghm.github.io/Metric3Dv2/) released!
  - `[2024/3/18]` Metric3D V2 models released, supporting metric depth and surface normal now!
- - `[2023/8/10]` Inference codes, pretrained weights, and demo released.
  - `[2023/7]` Metric3D accepted by ICCV 2023!
  - `[2023/4]` The Champion of [2nd Monocular Depth Estimation Challenge](https://jspenmar.github.io/MDEC) in CVPR 2023

@@ -40,29 +46,6 @@ Metric3D is a versatile geometric foundation model for high-quality and zero-sho

  ### Metric Depth

- [//]: # (#### Zero-shot Testing)
-
- [//]: # (Our models work well on both indoor and outdoor scenarios, compared with other zero-shot metric depth estimation methods.)
-
- [//]: # ()
- [//]: # (| | Backbone | KITTI $\delta 1$ ↑ | KITTI $\delta 2$ ↑ | KITTI $\delta 3$ ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU $\delta 1$ ↑ | NYU $\delta 2$ ↑ | NYU $\delta 3$ ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓ |)
-
- [//]: # (|-----------------|------------|--------------------|---------------------|--------------------|-----------------|---------------|------------------|------------------|------------------|------------------|---------------|-------------|--------------|)
-
- [//]: # (| ZeroDepth | ResNet-18 | 0.910 | 0.980 | 0.996 | 0.057 | 4.044 | 0.083 | 0.901 | 0.961 | - | 0.100 | 0.380 | - |)
-
- [//]: # (| PolyMax | ConvNeXt-L | - | - | - | - | - | - | 0.969 | 0.996 | 0.999 | 0.067 | 0.250 | 0.033 |)
-
- [//]: # (| Ours | ViT-L | 0.985 | 0.995 | 0.999 | 0.052 | 2.511 | 0.074 | 0.975 | 0.994 | 0.998 | 0.063 | 0.251 | 0.028 |)
-
- [//]: # (| Ours | ViT-g2 | 0.989 | 0.996 | 0.999 | 0.051 | 2.403 | 0.080 | 0.980 | 0.997 | 0.999 | 0.067 | 0.260 | 0.030 |)
-
- [//]: # ()
- [//]: # ([//]: # &#40;| Adabins | Efficient-B5 | 0.964 | 0.995 | 0.999 | 0.058 | 2.360 | 0.088 | 0.903 | 0.984 | 0.997 | 0.103 | 0.0444 | 0.364 |&#41;)
- [//]: # ([//]: # &#40;| NewCRFs | SwinT-L | 0.974 | 0.997 | 0.999 | 0.052 | 2.129 | 0.079 | 0.922 | 0.983 | 0.994 | 0.095 | 0.041 | 0.334 |&#41;)
- [//]: # ([//]: # &#40;| Ours &#40;CSTM_label&#41; | ConvNeXt-L | 0.964 | 0.993 | 0.998 | 0.058 | 2.770 | 0.092 | 0.944 | 0.986 | 0.995 | 0.083 | 0.035 | 0.310 |&#41;)
-
- [//]: # (#### Finetuned)
  Our models rank 1st on the routing KITTI and NYU benchmarks.

  | | Backbone | KITTI Ξ΄1 ↑ | KITTI Ξ΄2 ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU Ξ΄1 ↑ | NYU Ξ΄2 ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓ |
@@ -111,103 +94,6 @@ Our models also show powerful performance on normal benchmarks.
  ### Improving monocular SLAM
  <img src="media/gifs/demo_22.gif" width="600" height="337">

- [//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/f95815ef-2506-4193-a6d9-1163ea821268)
-
- [//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/ed00706c-41cc-49ea-accb-ad0532633cc2)
-
- [//]: # (### Zero-shot metric 3D recovery)
-
- [//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/26cd7ae1-dd5a-4446-b275-54c5ca7ef945)
-
- [//]: # (https://github.com/YvanYin/Metric3D/assets/35299633/21e5484b-c304-4fe3-b1d3-8eebc4e26e42)
- [//]: # (### Monocular reconstruction for a Sequence)
-
- [//]: # ()
- [//]: # (### In-the-wild 3D reconstruction)
-
- [//]: # ()
- [//]: # (| | Image | Reconstruction | Pointcloud File |)
-
- [//]: # (|:---------:|:------------------:|:------------------:|:--------:|)
-
- [//]: # (| room | <img src="data/wild_demo/jonathan-borba-CnthDZXCdoY-unsplash.jpg" width="300" height="335"> | <img src="media/gifs/room.gif" width="300" height="335"> | [Download]&#40;https://drive.google.com/file/d/1P1izSegH2c4LUrXGiUksw037PVb0hjZr/view?usp=drive_link&#41; |)
-
- [//]: # (| Colosseum | <img src="data/wild_demo/david-kohler-VFRTXGw1VjU-unsplash.jpg" width="300" height="169"> | <img src="media/gifs/colo.gif" width="300" height="169"> | [Download]&#40;https://drive.google.com/file/d/1jJCXe5IpxBhHDr0TZtNZhjxKTRUz56Hg/view?usp=drive_link&#41; |)
-
- [//]: # (| chess | <img src="data/wild_demo/randy-fath-G1yhU1Ej-9A-unsplash.jpg" width="300" height="169" align=center> | <img src="media/gifs/chess.gif" width="300" height="169"> | [Download]&#40;https://drive.google.com/file/d/1oV_Foq25_p-tTDRTcyO2AzXEdFJQz-Wm/view?usp=drive_link&#41; |)
-
- [//]: # ()
- [//]: # (All three images are downloaded from [unplash]&#40;https://unsplash.com/&#41; and put in the data/wild_demo directory.)
-
- [//]: # ()
- [//]: # (### 3D metric reconstruction, Metric3D Γ— DroidSLAM)
-
- [//]: # (Metric3D can also provide scale information for DroidSLAM, help to solve the scale drift problem for better trajectories. )
-
- [//]: # ()
- [//]: # (#### Bird Eyes' View &#40;Left: Droid-SLAM &#40;mono&#41;. Right: Droid-SLAM with Metric-3D&#41;)
-
- [//]: # ()
- [//]: # (<div align=center>)
-
- [//]: # (<img src="media/gifs/0028.gif"> )
-
- [//]: # (</div>)
-
- [//]: # ()
- [//]: # (### Front View)
-
- [//]: # ()
- [//]: # (<div align=center>)
-
- [//]: # (<img src="media/gifs/0028_fv.gif"> )
-
- [//]: # (</div>)
-
- [//]: # ()
- [//]: # (#### KITTI odemetry evaluation &#40;Translational RMS drift &#40;t_rel, ↓&#41; / Rotational RMS drift &#40;r_rel, ↓&#41;&#41;)
-
- [//]: # (| | Modality | seq 00 | seq 02 | seq 05 | seq 06 | seq 08 | seq 09 | seq 10 |)
-
- [//]: # (|:----------:|:--------:|:----------:|:----------:|:---------:|:----------:|:----------:|:---------:|:---------:|)
-
- [//]: # (| ORB-SLAM2 | Mono | 11.43/0.58 | 10.34/0.26 | 9.04/0.26 | 14.56/0.26 | 11.46/0.28 | 9.3/0.26 | 2.57/0.32 |)
-
- [//]: # (| Droid-SLAM | Mono | 33.9/0.29 | 34.88/0.27 | 23.4/0.27 | 17.2/0.26 | 39.6/0.31 | 21.7/0.23 | 7/0.25 |)
-
- [//]: # (| Droid+Ours | Mono | 1.44/0.37 | 2.64/0.29 | 1.44/0.25 | 0.6/0.2 | 2.2/0.3 | 1.63/0.22 | 2.73/0.23 |)
-
- [//]: # (| ORB-SLAM2 | Stereo | 0.88/0.31 | 0.77/0.28 | 0.62/0.26 | 0.89/0.27 | 1.03/0.31 | 0.86/0.25 | 0.62/0.29 |)
-
- [//]: # ()
- [//]: # (Metric3D makes the mono-SLAM scale-aware, like stereo systems.)
-
- [//]: # ()
- [//]: # (#### KITTI sequence videos - Youtube)
-
- [//]: # ([2011_09_30_drive_0028]&#40;https://youtu.be/gcTB4MgVCLQ&#41; /)
-
- [//]: # ([2011_09_30_drive_0033]&#40;https://youtu.be/He581fmoPP4&#41; /)
-
- [//]: # ([2011_09_30_drive_0034]&#40;https://youtu.be/I3PkukQ3_F8&#41;)
-
- [//]: # ()
- [//]: # (#### Estimated pose)
-
- [//]: # ([2011_09_30_drive_0033]&#40;https://drive.google.com/file/d/1SMXWzLYrEdmBe6uYMR9ShtDXeFDewChv/view?usp=drive_link&#41; / )
-
- [//]: # ([2011_09_30_drive_0034]&#40;https://drive.google.com/file/d/1ONU4GxpvTlgW0TjReF1R2i-WFxbbjQPG/view?usp=drive_link&#41; /)
-
- [//]: # ([2011_10_03_drive_0042]&#40;https://drive.google.com/file/d/19fweg6p1Q6TjJD2KlD7EMA_aV4FIeQUD/view?usp=drive_link&#41;)
-
- [//]: # ()
- [//]: # (#### Pointcloud files)
-
- [//]: # ([2011_09_30_drive_0033]&#40;https://drive.google.com/file/d/1K0o8DpUmLf-f_rue0OX1VaHlldpHBAfw/view?usp=drive_link&#41; /)
-
- [//]: # ([2011_09_30_drive_0034]&#40;https://drive.google.com/file/d/1bvZ6JwMRyvi07H7Z2VD_0NX1Im8qraZo/view?usp=drive_link&#41; /)
-
- [//]: # ([2011_10_03_drive_0042]&#40;https://drive.google.com/file/d/1Vw59F8nN5ApWdLeGKXvYgyS9SNKHKy4x/view?usp=drive_link&#41;)

  ## πŸ”¨ Installation
  ### One-line Installation
@@ -263,14 +149,17 @@ Inference settings are defined as
  ```
  where the images will be first resized as the ```crop_size``` and then fed into the model.

  ## ✈️ Inference
  ### Download Checkpoint
  | | Encoder | Decoder | Link |
  |:----:|:-------------------:|:-----------------:|:-------------------------------------------------------------------------------------------------:|
  | v1-T | ConvNeXt-Tiny | Hourglass-Decoder | Coming soon |
- | v1-L | ConvNeXt-Large | Hourglass-Decoder | [Download](https://drive.google.com/file/d/1KVINiBkVpJylx_6z1lAC7CQ4kmn-RJRN/view?usp=drive_link) |
- | v2-S | DINO2reg-ViT-Small | RAFT-4iter | [Download](https://drive.google.com/file/d/1YfmvXwpWmhLg3jSxnhT7LvY0yawlXcr_/view?usp=drive_link) |
- | v2-L | DINO2reg-ViT-Large | RAFT-8iter | [Download](https://drive.google.com/file/d/1eT2gG-kwsVzNy5nJrbm4KC-9DbNKyLnr/view?usp=drive_link) |
  | v2-g | DINO2reg-ViT-giant2 | RAFT-8iter | Coming soon |

  ### Dataset Mode
 
+ ---
+ license: bsd-2-clause
+ pipeline_tag: depth-estimation
+ tags:
+ - Metric Depth
+ - Surface Normal
+ ---
  # πŸš€ Metric3D Project πŸš€

+ **Official Model card of Metric3Dv1 and Metric3Dv2:**

  [1] [Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image](https://arxiv.org/abs/2307.10984)

  [2] Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

+ <a href='https://jugghm.github.io/Metric3Dv2' style='display: inline-block;'><img src='https://img.shields.io/badge/project%20page-@Metric3D-yellow.svg'></a>
+ <a href='https://arxiv.org/abs/2307.10984' style='display: inline-block;'><img src='https://img.shields.io/badge/arxiv-@Metric3Dv1-green'></a>
+ <a href='https:' style='display: inline-block;'><img src='https://img.shields.io/badge/arxiv (on hold)-@Metric3Dv2-red'></a>
+ <a href='https://huggingface.co/spaces/JUGGHM/Metric3D' style='display: inline-block;'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
+ <a href='https://huggingface.co/zachL1/Metric3D' style='display: inline-block;'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20card-E0FFFF'></a>

  ## News and TO DO LIST

 
  - [ ] Focal length free mode
  - [ ] Floating noise removing mode
  - [ ] Improving HuggingFace Demo and Visualization
+
+ - `[2024/4/11]` Training codes are released!
  - `[2024/3/18]` HuggingFace GPU version updated!
  - `[2024/3/18]` [Project page](https://jugghm.github.io/Metric3Dv2/) released!
  - `[2024/3/18]` Metric3D V2 models released, supporting metric depth and surface normal now!
+ - `[2023/8/10]` Inference codes, pre-trained weights, and demo released.
  - `[2023/7]` Metric3D accepted by ICCV 2023!
  - `[2023/4]` The Champion of [2nd Monocular Depth Estimation Challenge](https://jspenmar.github.io/MDEC) in CVPR 2023

  ### Metric Depth

  Our models rank 1st on the routing KITTI and NYU benchmarks.

  | | Backbone | KITTI Ξ΄1 ↑ | KITTI Ξ΄2 ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU Ξ΄1 ↑ | NYU Ξ΄2 ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓ |

  ### Improving monocular SLAM
  <img src="media/gifs/demo_22.gif" width="600" height="337">
 

  ## πŸ”¨ Installation
  ### One-line Installation

  ```
  where the images will be first resized as the ```crop_size``` and then fed into the model.
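
The resize step described above can be sketched in plain Python. This is only an illustration of the idea, not the repository's actual preprocessing code: the helper name and the resize-then-pad policy are assumptions.

```python
def fit_to_crop_size(h, w, crop_h, crop_w):
    """Aspect-preserving fit of an (h, w) image into (crop_h, crop_w).

    Hypothetical helper: the real pipeline may pad, crop, or rescale
    differently; this only illustrates resizing an input toward
    crop_size before it enters the model.
    """
    scale = min(crop_h / h, crop_w / w)            # single scale factor
    new_h, new_w = round(h * scale), round(w * scale)
    pad_h, pad_w = crop_h - new_h, crop_w - new_w  # filled to reach crop_size
    return (new_h, new_w), (pad_h, pad_w), scale
```

For metric (as opposed to relative) depth, the camera intrinsics would need to be rescaled by the same `scale` factor, since metric predictions depend on focal length.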

+ ## ✈️ Training
+ Please refer to [training/README.md](training/README.md)
+
  ## ✈️ Inference
  ### Download Checkpoint
  | | Encoder | Decoder | Link |
  |:----:|:-------------------:|:-----------------:|:-------------------------------------------------------------------------------------------------:|
  | v1-T | ConvNeXt-Tiny | Hourglass-Decoder | Coming soon |
+ | v1-L | ConvNeXt-Large | Hourglass-Decoder | [Download](weight/convlarge_hourglass_0.3_150_step750k_v1.1.pth) |
+ | v2-S | DINO2reg-ViT-Small | RAFT-4iter | [Download](weight/metric_depth_vit_small_800k.pth) |
+ | v2-L | DINO2reg-ViT-Large | RAFT-8iter | [Download](weight/metric_depth_vit_large_800k.pth) |
  | v2-g | DINO2reg-ViT-giant2 | RAFT-8iter | Coming soon |
 
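For scripting, the checkpoint table above can be mirrored in a small lookup. This is a hedged sketch: only the relative weight paths come from the table itself; the helper name is hypothetical, and v1-T / v2-g have no released files yet.

```python
# Relative checkpoint paths from the table above (released models only).
CHECKPOINTS = {
    "v1-L": "weight/convlarge_hourglass_0.3_150_step750k_v1.1.pth",
    "v2-S": "weight/metric_depth_vit_small_800k.pth",
    "v2-L": "weight/metric_depth_vit_large_800k.pth",
}

def checkpoint_path(version: str) -> str:
    """Resolve a released checkpoint path; fail loudly for unreleased ones."""
    try:
        return CHECKPOINTS[version]
    except KeyError:
        raise ValueError(f"no released checkpoint for {version!r}") from None
```

The resolved file would then typically be loaded with something like `torch.load(path, map_location="cpu")` before restoring the model's state dict.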
  ### Dataset Mode