base_model:
  - prs-eth/marigold-depth-v1-0
tags:
  - code
---
# ApDepth: Aiming for Precise Monocular Depth Estimation Based on Diffusion Models

This repository is based on [Marigold](https://marigoldmonodepth.github.io), the CVPR 2024 Best Paper: [**Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation**](https://arxiv.org/abs/2312.02145).

<!-- [![Website](doc/badges/badge-website.svg)](https://marigoldmonodepth.github.io)
[![Paper](https://img.shields.io/badge/arXiv-PDF-b31b1b)](https://arxiv.org/abs/2312.02145)
[![Hugging Face (LCM) Space](https://img.shields.io/badge/🤗%20Hugging%20Face%20(LCM)-Space-yellow)](https://huggingface.co/spaces/prs-eth/marigold-lcm)
[![Hugging Face (LCM) Model](https://img.shields.io/badge/🤗%20Hugging%20Face%20(LCM)-Model-green)](https://huggingface.co/prs-eth/marigold-lcm-v1-0)
[![Open In Colab](doc/badges/badge-colab.svg)](https://colab.research.google.com/drive/12G8reD13DdpMie5ZQlaFNo2WCGeNUH-u?usp=sharing) -->
[![Website](doc/badges/badge-website.svg)](https://haruko386.github.io/research)
[![License](https://img.shields.io/badge/License-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)
[![Static Badge](https://img.shields.io/badge/build-Haruko386-brightgreen?style=flat&logo=steam&logoColor=white&logoSize=auto&label=steam&labelColor=black&color=gray&cacheSeconds=3600)](https://steamcommunity.com/profiles/76561198217881431/)
<!-- [![Hugging Face Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-green)](https://huggingface.co/prs-eth/marigold-v1-0) -->
<!-- [![Website](https://img.shields.io/badge/Project-Website-1081c2)](https://arxiv.org/abs/2312.02145) -->
<!-- [![GitHub](https://img.shields.io/github/stars/prs-eth/Marigold?style=default&label=GitHub%20★&logo=github)](https://github.com/prs-eth/Marigold) -->
<!-- [![HF Space](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Space-blue)]() -->
<!-- [![Docker](doc/badges/badge-docker.svg)]() -->

[Haruko386](https://haruko386.github.io/),
[Shuai Yuan](https://syjz.teacher.360eol.com/teacherBasic/preview?teacherId=23776)

![cover](doc/cover.jpg)

>We present **ApDepth**, a diffusion model and associated fine-tuning protocol for monocular depth estimation, built on Marigold. Its core contribution is addressing the limited feature representation capability of diffusion models. Like Marigold, our model is derived from Stable Diffusion and fine-tuned on synthetic data (Hypersim and Virtual KITTI), and it achieves strong results in refining object edges.

## 📢 News
- 2025-09-23: Changed Marigold from `Stochastic multi-step generation` to `Deterministic one-step perception`.
- 2025-08-10: Began optimizing the feature representation.<br>
- 2025-05-08: Cloned Marigold locally.<br>

## 🚀 Usage

**We offer several ways to interact with Marigold**:

1. A free online interactive demo is available here: <a href="https://huggingface.co/spaces/prs-eth/marigold-lcm"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face%20(LCM)-Space-yellow" height="16"></a> (kudos to the HF team for the GPU grant)

2. If you just want to see examples, visit our gallery: <a href="https://haruko386.github.io/research"><img src="doc/badges/badge-website.svg" height="16"></a>

3. Local development instructions for this codebase are given below.

## 🛠️ Setup

The inference code was tested on:

- Ubuntu 22.04 LTS, Python 3.12.9, CUDA 11.8, GeForce RTX 4090 & GeForce RTX 5080 (pip)

### 🪧 A Note for Windows users

We recommend running the code in WSL2:

1. Install WSL following the [installation guide](https://learn.microsoft.com/en-us/windows/wsl/install#install-wsl-command).
1. Install CUDA support for WSL following the [installation guide](https://docs.nvidia.com/cuda/wsl-user-guide/index.html#cuda-support-for-wsl-2).
1. Find your drives in `/mnt/<drive letter>/`; check the [WSL FAQ](https://learn.microsoft.com/en-us/windows/wsl/faq#how-do-i-access-my-c--drive-) for more details. Navigate to the working directory of your choice.

### 📦 Repository

Clone the repository (requires git):

```bash
git clone https://github.com/Haruko386/ApDepth.git
cd ApDepth
```

### 💻 Dependencies

**Using Conda:**
Create a conda environment and install the dependencies into it:

```bash
conda create -n marigold python==3.12.9
conda activate marigold
pip install -r requirements.txt
```

Keep the environment activated before running the inference script.
Activate the environment again after restarting the terminal session.

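Optionally, verify that the environment sees your GPU before running inference. This is a minimal check, assuming PyTorch is installed by `requirements.txt`:

```bash
# Optional sanity check (assumes requirements.txt installs PyTorch):
# prints the PyTorch version and whether a CUDA device is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```
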
## 🏃 Testing on your images

### 📷 Prepare images

1. Use selected images from our paper:

    ```bash
    bash script/download_sample_data.sh
    ```

1. Or place your images in a directory, for example, under `input/in-the-wild_example`, and run the following inference command.

### 🚀 Run inference with LCM (faster)

The [LCM checkpoint](https://huggingface.co/prs-eth/marigold-lcm-v1-0) is distilled from the original checkpoint for faster inference (fewer denoising steps). The number of inference steps can be as few as 1 (default) up to 4. Run with the default LCM setting:

```bash
python run.py \
    --input_rgb_dir input/in-the-wild_example \
    --output_dir output/in-the-wild_example_lcm
```

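The same script also accepts the accuracy-related flags described in the inference settings section below. For example, to trade some speed for a more stable LCM prediction (the values here are illustrative, not tuned recommendations):

```bash
# Illustrative LCM run with more steps and a larger ensemble
python run.py \
    --denoise_steps 4 \
    --ensemble_size 5 \
    --input_rgb_dir input/in-the-wild_example \
    --output_dir output/in-the-wild_example_lcm
```
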
### 🎮 Run inference with DDIM (paper setting)

This setting corresponds to our paper. For academic comparison, please run with this setting.

```bash
python run.py \
    --checkpoint prs-eth/marigold-v1-0 \
    --denoise_steps 50 \
    --ensemble_size 10 \
    --input_rgb_dir input/in-the-wild_example \
    --output_dir output/in-the-wild_example
```

You can find all results in `output/in-the-wild_example`. Enjoy!

### ⚙️ Inference settings

The default settings are optimized for the best results. However, the behavior of the code can be customized (a combined example follows this list):

- Trade-offs between **accuracy** and **speed** (for both options, larger values result in better accuracy at the cost of slower inference):
  - `--ensemble_size`: Number of inference passes in the ensemble. For LCM, `ensemble_size` is more important than `denoise_steps`. Default: ~~10~~ 5 (for LCM).
  - `--denoise_steps`: Number of denoising steps of each inference pass. For the original (DDIM) version, 10-50 steps are recommended; for LCM, 1-4 steps. When unassigned (`None`), the default setting is read from the model config. Default: ~~10 4 (for LCM)~~ `None`.

- By default, the inference script resizes input images to the *processing resolution* and then resizes the prediction back to the original resolution. This gives the best quality, as Stable Diffusion, from which Marigold is derived, performs best at 768x768 resolution.
  - `--processing_res`: the processing resolution; set to 0 to process the input at its original resolution. When unassigned (`None`), the default setting is read from the model config. Default: ~~768~~ `None`.
  - `--output_processing_res`: produce output at the processing resolution instead of upsampling it to the input resolution. Default: False.
  - `--resample_method`: the resampling method used to resize images and depth predictions. One of `bilinear`, `bicubic`, or `nearest`. Default: `bilinear`.

- `--half_precision` or `--fp16`: run with half-precision (16-bit float) for faster speed and reduced VRAM usage, possibly at the cost of slightly worse results.
- `--seed`: random seed, for additional reproducibility. Default: None (unseeded). Note: forcing `--batch_size 1` helps to increase reproducibility. To ensure full reproducibility, [deterministic mode](https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms) needs to be used.
- `--batch_size`: batch size of repeated inference. Default: 0 (best value determined automatically).
- `--color_map`: [colormap](https://matplotlib.org/stable/users/explain/colors/colormaps.html) used to colorize the depth prediction. Default: Spectral. Set to `None` to skip colored depth map generation.
- `--apple_silicon`: use Apple Silicon MPS acceleration.

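As a combined illustration of these flags, here is a sketch of a customized run (the values and the output directory name are chosen for illustration only; the defaults above are already tuned for quality):

```bash
# Illustrative run combining several of the documented flags
python run.py \
    --checkpoint prs-eth/marigold-v1-0 \
    --denoise_steps 10 \
    --ensemble_size 5 \
    --processing_res 768 \
    --resample_method bilinear \
    --half_precision \
    --seed 2024 \
    --batch_size 1 \
    --color_map Spectral \
    --input_rgb_dir input/in-the-wild_example \
    --output_dir output/in-the-wild_example_custom
```
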
### ⬇ Checkpoint cache

By default, the [checkpoint](https://huggingface.co/prs-eth/marigold-v1-0) is stored in the Hugging Face cache.
The `HF_HOME` environment variable defines its location and can be overridden, e.g.:

```bash
export HF_HOME=$(pwd)/cache
```

Alternatively, use the following script to download the checkpoint weights locally:

```bash
bash script/download_weights.sh marigold-v1-0
# or LCM checkpoint
bash script/download_weights.sh marigold-lcm-v1-0
```

At inference, specify the checkpoint path:

```bash
python run.py \
    --checkpoint checkpoint/marigold-v1-0 \
    --denoise_steps 50 \
    --ensemble_size 10 \
    --input_rgb_dir input/in-the-wild_example \
    --output_dir output/in-the-wild_example
```

## 🦿 Evaluation on test datasets <a name="evaluation"></a>

Install additional dependencies:

```bash
pip install -r requirements+.txt -r requirements.txt
```

Set the data directory variable (also needed in the evaluation scripts) and download the [evaluation datasets](https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset) into the corresponding subfolders:

```bash
export BASE_DATA_DIR=<YOUR_DATA_DIR>  # Set target data directory

wget -r -np -nH --cut-dirs=4 -R "index.html*" -P ${BASE_DATA_DIR} https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/
```

Run the inference and evaluation scripts, for example:

```bash
# Run inference
bash script/eval/11_infer_nyu.sh

# Evaluate predictions
bash script/eval/12_eval_nyu.sh
```

Note: although the seed has been set, the results might still differ slightly across different hardware.

## 🏋️ Training

Based on the previously created environment, install the extended requirements:

```bash
pip install -r requirements++.txt -r requirements+.txt -r requirements.txt
```

Set environment parameters for the data directories:

```bash
export BASE_DATA_DIR=YOUR_DATA_DIR  # directory of training data
export BASE_CKPT_DIR=YOUR_CHECKPOINT_DIR  # directory of pretrained checkpoint
```

Download the Stable Diffusion v2 [checkpoint](https://huggingface.co/stabilityai/stable-diffusion-2) into `${BASE_CKPT_DIR}`.

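One way to fetch the checkpoint is the Hugging Face CLI. This is a hedged sketch: the target subfolder name `stable-diffusion-2` is an assumption and should match whatever path the training config expects:

```bash
# Hedged sketch: download the Stable Diffusion v2 weights with the Hugging Face CLI.
# The subfolder name "stable-diffusion-2" is an assumption; adjust it to your training config.
pip install -U "huggingface_hub[cli]"
huggingface-cli download stabilityai/stable-diffusion-2 --local-dir ${BASE_CKPT_DIR}/stable-diffusion-2
```
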
Prepare the [Hypersim](https://github.com/apple/ml-hypersim) and [Virtual KITTI 2](https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-2/) datasets and save them into `${BASE_DATA_DIR}`. Please refer to [this README](script/dataset_preprocess/hypersim/README.md) for Hypersim preprocessing.

Run the training script:

```bash
python train.py --config config/train_marigold.yaml --no_wandb
```

To resume from a checkpoint, e.g.:

```bash
python train.py --resume_run output/train_marigold/checkpoint/latest --no_wandb
```

**Evaluating results**

Only the U-Net is updated and saved during training. To use the inference pipeline with your training result, replace the `unet` folder in the Marigold checkpoint with the one from the `checkpoint` output folder (see the sketch below), then refer to [this section](#evaluation) for evaluation.

**Note**: Although random seeds have been set, the training result might still differ slightly on different hardware. It is recommended to train without interruption.

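A minimal shell sketch of this swap, assuming the base checkpoint was downloaded to `checkpoint/marigold-v1-0` and the latest training checkpoint lives under `output/train_marigold/checkpoint/latest` (both paths are assumptions based on the commands above; adjust them to your setup):

```bash
# Hedged sketch: graft the trained U-Net into a copy of the base checkpoint.
cp -r checkpoint/marigold-v1-0 checkpoint/apdepth-trained
rm -rf checkpoint/apdepth-trained/unet
cp -r output/train_marigold/checkpoint/latest/unet checkpoint/apdepth-trained/unet

# Then run inference/evaluation with: --checkpoint checkpoint/apdepth-trained
```
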
## ✏️ Contributing

Please refer to [these instructions](CONTRIBUTING.md).

## 🤔 Troubleshooting

| Problem | Solution |
|---------|----------|
| (Windows) Invalid DOS bash script on WSL | Run `dos2unix <script_name>` to convert the script format |
| (Windows) error on WSL: `Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory` | Run `export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH` |

## 🎓 Citation
Waiting for publication ⏱️
<!-- Please cite our paper:

```bibtex
@InProceedings{ke2023repurposing,
    title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
    author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024}
}
``` -->

## 🎫 License

This work is licensed under the Apache License, Version 2.0 (as defined in the [LICENSE](LICENSE.txt)).

By downloading and using the code and model you agree to the terms in the [LICENSE](LICENSE.txt).

[![License](https://img.shields.io/badge/License-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)