# DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
_**[Wenbo Hu](https://wbhu.github.io)<sup>1* †</sup>, [Xiangjun Gao](https://scholar.google.com/citations?user=qgdesEcAAAAJ&hl=en)<sup>2*</sup>, [Xiaoyu Li](https://xiaoyu258.github.io)<sup>1* †</sup>, [Sijie Zhao](https://scholar.google.com/citations?user=tZ3dS3MAAAAJ&hl=en)<sup>1</sup>, [Xiaodong Cun](https://vinthony.github.io/academic)<sup>1</sup>, [Yong Zhang](https://yzhang2016.github.io)<sup>1</sup>, [Long Quan](https://home.cse.ust.hk/~quan)<sup>2</sup>, [Ying Shan](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)<sup>3, 1</sup>**_

<sup>1</sup>Tencent AI Lab &nbsp; <sup>2</sup>The Hong Kong University of Science and Technology &nbsp; <sup>3</sup>ARC Lab, Tencent PCG

arXiv preprint, 2024
## 🔆 Introduction

- [24-9-28] Add full dataset inference and evaluation scripts for easier comparison. :-)
- [24-9-25] 🤗🤗🤗 Add Hugging Face online demo [DepthCrafter](https://huggingface.co/spaces/tencent/DepthCrafter).
- [24-9-19] Add scripts for preparing benchmark datasets.
- [24-9-18] Add point cloud sequence visualization.
- [24-9-14] 🔥🔥🔥 **DepthCrafter** is released now, have fun!

🤗 DepthCrafter can generate temporally consistent long depth sequences with fine-grained details for open-world videos, without requiring additional information such as camera poses or optical flow.

## 🎥 Visualization

We provide some demos of unprojected point cloud sequences, with reference RGB and estimated depth videos. Please refer to our [project page](https://depthcrafter.github.io) for more details. (A minimal unprojection sketch is included at the end of this README.)

https://github.com/user-attachments/assets/62141cc8-04d0-458f-9558-fe50bc04cc21

## 🚀 Quick Start

### 🤖 Gradio Demo

- Online demo: [DepthCrafter](https://huggingface.co/spaces/tencent/DepthCrafter)
- Local demo:
  ```bash
  gradio app.py
  ```

### 🌟 Community Support

- [NukeDepthCrafter](https://github.com/Theo-SAMINADIN-td/NukeDepthCrafter): a plugin that lets you generate temporally consistent depth sequences inside Nuke, which is widely used in the VFX industry.

### 🛠️ Installation

1. Clone this repo:
   ```bash
   git clone https://github.com/Tencent/DepthCrafter.git
   ```
2. Install dependencies (please refer to [requirements.txt](requirements.txt)):
   ```bash
   pip install -r requirements.txt
   ```

### 🤗 Model Zoo

[DepthCrafter](https://huggingface.co/tencent/DepthCrafter) is available in the Hugging Face Model Hub. (A hedged Python loading sketch is included at the end of this README.)

### 🏃‍♂️ Inference

#### 1. High-resolution inference, requiring a GPU with ~26GB of memory at 1024x576 resolution:

- Full inference (~0.6 fps on A100, recommended for high-quality results):
  ```bash
  python run.py --video-path examples/example_01.mp4
  ```
- Fast inference with 4-step denoising and no classifier-free guidance (~2.3 fps on A100):
  ```bash
  python run.py --video-path examples/example_01.mp4 --num-inference-steps 4 --guidance-scale 1.0
  ```

#### 2. Low-resolution inference, requiring a GPU with ~9GB of memory at 512x256 resolution:

- Full inference (~2.3 fps on A100):
  ```bash
  python run.py --video-path examples/example_01.mp4 --max-res 512
  ```
- Fast inference with 4-step denoising and no classifier-free guidance (~9.4 fps on A100):
  ```bash
  python run.py --video-path examples/example_01.mp4 --max-res 512 --num-inference-steps 4 --guidance-scale 1.0
  ```

## 🚀 Dataset Evaluation

Please check the `benchmark` folder.

- To build the datasets used in the paper, run `dataset_extract/dataset_extract_${dataset_name}.py`. This produces `csv` files recording the relative paths of the extracted RGB videos and depth `npz` files. We also provide these `csv` files.
- To run inference on all datasets:
  ```bash
  bash benchmark/infer/infer.sh
  ```
  (Remember to replace `input_rgb_root` and `saved_root` with your own paths.)
- To run evaluation on all datasets:
  ```bash
  bash benchmark/eval/eval.sh
  ```
  (Remember to replace `pred_disp_root` and `gt_disp_root` with your own paths.)

A hedged sketch of the typical affine-invariant metrics is included at the end of this README.

## 🤝 Contributing

- Issues and pull requests are welcome.
- Contributions that optimize inference speed and memory usage are especially welcome, e.g., through model quantization, distillation, or other acceleration techniques (see the memory-saving sketch below for a starting point).
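**Unprojection sketch.** The point cloud demos in the Visualization section come from unprojecting depth frames into 3D. Below is a minimal sketch of a standard pinhole unprojection, not the repo's own visualization code. The intrinsics (`fx`, `fy`, `cx`, `cy`) and the input file name are hypothetical placeholders; since DepthCrafter predicts relative depth, the resulting cloud is only correct up to the alignment applied to that depth.

```python
import numpy as np

def unproject_depth(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud
    using a standard pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Hypothetical usage with made-up intrinsics for a 1024x576 frame:
depth = np.load("frame_depth.npy")  # placeholder per-frame depth file
points = unproject_depth(depth, fx=1000.0, fy=1000.0, cx=512.0, cy=288.0)
```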
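**Python loading sketch.** If you prefer calling DepthCrafter from Python rather than through `run.py`, the loading step looks roughly as follows. This is a hedged sketch: the module and class names and the SVD base checkpoint reflect our reading of the repository layout, and the pipeline's call signature for frames is omitted; `run.py` is the authoritative reference.

```python
import torch

# Assumed module/class names based on this repository's layout; see run.py
# for the authoritative entry point and the full preprocessing of frames.
from depthcrafter.depth_crafter_ppl import DepthCrafterPipeline
from depthcrafter.unet import DiffusersUNetSpatioTemporalConditionModelDepthCrafter

# The DepthCrafter U-Net weights come from the Hugging Face Hub and are
# plugged into a Stable Video Diffusion pipeline.
unet = DiffusersUNetSpatioTemporalConditionModelDepthCrafter.from_pretrained(
    "tencent/DepthCrafter", torch_dtype=torch.float16
)
pipe = DepthCrafterPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    unet=unet,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
```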
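**Memory-saving sketch.** If your GPU has less memory than the figures quoted in the Inference section, the standard diffusers memory savers are a reasonable first step, and a starting point for the optimization contributions invited above. A sketch, assuming `pipe` is the pipeline loaded in the previous snippet; actual savings depend on your hardware and diffusers version.

```python
# Trade speed for memory by computing attention in chunks.
pipe.enable_attention_slicing()

# Alternatively, keep submodules on the CPU until they are needed.
# Use this INSTEAD of calling .to("cuda") on the pipeline.
# pipe.enable_model_cpu_offload()
```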
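**Metric sketch.** For comparing your own predictions against ground truth outside of `eval.sh`, affine-invariant depth metrics such as AbsRel and δ₁ are typically computed after a least-squares scale-and-shift alignment of the relative prediction. Below is a hedged sketch of that common protocol, not necessarily identical to `benchmark/eval`; `pred`, `gt`, and `mask` are hypothetical arrays loaded from your own files, with `mask` selecting valid (positive) GT pixels.

```python
import numpy as np

def align_scale_shift(pred: np.ndarray, gt: np.ndarray,
                      mask: np.ndarray) -> np.ndarray:
    """Least-squares scale/shift alignment of a relative prediction to GT,
    as is standard for affine-invariant depth/disparity evaluation."""
    p, g = pred[mask], gt[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)
    scale, shift = np.linalg.lstsq(A, g, rcond=None)[0]
    return scale * pred + shift

def abs_rel_and_delta1(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray):
    """AbsRel = mean(|pred - gt| / gt); delta1 = fraction of pixels whose
    aligned prediction is within a factor of 1.25 of GT."""
    pred = align_scale_shift(pred, gt, mask)
    p, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)
    delta1 = np.mean(np.maximum(p / g, g / p) < 1.25)
    return abs_rel, delta1
```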
## 📜 Citation

If you find this work helpful, please consider citing:

```bibtex
@article{hu2024-DepthCrafter,
  author  = {Hu, Wenbo and Gao, Xiangjun and Li, Xiaoyu and Zhao, Sijie and
             Cun, Xiaodong and Zhang, Yong and Quan, Long and Shan, Ying},
  title   = {DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos},
  journal = {arXiv preprint arXiv:2409.02095},
  year    = {2024}
}
```