
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

   

Wenbo Hu1* †, Xiangjun Gao2*, Xiaoyu Li1* †, Sijie Zhao1, Xiaodong Cun1,
Yong Zhang1, Long Quan2, Ying Shan3, 1


1Tencent AI Lab 2The Hong Kong University of Science and Technology 3ARC Lab, Tencent PCG

arXiv preprint, 2024

🔆 Introduction

  • [24-9-19] Add scripts for preparing benchmark datasets.
  • [24-9-18] Add point cloud sequence visualization.
  • [24-9-14] 🔥🔥🔥 DepthCrafter is released now, have fun!

🤗 DepthCrafter can generate temporally consistent long depth sequences with fine-grained details for open-world videos, without requiring additional information such as camera poses or optical flow.

🎥 Visualization

We provide some demos of unprojected point cloud sequences, with reference RGB and estimated depth videos. Please refer to our project page for more details.

https://github.com/user-attachments/assets/62141cc8-04d0-458f-9558-fe50bc04cc21
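
For readers unfamiliar with the term, "unprojecting" a depth map means lifting each pixel to a 3D point. The sketch below is not the code used for the demos; it is a minimal pure-Python illustration of pinhole back-projection. The function names and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for this example. If the estimated depth is relative rather than metric, a scale and shift would have to be chosen first, and that step is not shown.

```python
def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project one depth map (a list of rows of depth values) to 3D points.

    Assumed pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy,
    where (u, v) is the pixel column/row and Z is the depth at that pixel.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def unproject_sequence(depth_frames, fx, fy, cx, cy):
    """Apply the same (assumed constant) intrinsics to every frame."""
    return [unproject_depth(frame, fx, fy, cx, cy) for frame in depth_frames]


if __name__ == "__main__":
    # Toy 2x2 frame with constant depth; the intrinsics are made up.
    frame = [[2.0, 2.0], [2.0, 2.0]]
    print(unproject_depth(frame, fx=1.0, fy=1.0, cx=0.5, cy=0.5))
```

In practice the nested loops would be replaced by array operations, but the geometry is the same.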

🚀 Quick Start

🛠️ Installation

  1. Clone this repo:

    git clone https://github.com/Tencent/DepthCrafter.git

  2. Install dependencies (please refer to requirements.txt):

    pip install -r requirements.txt

🤗 Model Zoo

DepthCrafter is available on the Hugging Face Model Hub.

🏃‍♂️ Inference

1. High-resolution inference (requires a GPU with ~26 GB of memory at 1024x576 resolution):

  • Full inference (~0.6 fps on A100, recommended for high-quality results):

    python run.py  --video-path examples/example_01.mp4
    
  • Fast inference through 4-step denoising and without classifier-free guidance (~2.3 fps on A100):

    python run.py  --video-path examples/example_01.mp4 --num-inference-steps 4 --guidance-scale 1.0
    

2. Low-resolution inference (requires a GPU with ~9 GB of memory at 512x256 resolution):

  • Full inference (~2.3 fps on A100):

    python run.py  --video-path examples/example_01.mp4 --max-res 512
    
  • Fast inference through 4-step denoising and without classifier-free guidance (~9.4 fps on A100):

    python run.py  --video-path examples/example_01.mp4  --max-res 512 --num-inference-steps 4 --guidance-scale 1.0
    

🤖 Gradio Demo

We provide a local Gradio demo for DepthCrafter, which can be launched by running:

gradio app.py

🤝 Contributing

  • Issues and pull requests are welcome.
  • Contributions that improve inference speed or memory usage are especially welcome, e.g., through model quantization, distillation, or other acceleration techniques.

📜 Citation

If you find this work helpful, please consider citing:

@article{hu2024-DepthCrafter,
    author  = {Hu, Wenbo and Gao, Xiangjun and Li, Xiaoyu and Zhao, Sijie and Cun, Xiaodong and Zhang, Yong and Quan, Long and Shan, Ying},
    title   = {DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos},
    journal = {arXiv preprint arXiv:2409.02095},
    year    = {2024}
}