GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
Tian-Xing Xu1,
Xiangjun Gao3,
Wenbo Hu2 †,
Xiaoyu Li2,
Song-Hai Zhang1 †,
Ying Shan2
1Tsinghua University
2ARC Lab, Tencent PCG
3HKUST
Notice
GeometryCrafter is still under active development!
We recommend that everyone use English to communicate on issues, as this helps developers from around the world discuss, share experiences, and answer questions together. For further implementation details, please contact xutx21@mails.tsinghua.edu.cn. For business licensing and other related inquiries, don't hesitate to contact wbhu@tencent.com.
If you find GeometryCrafter useful, please help by starring this repo, which is important to open-source projects. Thanks!
Introduction
We present GeometryCrafter, a novel approach that estimates temporally consistent, high-quality point maps from open-world videos, facilitating downstream applications such as 3D/4D reconstruction and depth-based video editing or generation.
Release Notes:
[01/04/2025] GeometryCrafter is released now, have fun!
Quick Start
Installation
- Clone this repo:
git clone --recursive https://github.com/TencentARC/GeometryCrafter
- Install dependencies (please refer to requirements.txt):
pip install -r requirements.txt
Inference
Run the inference code on our provided demo videos (about 1.27 FPS). This requires a GPU with ~40GB of memory for 110 frames at 1024x576 resolution:
python run.py \
--video_path examples/video1.mp4 \
--save_folder workspace/examples_output \
--height 576 --width 1024
# The input video is resized to the target resolution for processing; height and width must be divisible by 64.
# The output point maps are restored to the original resolution before saving.
# Use --downsample_ratio to downsample the input video, or reduce --decode_chunk_size to lower memory usage.
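As noted above, the processing resolution must be divisible by 64. The helper below (our own illustration, not part of the repo) rounds an arbitrary video resolution to the nearest valid multiple before passing it to --height/--width:

```python
def round_to_multiple(value: int, base: int = 64) -> int:
    """Round a dimension to the nearest multiple of `base` (at least `base`)."""
    return max(base, round(value / base) * base)

# A 1080x1920 video maps to a processing resolution of 1088x1920.
print(round_to_multiple(1080), round_to_multiple(1920))  # 1088 1920
```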
Run the inference code with our deterministic variant (about 1.50 FPS):
python run.py \
--video_path examples/video1.mp4 \
--save_folder workspace/examples_output \
--height 576 --width 1024 \
--model_type determ
Run low-resolution processing (about 2.49 FPS), which requires a GPU with ~22GB of memory:
python run.py \
--video_path examples/video1.mp4 \
--save_folder workspace/examples_output \
--height 384 --width 640
Visualization
Visualize the predicted point maps with Viser:
python visualize/vis_point_maps.py \
--video_path examples/video1.mp4 \
--data_path workspace/examples_output/video1.npz
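If you want to work with the saved point maps directly instead of visualizing them, the snippet below (a minimal sketch; the exact array names stored by run.py may differ) lists the arrays contained in an output .npz file:

```python
import numpy as np

def inspect_point_maps(path: str) -> None:
    """Print the name, shape, and dtype of each array in a point-map .npz file."""
    with np.load(path) as data:
        for name in data.files:
            print(name, data[name].shape, data[name].dtype)

# Example (path assumes you ran the inference command above):
# inspect_point_maps("workspace/examples_output/video1.npz")
```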
Gradio Demo
- Online demo: GeometryCrafter
- Local demo:
gradio app.py
Dataset Evaluation
Please check the evaluation folder.
- To create the datasets we use in the paper, run evaluation/preprocess/gen_{dataset_name}.py.
- Change DATA_DIR and OUTPUT_DIR first according to your working environment.
- You will then get the preprocessed datasets containing the extracted RGB videos and point map npz files. We also provide the catalog of these files.
- Inference script for all datasets: bash evaluation/run_batch.sh (remember to replace data_root_dir and save_root_dir with your paths).
- Evaluation script for all datasets (scale-invariant point map estimation): bash evaluation/eval.sh (remember to replace pred_data_root_dir and gt_data_root_dir with your paths).
- Evaluation script for all datasets (affine-invariant depth estimation): bash evaluation/eval_depth.sh (remember to replace pred_data_root_dir and gt_data_root_dir with your paths).
- We also provide the comparison results of MoGe and the deterministic variant of our method. You can evaluate these methods under the same protocol by uncommenting the corresponding lines in evaluation/run.sh, evaluation/eval.sh, evaluation/run_batch.sh, and evaluation/eval_depth.sh.
Contributing
- Issues and pull requests are welcome.
- Contributions that optimize inference speed and memory usage are also welcome, e.g., through model quantization, distillation, or other acceleration techniques.