---
license: apache-2.0
pipeline_tag: image-to-3d
tags:
  - novel-view-synthesis
  - multi-view-diffusion
  - depth-estimation
  - 3d-reconstruction
---

# GLD: Geometric Latent Diffusion

**Repurposing Geometric Foundation Models for Multi-view Diffusion**

[Paper] | [arXiv] | [Project Page] | [Code]

Geometric Latent Diffusion (GLD) is a framework that repurposes the geometrically consistent feature space of geometric foundation models (such as Depth Anything 3 and VGGT) as the latent space for multi-view diffusion. By operating in this space rather than a view-independent VAE latent space, GLD achieves consistent novel view synthesis (NVS) and 3D reconstruction with significantly faster training convergence.
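The core idea, diffusing in a frozen geometric encoder's latent space rather than in a VAE latent space, can be illustrated with a toy NumPy sketch. Everything here is a stand-in: the random linear maps play the role of the frozen geometric encoder/decoder (a DA3/VGGT backbone in the real framework), and the noise schedule is a generic DDPM-style one, not GLD's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a frozen geometric encoder/decoder: fixed linear maps.
D_PIX, D_LAT = 64, 16
W_enc = rng.standard_normal((D_LAT, D_PIX)) / np.sqrt(D_PIX)
W_dec = np.linalg.pinv(W_enc)  # approximate inverse, for the sketch only

def encode(x):
    """Image -> geometric latent (frozen, not trained by the diffusion)."""
    return W_enc @ x

def decode(z):
    """Geometric latent -> image."""
    return W_dec @ z

# DDPM-style forward (noising) process applied to the *latent*, not pixels:
#   z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps
T = 10
betas = np.linspace(1e-4, 0.2, T)
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.standard_normal(D_PIX)   # a stand-in "image"
z0 = encode(x0)
eps = rng.standard_normal(D_LAT)
zT = np.sqrt(alpha_bar[-1]) * z0 + np.sqrt(1 - alpha_bar[-1]) * eps

# A trained diffusion model would predict eps from (z_t, t); here we use the
# ground-truth eps just to show the latent-space round trip.
z0_hat = (zT - np.sqrt(1 - alpha_bar[-1]) * eps) / np.sqrt(alpha_bar[-1])
x0_hat = decode(z0_hat)
```

The point of the sketch is where the noise lives: because `encode` is shared across views in the real framework, latents of different views of the same scene are geometrically aligned, which is what lets the diffusion model stay multi-view consistent.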

## Quick Start

```bash
git clone https://github.com/cvlab-kaist/GLD.git
cd GLD
conda env create -f environment.yml
conda activate gld

# Download all checkpoints
python -c "from huggingface_hub import snapshot_download; snapshot_download('SeonghuJeon/GLD', local_dir='.')"

# Run demo
./run_demo.sh da3
```

## Files

| File | Description | Size |
|------|-------------|------|
| `checkpoints/da3_level1.pt` | DA3 Level-1 diffusion | 3.0 GB |
| `checkpoints/da3_cascade.pt` | DA3 Cascade (L1→L0) | 1.8 GB |
| `checkpoints/vggt_level1.pt` | VGGT Level-1 diffusion | 3.1 GB |
| `checkpoints/vggt_cascade.pt` | VGGT Cascade (L1→L0) | 3.1 GB |
| `pretrained_models/da3/model.safetensors` | DA3-Base encoder | 0.5 GB |
| `pretrained_models/da3/dpt_decoder.pt` | DPT decoder (depth + geometry) | 0.4 GB |
| `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (RGB) | 1.6 GB |
| `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (RGB) | 1.6 GB |
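After running `snapshot_download`, a quick local sanity check that every file listed above actually landed might look like this. The `missing_files` helper is hypothetical (not part of the repo); the paths are copied from the table.

```python
from pathlib import Path

# Expected repository layout, taken from the file table in this README.
EXPECTED = [
    "checkpoints/da3_level1.pt",
    "checkpoints/da3_cascade.pt",
    "checkpoints/vggt_level1.pt",
    "checkpoints/vggt_cascade.pt",
    "pretrained_models/da3/model.safetensors",
    "pretrained_models/da3/dpt_decoder.pt",
    "pretrained_models/mae_decoder.pt",
    "pretrained_models/vggt/mae_decoder.pt",
]

def missing_files(root="."):
    """Return the expected files that are absent under `root`."""
    root = Path(root)
    return [f for f in EXPECTED if not (root / f).exists()]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All checkpoints present.")
```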

## Citation

```bibtex
@article{jang2026gld,
  title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
  author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
  journal={arXiv preprint arXiv:2603.22275},
  year={2026}
}
```

## Acknowledgements

This project builds upon RAE, Depth Anything 3, VGGT, CUT3R, and SiT.