SeonghuJeon committed (verified) · Commit ddf12a2 · 1 parent: eee780c

Add pipeline tag, paper link, citation (incorporating HF community PR #1)

Files changed (1): README.md (+19 −1)
README.md CHANGED
@@ -1,5 +1,6 @@
 ---
 license: apache-2.0
+pipeline_tag: image-to-3d
 tags:
 - novel-view-synthesis
 - multi-view-diffusion
@@ -11,7 +12,9 @@ tags:
 
 **Repurposing Geometric Foundation Models for Multi-view Diffusion**
 
-[[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD) | [[arXiv]](https://arxiv.org/abs/2603.22275)
+[[Paper]](https://huggingface.co/papers/2603.22275) | [[arXiv]](https://arxiv.org/abs/2603.22275) | [[Project Page]](https://cvlab-kaist.github.io/GLD/) | [[Code]](https://github.com/cvlab-kaist/GLD)
+
+Geometric Latent Diffusion (GLD) is a framework that repurposes the geometrically consistent feature space of geometric foundation models (such as [Depth Anything 3](https://github.com/DepthAnything/Depth-Anything-3) and [VGGT](https://github.com/facebookresearch/vggt)) as the latent space for multi-view diffusion. By operating in this space rather than a view-independent VAE latent space, GLD achieves consistent novel view synthesis (NVS) and 3D reconstruction with significantly faster training convergence.
 
 ## Quick Start
 
@@ -40,3 +43,18 @@ python -c "from huggingface_hub import snapshot_download; snapshot_download('Seo
 | `pretrained_models/da3/dpt_decoder.pt` | DPT decoder (depth + geometry) | 0.4G |
 | `pretrained_models/mae_decoder.pt` | DA3 MAE decoder (RGB) | 1.6G |
 | `pretrained_models/vggt/mae_decoder.pt` | VGGT MAE decoder (RGB) | 1.6G |
+
+## Citation
+
+```bibtex
+@article{jang2026gld,
+  title={Repurposing Geometric Foundation Models for Multi-view Diffusion},
+  author={Jang, Wooseok and Jeon, Seonghu and Han, Jisang and Choi, Jinhyeok and Kwon, Minkyung and Kim, Seungryong and Xie, Saining and Liu, Sainan},
+  journal={arXiv preprint arXiv:2603.22275},
+  year={2026}
+}
+```
+
+## Acknowledgements
+
+Built upon [RAE](https://github.com/nicknign/RAE_release), [Depth Anything 3](https://github.com/DepthAnything/Depth-Anything-3), [VGGT](https://github.com/facebookresearch/vggt), [CUT3R](https://github.com/naver/CUT3R), and [SiT](https://github.com/willisma/SiT).
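
The Quick Start context line in the last hunk truncates the `snapshot_download` call after `'Seo`. As a hedged sketch only: the repo id `SeonghuJeon/GLD` and the helper name below are hypothetical (the diff cuts the real repo id off), and `allow_patterns` is used here to restrict the snapshot to the three checkpoints listed in the table.

```python
# Sketch of a scoped checkpoint download with huggingface_hub.
# ASSUMPTION: "SeonghuJeon/GLD" is a hypothetical repo id; the diff
# truncates the real one after 'Seo'.
CHECKPOINTS = [
    "pretrained_models/da3/dpt_decoder.pt",   # DPT decoder (depth + geometry), ~0.4G
    "pretrained_models/mae_decoder.pt",       # DA3 MAE decoder (RGB), ~1.6G
    "pretrained_models/vggt/mae_decoder.pt",  # VGGT MAE decoder (RGB), ~1.6G
]

def download_checkpoints(repo_id: str = "SeonghuJeon/GLD") -> str:
    """Fetch only the listed checkpoints; returns the local snapshot path."""
    # Deferred import so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    # allow_patterns limits the snapshot to just these files instead of the full repo.
    return snapshot_download(repo_id, allow_patterns=CHECKPOINTS)
```

Calling `download_checkpoints()` would pull roughly 3.6G in total per the table above; pass a narrower `allow_patterns` (e.g. only the DA3 paths) to skip the VGGT decoder.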