SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
Abstract
We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. Regression methods efficiently infer visible surfaces, but struggle with occluded regions. Generative methods handle uncertain regions better by modeling distributions, but are computationally expensive, and their outputs are often misaligned with visible surfaces. In this paper, we present SPAR3D, a novel two-stage approach that aims to take the best of both directions. The first stage of SPAR3D generates sparse 3D point clouds using a lightweight point diffusion model with fast sampling speed. The second stage uses both the sampled point cloud and the input image to produce highly detailed meshes. Our two-stage design enables probabilistic modeling of the ill-posed single-image 3D task while maintaining high computational efficiency and output fidelity. Using point clouds as an intermediate representation further allows for interactive user edits. Evaluated on diverse datasets, SPAR3D demonstrates superior performance over previous state-of-the-art methods, with an inference time of 0.7 seconds. Project page with code and model: https://spar3d.github.io
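For readers who want the shape of the two-stage design at a glance, here is a minimal PyTorch-style sketch of the pipeline the abstract describes. Everything here is an illustrative assumption, not the released implementation: the module names (`PointDiffusion`-style denoiser, mesh regressor), the tensor shapes, and the simplified denoising loop that returns updated points at each step.

```python
import torch
import torch.nn as nn


class SPAR3DPipeline(nn.Module):
    """Conceptual two-stage pipeline: point diffusion, then mesh regression.

    Hypothetical sketch of the design described in the abstract; the real
    SPAR3D architecture differs in its internals.
    """

    def __init__(self, point_denoiser: nn.Module, mesh_regressor: nn.Module):
        super().__init__()
        self.point_denoiser = point_denoiser    # lightweight denoiser over sparse points
        self.mesh_regressor = mesh_regressor    # (image, points) -> detailed mesh

    @torch.no_grad()
    def forward(self, image: torch.Tensor, num_points: int = 512, steps: int = 16):
        batch = image.shape[0]

        # Stage 1: sample a sparse 3D point cloud conditioned on the image.
        # A small number of denoising steps keeps sampling fast, matching the
        # abstract's emphasis on speed. The denoiser is assumed to map noisy
        # points at step t to a less-noisy estimate (a simplification).
        points = torch.randn(batch, num_points, 3, device=image.device)
        for t in reversed(range(steps)):
            t_batch = torch.full((batch,), t, device=image.device)
            points = self.point_denoiser(points, t_batch, cond=image)

        # Stage 2: regress a detailed mesh from the image plus sampled points.
        # The points resolve back-surface ambiguity; the image supplies
        # visible-surface detail. Editing `points` before this call is what
        # enables the interactive user edits mentioned in the abstract.
        mesh = self.mesh_regressor(image, points)
        return mesh
```

One design point worth noting: because the only coupling between the stages is the point cloud tensor, the expensive probabilistic modeling is confined to a low-dimensional representation, while the deterministic regressor handles the high-resolution surface detail.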
Community
Combines point-cloud diffusion with regression-based mesh estimation to achieve extremely fast generation. Also comes with a Hugging Face model and demo release.
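For anyone wanting to try the release mentioned above, a minimal download sketch using `huggingface_hub` follows. The repo id `stabilityai/stable-point-aware-3d` is an assumption based on the announced Hugging Face release; verify it on the project page before use.

```python
# Minimal sketch for fetching the released weights with huggingface_hub.
# The repo id below is an assumption; confirm it on https://spar3d.github.io
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="stabilityai/stable-point-aware-3d")
print(f"Model files downloaded to: {local_dir}")
```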
Thank you for sharing this interesting work. The two-stage design of SPAR3D offers a clever approach to the back-surface ambiguity in single-image reconstruction. I also appreciate the emphasis on fast inference and interactive editing; it is refreshing to see a balance of efficiency and practicality. Looking ahead, I am interested in how future iterations might refine the material and lighting estimation, perhaps through subtle priors or semi-supervised strategies. That deeper disentangling of albedo and illumination could further elevate its realism. Congratulations on a thoughtful contribution.
Thanks for the comment :). Speed was highly important to me for easy prototyping and fast iteration times. We're currently investigating how to push this paradigm further. Stay tuned :)
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- ARM: Appearance Reconstruction Model for Relightable 3D Generation (2024)
- GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation (2024)
- PRM: Photometric Stereo based Large Reconstruction Model (2024)
- Wonderland: Navigating 3D Scenes from a Single Image (2024)
- Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation (2024)
- Boosting 3D object generation through PBR materials (2024)
- Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation (2024)