SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE Paper • 2411.16856 • Published Nov 2024
Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation Paper • 2303.13873 • Published Mar 24, 2023
ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance Paper • 2403.12409 • Published Mar 19, 2024
MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors Paper • 2410.16272 • Published Oct 21, 2024
MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era Paper • 2406.09121 • Published Jun 13, 2024
AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention Paper • 2406.12718 • Published Jun 18, 2024
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models Paper • 2411.00492 • Published Nov 1, 2024
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances Paper • 2410.18775 • Published Oct 24, 2024
TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition Paper • 2307.12493 • Published Jul 24, 2023
Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation Paper • 2309.13505 • Published Sep 24, 2023
Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining Paper • 2401.08407 • Published Jan 16, 2024
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting Paper • 2311.14521 • Published Nov 24, 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning Paper • 2311.18651 • Published Nov 30, 2023
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis Paper • 2308.11473 • Published Aug 22, 2023
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies Paper • 2403.01422 • Published Mar 3, 2024
Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation Paper • 2404.15506 • Published Mar 22, 2024
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models Paper • 2405.20853 • Published May 31, 2024
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts Paper • 2406.09162 • Published Jun 13, 2024