DITTO: Diffusion Inference-Time T-Optimization for Music Generation Paper • 2401.12179 • Published Jan 22 • 18
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting Paper • 2404.19758 • Published 23 days ago • 9
BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published Apr 18 • 23
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models Paper • 2404.02747 • Published Apr 3 • 11
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control Paper • 2403.09055 • Published Mar 14 • 23
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding Paper • 2403.09530 • Published Mar 14 • 8
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization Paper • 2402.09812 • Published Feb 15 • 11
IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation Paper • 2402.08682 • Published Feb 13 • 12
Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation Paper • 2401.15688 • Published Jan 28 • 10
Deconstructing Denoising Diffusion Models for Self-Supervised Learning Paper • 2401.14404 • Published Jan 25 • 16
Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation Paper • 2401.14257 • Published Jan 25 • 9
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild Paper • 2401.13627 • Published Jan 24 • 69
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19 • 53
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers Paper • 2401.08740 • Published Jan 16 • 10
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis Paper • 2401.09048 • Published Jan 17 • 7
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing Paper • 2312.11392 • Published Dec 18, 2023 • 18
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models Paper • 2312.09608 • Published Dec 15, 2023 • 13
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11 • 46
Splatter Image: Ultra-Fast Single-View 3D Reconstruction Paper • 2312.13150 • Published Dec 20, 2023 • 13
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning Paper • 2312.13980 • Published Dec 21, 2023 • 11
Learning Vision from Models Rivals Learning Vision from Data Paper • 2312.17742 • Published Dec 28, 2023 • 12
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action Paper • 2312.17172 • Published Dec 28, 2023 • 24
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing Paper • 2312.07409 • Published Dec 12, 2023 • 22
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes Paper • 2311.13384 • Published Nov 22, 2023 • 48
SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation Paper • 2303.12236 • Published Mar 21, 2023 • 3
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models Paper • 2307.02421 • Published Jul 5, 2023 • 33