StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published 10 days ago
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting Paper • 2404.19758 • Published 12 days ago
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance Paper • 2401.16465 • Published Jan 29
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published 12 days ago
Interactive3D: Create What You Want by Interactive 3D Generation Paper • 2404.16510 • Published 17 days ago
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published 20 days ago
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published 19 days ago
Long-form music generation with latent diffusion Paper • 2404.10301 • Published 26 days ago
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published 20 days ago
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published 21 days ago
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation Paper • 2404.11565 • Published 25 days ago
MeshLRM: Large Reconstruction Model for High-Quality Mesh Paper • 2404.12385 • Published 24 days ago
Scaling Instructable Agents Across Many Simulated Worlds Paper • 2404.10179 • Published Mar 13
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published 27 days ago
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published 27 days ago
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published 27 days ago
TransformerFAM: Feedback attention is working memory Paper • 2404.09173 • Published 28 days ago
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies Paper • 2404.08197 • Published about 1 month ago
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published about 1 month ago
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published about 1 month ago
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published about 1 month ago
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published Apr 10
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting Paper • 2404.06903 • Published Apr 10
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations Paper • 2404.04421 • Published Apr 5
ByteEdit: Boost, Comply and Accelerate Generative Image Editing Paper • 2404.04860 • Published Apr 7
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion Paper • 2404.04544 • Published Apr 6
Aligning Diffusion Models by Optimizing Human Utility Paper • 2404.04465 • Published Apr 6
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators Paper • 2404.05014 • Published Apr 7
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8
PointInfinity: Resolution-Invariant Point Diffusion Models Paper • 2404.03566 • Published Apr 4
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition Paper • 2404.02514 • Published Apr 3
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models Paper • 2404.01367 • Published Apr 1
CosmicMan: A Text-to-Image Foundation Model for Humans Paper • 2404.01294 • Published Apr 1
CameraCtrl: Enabling Camera Control for Text-to-Video Generation Paper • 2404.02101 • Published Apr 2
LITA: Language Instructed Temporal-Localization Assistant Paper • 2403.19046 • Published Mar 27
Garment3DGen: 3D Garment Stylization and Texture Generation Paper • 2403.18816 • Published Mar 27
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion Paper • 2403.18818 • Published Mar 27
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text Paper • 2403.18421 • Published Mar 27
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation Paper • 2403.17694 • Published Mar 26
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models Paper • 2403.17005 • Published Mar 25
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions Paper • 2403.16627 • Published Mar 25
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance Paper • 2403.14781 • Published Mar 21
ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars Paper • 2403.15383 • Published Mar 22
Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering Paper • 2403.14554 • Published Mar 21
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation Paper • 2403.14621 • Published Mar 21
DreamReward: Text-to-3D Generation with Human Preference Paper • 2403.14613 • Published Mar 21