MotionLLM: Understanding Human Behaviors from Human Motions and Videos Paper • 2405.20340 • Published 1 day ago • 2
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting Paper • 2405.18424 • Published 3 days ago • 6
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published 8 days ago • 14
PBADet: A One-Stage Anchor-Free Approach for Part-Body Association Paper • 2402.07814 • Published Feb 12 • 1
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30 • 65
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19 • 26
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation Paper • 2404.13026 • Published Apr 19 • 21
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8 • 28
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Paper • 2404.05717 • Published Apr 8 • 23
InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds Paper • 2403.20309 • Published Mar 29 • 16
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling Paper • 2403.19655 • Published Mar 28 • 14
Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation Paper • 2403.19319 • Published Mar 28 • 6
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion Paper • 2403.18818 • Published Mar 27 • 22
Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians Paper • 2403.17898 • Published Mar 26 • 12
2D Gaussian Splatting for Geometrically Accurate Radiance Fields Paper • 2403.17888 • Published Mar 26 • 25
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model Paper • 2403.13064 • Published Mar 19 • 30
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS Paper • 2403.13806 • Published Mar 20 • 18
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers Paper • 2403.12943 • Published Mar 19 • 13
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion Paper • 2403.12008 • Published Mar 18 • 18
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Paper • 2403.12015 • Published Mar 18 • 60
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14 • 11
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 121
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting Paper • 2403.08551 • Published Mar 13 • 8
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Paper • 2403.08764 • Published Mar 13 • 34
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM Paper • 2403.07487 • Published Mar 12 • 12
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11 • 22
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models Paper • 2402.19427 • Published Feb 29 • 50
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27 • 182
Seamless Human Motion Composition with Blended Positional Encodings Paper • 2402.15509 • Published Feb 23 • 12
MVD^2: Efficient Multiview 3D Reconstruction for Multiview Diffusion Paper • 2402.14253 • Published Feb 22 • 5
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information Paper • 2402.13616 • Published Feb 21 • 44
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Paper • 2402.12226 • Published Feb 19 • 37
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling Paper • 2402.10211 • Published Feb 15 • 8
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering Paper • 2402.10128 • Published Feb 15 • 14
Lumos : Empowering Multimodal LLMs with Scene Text Recognition Paper • 2402.08017 • Published Feb 12 • 22
Graph Mamba: Towards Learning on Graphs with State Space Models Paper • 2402.08678 • Published Feb 13 • 12
Vision-Based Hand Gesture Customization from a Single Demonstration Paper • 2402.08420 • Published Feb 13 • 7
IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation Paper • 2402.08682 • Published Feb 13 • 12
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13 • 33
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs Paper • 2402.07872 • Published Feb 12 • 14
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss Paper • 2402.05008 • Published Feb 7 • 19
Boximator: Generating Rich and Controllable Motions for Video Synthesis Paper • 2402.01566 • Published Feb 2 • 25