CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published 3 days ago • 24
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published 3 days ago • 22
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published 17 days ago • 13
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published 17 days ago • 44
Paint by Inpaint: Learning to Add Image Objects by Removing Them First Paper • 2404.18212 • Published 21 days ago • 19
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published 24 days ago • 48
Interactive3D: Create What You Want by Interactive 3D Generation Paper • 2404.16510 • Published 24 days ago • 17
MotionMaster: Training-free Camera Motion Transfer For Video Generation Paper • 2404.15789 • Published 25 days ago • 10
SnapKV: LLM Knows What You are Looking for Before Generation Paper • 2404.14469 • Published 27 days ago • 23
Learn2Talk: 3D Talking Face Learns from 2D Talking Face Paper • 2404.12888 • Published 30 days ago • 2
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 27 days ago • 230
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published 30 days ago • 27
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation Paper • 2404.13026 • Published 30 days ago • 21
Does Gaussian Splatting need SFM Initialization? Paper • 2404.12547 • Published about 1 month ago • 8
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation Paper • 2404.11565 • Published Apr 17 • 12
BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published about 1 month ago • 23
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding Paper • 2404.11912 • Published Apr 18 • 16
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 61
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15 • 27
Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior Paper • 2404.06780 • Published Apr 10 • 9
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published Apr 10 • 22
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11 • 45
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 57
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4 • 21
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image Paper • 2404.02152 • Published Apr 2 • 3
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition Paper • 2404.02514 • Published Apr 3 • 9
Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes Paper • 2404.01543 • Published Apr 2 • 3
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model Paper • 2404.01331 • Published Mar 29 • 22
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward Paper • 2404.01258 • Published Apr 1 • 10
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes Paper • 2404.00987 • Published Apr 1 • 20
InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds Paper • 2403.20309 • Published Mar 29 • 16
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection Paper • 2403.19888 • Published Mar 29 • 9
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs Paper • 2403.20041 • Published Mar 29 • 33
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions Paper • 2403.19651 • Published Mar 28 • 20
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling Paper • 2403.19655 • Published Mar 28 • 14
Garment3DGen: 3D Garment Stylization and Texture Generation Paper • 2403.18816 • Published Mar 27 • 18
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction Paper • 2403.18795 • Published Mar 27 • 16
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27 • 37
2D Gaussian Splatting for Geometrically Accurate Radiance Fields Paper • 2403.17888 • Published Mar 26 • 25
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation Paper • 2403.17001 • Published Mar 25 • 6