- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 46
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 15
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 95
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet
  Paper • 2402.13251 • Published • 14
Collections including paper arxiv:2403.01779
- AppAgent: Multimodal Agents as Smartphone Users
  Paper • 2312.13771 • Published • 53
- En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
  Paper • 2401.01173 • Published • 12
- Boosting Large Language Model for Speech Synthesis: An Empirical Study
  Paper • 2401.00246 • Published • 13
- Image Sculpting: Precise Object Editing with 3D Geometry Control
  Paper • 2401.01702 • Published • 20

- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
  Paper • 2306.07967 • Published • 24
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
  Paper • 2306.07954 • Published • 112
- TryOnDiffusion: A Tale of Two UNets
  Paper • 2306.08276 • Published • 72
- Seeing the World through Your Eyes
  Paper • 2306.09348 • Published • 33

- OmnimatteRF: Robust Omnimatte with 3D Background Modeling
  Paper • 2309.07749 • Published • 7
- AudioSR: Versatile Audio Super-resolution at Scale
  Paper • 2309.07314 • Published • 26
- Generative Image Dynamics
  Paper • 2309.07906 • Published • 53
- MagiCapture: High-Resolution Multi-Concept Portrait Customization
  Paper • 2309.06895 • Published • 27