The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published 15 days ago • 181
Controllable Human Image Generation with Personalized Multi-Garments Paper • 2411.16801 • Published Nov 25, 2024 • 4
Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI) Paper • 2411.16754 • Published Nov 24, 2024 • 3
AnchorCrafter: Animate CyberAnchors Selling Your Products via Human-Object Interacting Video Generation Paper • 2411.17383 • Published Nov 26, 2024 • 7
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality Paper • 2411.15241 • Published Nov 22, 2024 • 6
MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts Paper • 2411.14721 • Published Nov 22, 2024 • 5
Learning 3D Representations from Procedural 3D Programs Paper • 2411.17467 • Published Nov 25, 2024 • 9
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis Paper • 2411.16173 • Published Nov 25, 2024 • 8
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity Paper • 2411.15411 • Published Nov 23, 2024 • 8
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens Paper • 2411.17691 • Published Nov 26, 2024 • 13
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE Paper • 2411.16856 • Published Nov 25, 2024 • 13
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models Paper • 2411.17451 • Published Nov 26, 2024 • 11
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Paper • 2411.17686 • Published Nov 26, 2024 • 20
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs Paper • 2411.15296 • Published Nov 22, 2024 • 20
SketchAgent: Language-Driven Sequential Sketch Generation Paper • 2411.17673 • Published Nov 26, 2024 • 19
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published Nov 26, 2024 • 22
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation Paper • 2411.17945 • Published Nov 26, 2024 • 24
Pathways on the Image Manifold: Image Editing via Video Generation Paper • 2411.16819 • Published Nov 25, 2024 • 33