SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization Paper • 2411.11909 • Published 5 days ago • 18
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Paper • 2411.10958 • Published 5 days ago • 38
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 6 days ago • 89
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation Paper • 2411.08033 • Published 9 days ago • 21
Number it: Temporal Grounding Videos like Flipping Manga Paper • 2411.10332 • Published 7 days ago • 12
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published 7 days ago • 26
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published 15 days ago • 30
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization Paper • 2411.06208 • Published 13 days ago • 18
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models Paper • 2411.05830 • Published 16 days ago • 20
LLaMo: Large Language Model-based Molecular Graph Assistant Paper • 2411.00871 • Published 22 days ago • 21
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level Paper • 2411.03562 • Published 16 days ago • 60
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model Paper • 2411.04496 • Published 15 days ago • 22
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? Paper • 2411.05000 • Published 14 days ago • 21
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published 15 days ago • 20
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding Paper • 2411.04952 • Published 14 days ago • 27
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion Paper • 2411.04928 • Published 14 days ago • 47
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 14 days ago • 48