FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published 6 days ago • 13
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers Paper • 2412.12571 • Published 6 days ago • 8
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published 5 days ago • 21
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities Paper • 2412.14123 • Published 5 days ago • 11
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Paper • 2412.12953 • Published 6 days ago • 11
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Paper • 2412.13795 • Published 5 days ago • 18
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 5 days ago • 43
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 5 days ago • 97
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 7 days ago • 40
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 10 days ago • 72
Smaller Language Models Are Better Instruction Evolvers Paper • 2412.11231 • Published 8 days ago • 24
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published 13 days ago • 35
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published 7 days ago • 33
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance Paper • 2412.06673 • Published 14 days ago • 11
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models Paper • 2412.03548 • Published 19 days ago • 16
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer Paper • 2412.07720 • Published 13 days ago • 30
Mogo: RQ Hierarchical Causal Transformer for High-Quality 3D Human Motion Generation Paper • 2412.07797 • Published 18 days ago • 11