The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published 30 days ago • 184
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction Paper • 2405.10315 • Published May 16, 2024 • 14
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published May 16, 2024 • 29
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published May 16, 2024 • 31
Chameleon: Mixed-Modal Early-Fusion Foundation Models Paper • 2405.09818 • Published May 16, 2024 • 131
Naturalistic Music Decoding from EEG Data via Latent Diffusion Models Paper • 2405.09062 • Published May 15, 2024 • 14
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published May 15, 2024 • 28
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published May 15, 2024 • 22
Compositional Text-to-Image Generation with Dense Blob Representations Paper • 2405.08246 • Published May 14, 2024 • 17
Understanding the performance gap between online and offline alignment algorithms Paper • 2405.08448 • Published May 14, 2024 • 19
SpeechVerse: A Large-scale Generalizable Audio Language Model Paper • 2405.08295 • Published May 14, 2024 • 19
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding Paper • 2405.08344 • Published May 14, 2024 • 15
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Paper • 2405.08748 • Published May 14, 2024 • 24
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning Paper • 2405.08054 • Published May 13, 2024 • 25
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14, 2024 • 31
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published May 2, 2024 • 22
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2, 2024 • 55
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment Paper • 2405.01481 • Published May 2, 2024 • 30
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29, 2024 • 120