LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models Paper • 2405.18377 • Published 13 days ago
Yuan 2.0-M32: Mixture of Experts with Attention Router Paper • 2405.17976 • Published 13 days ago
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published 12 days ago
Xwin-LM: Strong and Scalable Alignment Practice for LLMs Paper • 2405.20335 • Published 11 days ago
MotionLLM: Understanding Human Behaviors from Human Motions and Videos Paper • 2405.20340 • Published 11 days ago
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts Paper • 2405.19893 • Published 11 days ago
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published 11 days ago
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published 10 days ago
CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner Paper • 2405.14979 • Published 18 days ago
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Paper • 2405.15071 • Published 18 days ago
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published 14 days ago
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models Paper • 2405.17428 • Published 14 days ago
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published 25 days ago
INDUS: Effective and Efficient Language Models for Scientific Applications Paper • 2405.10725 • Published 24 days ago
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published 22 days ago
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published 17 days ago
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published 17 days ago
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published 20 days ago
Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published 21 days ago
ReVideo: Remake a Video with Motion and Content Control Paper • 2405.13865 • Published 19 days ago
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published 18 days ago
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels Paper • 2405.07526 • Published 28 days ago
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts Paper • 2405.07518 • Published 28 days ago
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models Paper • 2405.08317 • Published 27 days ago
SpeechVerse: A Large-scale Generalizable Audio Language Model Paper • 2405.08295 • Published 27 days ago
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding Paper • 2405.08344 • Published 27 days ago
Understanding the performance gap between online and offline alignment algorithms Paper • 2405.08448 • Published 27 days ago
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Paper • 2405.08748 • Published 27 days ago
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning Paper • 2405.08054 • Published 28 days ago
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published 27 days ago
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation Paper • 2405.09546 • Published 26 days ago
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published 26 days ago
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction Paper • 2405.10315 • Published 25 days ago
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion Paper • 2405.09874 • Published 25 days ago
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published 25 days ago
CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published 25 days ago
Chameleon: Mixed-Modal Early-Fusion Foundation Models Paper • 2405.09818 • Published 25 days ago
Layer-Condensed KV Cache for Efficient Inference of Large Language Models Paper • 2405.10637 • Published 24 days ago
Towards Modular LLMs by Building and Reusing a Library of LoRAs Paper • 2405.11157 • Published 23 days ago
Imp: Highly Capable Large Multimodal Models for Mobile Devices Paper • 2405.12107 • Published 21 days ago
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework Paper • 2405.11143 • Published 22 days ago
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published 21 days ago
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations Paper • 2404.17521 • Published Apr 26
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting Paper • 2404.19702 • Published Apr 30
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30