Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization • arXiv:2405.15071
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data • arXiv:2405.14333
Chameleon: Mixed-Modal Early-Fusion Foundation Models • arXiv:2405.09818
Many-Shot In-Context Learning in Multimodal Foundation Models • arXiv:2405.09798
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory • arXiv:2405.08707
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies • arXiv:2404.08197
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • arXiv:2404.07143
RULER: What's the Real Context Size of Your Long-Context Language Models? • arXiv:2404.06654
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences • arXiv:2404.03715