Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models • Paper • 2502.15499 • Published 16 days ago • 13
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs • Paper • 2502.17422 • Published 13 days ago • 7
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? • Paper • 2502.17535 • Published 13 days ago • 8
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers • Paper • 2502.15007 • Published 17 days ago • 160
Predictive Data Selection: The Data That Predicts Is the Data That Teaches • Paper • 2503.00808 • Published 7 days ago • 51