Skip a Layer or Loop It? Learning Program-of-Layers in LLMs • Paper • 2606.06574 • Published 13 days ago • 18
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test • Paper • 2506.21551 • Published Jun 26, 2025 • 28
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free • Paper • 2410.10814 • Published Oct 14, 2024 • 51