Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 11 days ago • 76
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation Paper • 2502.08690 • Published 17 days ago • 39
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published 16 days ago • 182
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning Paper • 2502.06533 • Published 19 days ago • 18
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published 17 days ago • 45
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 16 days ago • 142
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning Paper • 2502.03275 • Published 24 days ago • 14
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 24 days ago • 197
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published 29 days ago • 56
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published about 1 month ago • 55
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 335
Do generative video models learn physical principles from watching videos? Paper • 2501.09038 • Published Jan 14 • 32
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 274
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 78
EXAONE-3.5 Collection EXAONE 3.5 language model series including instruction-tuned models of 2.4B, 7.8B, and 32B. • 10 items • Updated Dec 10, 2024 • 98
Llama 3.3 (All Versions) Collection Meta's new Llama 3.3 (70B) model in all formats. Includes GGUF, 4-bit bnb and original versions. • 3 items • Updated 2 days ago • 36
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 129