Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 15 days ago
Frac-Connections: Fractional Extension of Hyper-Connections Paper • 2503.14125 • Published 9 days ago
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Paper • 2503.04598 • Published 21 days ago
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Paper • 2503.01307 • Published 25 days ago