Gemstones: A Model Suite for Multi-Faceted Scaling Laws • Paper • 2502.06857 • Published 15 days ago • 23
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach • Paper • 2502.05171 • Published 15 days ago • 113
Transformers Can Do Arithmetic with the Right Embeddings • Paper • 2405.17399 • Published May 27, 2024 • 52