Over the last couple of weeks, a large number of studies on inference-time scaling have emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without scaling up their parameter count.
So here are 13 new methods + 3 comprehensive studies on test-time scaling:
3. Z1: Efficient Test-time Scaling with Code (2504.00810): Proposes training LLMs on code-based reasoning trajectories to make test-time scaling more efficient, cutting unnecessary thinking tokens with a dedicated code-reasoning dataset and a Shifted Thinking Window that caps how long the model is allowed to think (see the sketch below).
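To make the token-capping idea concrete, here is a minimal sketch of bounding the reasoning phase at inference time with Hugging Face transformers. The model ID, the thinking budget, and the answer-forcing hint are illustrative assumptions of mine, not Z1's actual implementation or hyperparameters:

```python
# Sketch: cap the "thinking" phase, then force the model to commit to an answer.
# Model ID, budget, and hint text are assumptions, not the paper's settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"  # placeholder instruct model, not the paper's checkpoint
THINKING_BUDGET = 512                  # assumed cap on reasoning tokens

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def solve_with_capped_thinking(problem: str) -> str:
    prompt = f"Solve the problem step by step, then state the final answer.\n\n{problem}\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Phase 1: let the model reason, but stop once the thinking budget is spent.
    reasoning_ids = model.generate(
        **inputs, max_new_tokens=THINKING_BUDGET, do_sample=False
    )
    reasoning_text = tokenizer.decode(reasoning_ids[0], skip_special_tokens=True)

    # Phase 2: append a hint that forces an answer, so a truncated reasoning
    # trace still ends with a usable result instead of trailing off.
    forced = reasoning_text + "\nTherefore, the final answer is"
    forced_inputs = tokenizer(forced, return_tensors="pt").to(model.device)
    answer_ids = model.generate(**forced_inputs, max_new_tokens=64, do_sample=False)
    new_tokens = answer_ids[0][forced_inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

print(solve_with_capped_thinking("A train travels 120 km in 1.5 hours. What is its average speed?"))
```

The point of the sketch is only the control flow: bound the reasoning tokens, then force an answer, which is the efficiency lever this line of work pulls on.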