Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation Paper • 2502.19414 • Published 20 days ago
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 Paper • 2502.03544 • Published Feb 5
Great Models Think Alike and this Undermines AI Oversight Paper • 2502.04313 • Published Feb 6