Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation Paper • 2502.19414 • Published Feb 26, 2025
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7, 2025
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 Paper • 2502.03544 • Published Feb 5, 2025
Great Models Think Alike and this Undermines AI Oversight Paper • 2502.04313 • Published Feb 6, 2025
Representation Engineering: A Top-Down Approach to AI Transparency Paper • 2310.01405 • Published Oct 2, 2023
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Paper • 2403.03218 • Published Mar 5, 2024