Scaling test-time compute: Enhance math problem solving by scaling test-time compute (Running, 505)
Vision Arena (Testing VLMs side-by-side): Analyze images to detect and label objects (Running, 541)