OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models • Paper • 2604.10866 • Published Apr 13 • 67
Qwen2 VL Localization • Agents • Featured • 110 • Detect objects in images using text prompts