Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published Jun 26 • 47
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 17 • 33
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22 • 45
Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models Paper • 2406.14035 • Published Jun 20 • 12