ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use Paper • 2501.02506 • Published 5 days ago • 9 • 3