arxiv:2501.02506
Zhiheng Xi
WooooDyy
AI & ML interests
None yet
Recent Activity
authored a paper 4 days ago
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
updated a dataset about 2 months ago
MathCritique/MathCritique-76k
Organizations
Papers: 17
Models: None public yet
Datasets: None public yet