MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published about 1 month ago • 219
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment Paper • 2506.07982 • Published Jun 9, 2025 • 7
Identifying the Risks of LM Agents with an LM-Emulated Sandbox Paper • 2309.15817 • Published Sep 25, 2023