Fix tool calling model in unit2/tool_calling_agents
Summary
Fixes ToolCallingAgent failing in tool_calling_agents.ipynb with a 400 Bad Request from the Hugging Face Inference API.
smolagents 1.25.0 changed the default InferenceClientModel to Qwen/Qwen3-Next-80B-A3B-Thinking. That model does not support tool calling on the router endpoint, so the first agent step fails when tools are sent. The notebook now explicitly uses Qwen/Qwen2.5-Coder-32B-Instruct, which matches the other Unit 2 smolagents notebooks and works with ToolCallingAgent.
Changes
unit2/smolagents/tool_calling_agents.ipynb: pass model_id="Qwen/Qwen2.5-Coder-32B-Instruct" to InferenceClientModel instead of relying on the library default.
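For reference, the change amounts to something like the sketch below. It is not the notebook's exact cell: MODEL_ID and build_agent are illustrative names, and WebSearchTool is assumed from the web_search call mentioned in the test plan.

```python
# Pin the model explicitly instead of relying on the smolagents default,
# which changed in 1.25.0 (see Summary).
MODEL_ID = "Qwen/Qwen2.5-Coder-32B-Instruct"


def build_agent():
    """Build a ToolCallingAgent with an explicitly pinned model (illustrative helper)."""
    # Imported here so the sketch only needs smolagents when the agent is built.
    from smolagents import InferenceClientModel, ToolCallingAgent, WebSearchTool

    model = InferenceClientModel(model_id=MODEL_ID)
    # Assumption: the notebook's agent uses a web search tool.
    return ToolCallingAgent(tools=[WebSearchTool()], model=model)


# Usage (requires a Hugging Face login and network access):
#   agent = build_agent()
#   agent.run("Search for party music recommendations.")
```

Pinning model_id keeps the notebook stable if the library default changes again.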
Test plan
- Run the HF login cell in tool_calling_agents.ipynb
- Run the ToolCallingAgent example cell; confirm it completes without AgentGenerationError / 400
- Confirm the agent banner shows InferenceClientModel - Qwen/Qwen2.5-Coder-32B-Instruct
- Confirm a tool call (e.g. web_search) appears in the trace output
Good catch on the tool calling fix. In the agents-course/notebooks repo, the Unit 2 tool-calling examples are particularly sensitive to model selection because not all models expose a consistent function-calling interface. Some handle the JSON schema for tool definitions differently. Others silently fall back to text generation without invoking the tool, a subtle failure mode that is hard to catch without inspecting the output carefully.
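One way to make the silent fallback visible is to classify each assistant message before trusting it. The sketch below assumes an OpenAI-style chat message dict (a tool_calls list whose entries carry function.name and a JSON string in function.arguments). classify_tool_response and the tool_schemas mapping are hypothetical names, not part of smolagents.

```python
import json


def classify_tool_response(message, tool_schemas):
    """Return 'tool_call', 'malformed', or 'text_fallback' for one assistant message.

    tool_schemas maps a tool name to its JSON-schema parameters dict.
    """
    calls = message.get("tool_calls") or []
    if not calls:
        # The model answered in plain text instead of invoking a tool.
        return "text_fallback"
    for call in calls:
        fn = call.get("function") or {}
        name = fn.get("name")
        if name not in tool_schemas:
            return "malformed"
        try:
            args = json.loads(fn.get("arguments") or "{}")
        except (json.JSONDecodeError, TypeError):
            return "malformed"
        if not isinstance(args, dict):
            return "malformed"
        # Only checks that required keys are present, not their types.
        required = tool_schemas[name].get("required", [])
        if any(key not in args for key in required):
            return "malformed"
    return "tool_call"
```

This only checks required keys. A full JSON Schema validator would be stricter, but the check is enough to separate "called the tool" from "talked about calling the tool".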
The core issue with tool calling agents is that the model must reliably emit structured output that conforms to the tool schema, and smaller or older checkpoints on the Hub often behave inconsistently here. If the fix involves swapping to a model with better function-calling support (something like a recent Mistral or Qwen variant with explicit tool-use fine-tuning), it's worth documenting in the notebook which capability broke and why the replacement handles it correctly. That context is useful for learners trying to understand the boundary between model capability and agent scaffolding.
On a related note, in multi-agent setups where tool-calling agents are chained or delegated to, this kind of silent tool failure becomes a trust and verification problem as much as a model capability problem. In work we've done on AgentGraph, tracking which agent invoked which tool, and whether the tool call was well-formed or silently degraded, is critical for debugging orchestration issues. This is similar in spirit to what the SteelSpine replay approach is trying to solve for agent debugging more broadly. For the course notebooks it's probably out of scope, but worth flagging that the fix here is load-bearing if these examples get used as templates for more complex pipelines.
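To make the tracking idea concrete, here is a minimal sketch of a per-agent tool-call log. It is not AgentGraph's or SteelSpine's API. ToolCallRecord, ToolCallLog, and the status labels are hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToolCallRecord:
    agent: str   # which agent attempted the call
    tool: str    # which tool it targeted
    status: str  # "ok", "malformed", or "text_fallback" (illustrative labels)


@dataclass
class ToolCallLog:
    records: list = field(default_factory=list)

    def record(self, agent, tool, status):
        """Append one observed tool-call attempt."""
        self.records.append(ToolCallRecord(agent, tool, status))

    def degraded(self):
        """Return every attempt that did not result in a well-formed call."""
        return [r for r in self.records if r.status != "ok"]

    def summary(self):
        """Count attempts per (agent, status) pair."""
        return Counter((r.agent, r.status) for r in self.records)


log = ToolCallLog()
log.record("manager", "web_search", "ok")
log.record("researcher", "web_search", "text_fallback")
```

With a log like this, log.degraded() points at the agent whose call silently degraded instead of leaving it buried in the trace.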