Using different models
smolagents provides a flexible framework that allows you to use various language models from different providers.
This guide will show you how to use different model types with your agents.
Available model types
smolagents supports several model types out of the box:
- `InferenceClientModel`: Uses Hugging Face's Inference API to access models
- `TransformersModel`: Runs models locally using the Transformers library
- `VLLMModel`: Uses vLLM for fast inference with optimized serving
- `MLXModel`: Optimized for Apple Silicon devices using MLX
- `LiteLLMModel`: Provides access to hundreds of LLMs through LiteLLM
- `LiteLLMRouterModel`: Distributes requests among multiple models
- `OpenAIServerModel`: Connects to OpenAI's API
- `AzureOpenAIServerModel`: Uses Azure's OpenAI service
- `AmazonBedrockServerModel`: Connects to AWS Bedrock's API
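Most of these classes are instantiated the same way: a `model_id` plus provider-specific options. A minimal sketch for two of them (the model IDs below are examples, not recommendations):

```py
from smolagents import InferenceClientModel, LiteLLMModel

# Serverless Hugging Face Inference API; any compatible Hub model id works.
hf_model = InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

# LiteLLM addresses models as "provider/model"; the API key is read from the
# provider's usual environment variable (here, ANTHROPIC_API_KEY).
claude_model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest")
```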
Using Google Gemini Models
As explained in the [Google Gemini API documentation](https://ai.google.dev/gemini-api/docs/openai), Google provides an OpenAI-compatible endpoint for Gemini, so you can use `OpenAIServerModel` with Gemini models simply by setting the appropriate base URL.
First, install the required dependencies:
```bash
pip install smolagents[openai]
```
Then, get a Gemini API key and set it in your code:
```py
GEMINI_API_KEY = "<YOUR-GEMINI-API-KEY>"
```
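If you'd rather not hard-code the key, a common alternative (assuming you have exported `GEMINI_API_KEY` in your shell) is to read it from the environment:

```py
import os

# Assumes the key was exported beforehand, e.g. `export GEMINI_API_KEY=...`
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]
```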
Now, you can initialize the Gemini model using the `OpenAIServerModel` class, setting the `api_base` parameter to the Gemini API base URL:
```py
from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gemini-2.0-flash",
    api_key=GEMINI_API_KEY,
    # Google Gemini OpenAI-compatible API base URL
    api_base="https://generativelanguage.googleapis.com/v1beta/openai/",
)
```
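From here, the model plugs into an agent like any other smolagents model. A quick usage sketch with `CodeAgent` (the prompt is just an illustration):

```py
from smolagents import CodeAgent

# An agent with no extra tools, backed by the Gemini model defined above.
agent = CodeAgent(tools=[], model=model)
agent.run("Summarize the main differences between Gemini Flash and Gemini Pro.")
```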