Hub Python Library documentation


You are viewing v0.15.1 version. A newer version v0.23.0.rc1 is available.


Inference is the process of using a trained model to make predictions on new data. As this process can be compute-intensive, running on a dedicated server can be an interesting option. The huggingface_hub library provides an easy way to call a service that runs inference for hosted models. There are several services you can connect to:

  • Inference API: a service that allows you to run accelerated inference on Hugging Face’s infrastructure for free. This service is a fast way to get started, test different models, and prototype AI products.
  • Inference Endpoints: a product to easily deploy models to production. Inference is run by Hugging Face in a dedicated, fully managed infrastructure on a cloud provider of your choice.

These services can be called with the InferenceClient object. Please refer to this guide for more information on how to use it.

Inference Client

class huggingface_hub.InferenceClient


( model: typing.Optional[str] = None token: typing.Optional[str] = None timeout: typing.Optional[float] = None )


  • model (str, optional) — The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. bigcode/starcoder or a URL to a deployed Inference Endpoint. Defaults to None, in which case a recommended model is automatically selected for the task.
  • token (str, optional) — Hugging Face token. Will default to the locally saved token.
  • timeout (float, optional) — The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.

Initialize a new Inference Client.

InferenceClient aims to provide a unified experience to perform inference. The client can be used seamlessly with either the (free) Inference API or dedicated Inference Endpoints.
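The `timeout` parameter bounds how long the client keeps waiting while a model loads, and the error lists below treat HTTP 503 ("model loading") differently from other HTTP errors. The loop below is a minimal, stdlib-only sketch of that wait-until-deadline behavior, not the library's implementation. `ModelLoadingError`, `DeadlineExceeded`, `call_with_deadline`, and the retry interval are hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ModelLoadingError(Exception):
    """Hypothetical stand-in for a 'model is still loading' (HTTP 503) response."""


class DeadlineExceeded(Exception):
    """Hypothetical stand-in for InferenceTimeoutError."""


def call_with_deadline(
    fn: Callable[[], T],
    timeout: Optional[float] = None,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call `fn`, retrying while the model is loading.

    Gives up once `timeout` seconds have elapsed; with timeout=None it
    retries indefinitely, like the documented default.
    """
    start = clock()
    while True:
        try:
            return fn()
        except ModelLoadingError:
            if timeout is not None and clock() - start >= timeout:
                raise DeadlineExceeded(f"Model still loading after {timeout} seconds") from None
            sleep(interval)


# Usage: a fake call that reports "loading" twice before succeeding.
attempts = []


def flaky_call() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ModelLoadingError()
    return "ok"


print(call_with_deadline(flaky_call, timeout=10.0, interval=0.0))  # ok
```

Injecting `sleep` and `clock` keeps the sketch testable without real waiting.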


audio_classification

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None ) List[Dict]


  • audio (Union[str, Path, bytes, BinaryIO]) — The audio content to classify. It can be raw audio bytes, a local audio file, or a URL pointing to an audio file.
  • model (str, optional) — The model to use for audio classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for audio classification will be used.



A list of dictionaries, each containing a predicted label and its confidence score.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform audio classification on the provided audio content.


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.audio_classification("audio.wav")
[{'score': 0.4976358711719513, 'label': 'hap'}, {'score': 0.3677836060523987, 'label': 'neu'},...]
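Classification tasks return a list of `{'score': ..., 'label': ...}` dictionaries, as in the example output above. A small helper such as the hypothetical `top_label` below (not part of `huggingface_hub`) picks the most likely label without assuming the list is already sorted.

```python
from typing import Dict, List, Tuple, Union

Prediction = Dict[str, Union[str, float]]


def top_label(predictions: List[Prediction]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    if not predictions:
        raise ValueError("No predictions to choose from")
    best = max(predictions, key=lambda p: float(p["score"]))
    return str(best["label"]), float(best["score"])


# Output shaped like the audio_classification() example above.
predictions = [
    {"score": 0.4976358711719513, "label": "hap"},
    {"score": 0.3677836060523987, "label": "neu"},
]
label, score = top_label(predictions)
print(label)  # hap
```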


automatic_speech_recognition

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None ) str


  • audio (Union[str, Path, bytes, BinaryIO]) — The content to transcribe. It can be raw audio bytes, local audio file, or a URL to an audio file.
  • model (str, optional) — The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for ASR will be used.



The transcribed text.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform automatic speech recognition (ASR or audio-to-text) on the given audio content.


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.automatic_speech_recognition("hello_world.wav")
"hello world"


conversational

( text: str generated_responses: typing.Optional[typing.List[str]] = None past_user_inputs: typing.Optional[typing.List[str]] = None parameters: typing.Union[typing.Dict[str, typing.Any], NoneType] = None model: typing.Optional[str] = None ) Dict


  • text (str) — The last input from the user in the conversation.
  • generated_responses (List[str], optional) — A list of strings corresponding to the earlier replies from the model. Defaults to None.
  • past_user_inputs (List[str], optional) — A list of strings corresponding to the earlier replies from the user. Should be the same length as generated_responses. Defaults to None.
  • parameters (Dict[str, Any], optional) — Additional parameters for the conversational task. Defaults to None. For more details about the available parameters, please refer to this page.
  • model (str, optional) — The model to use for the conversational task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended conversational model will be used. Defaults to None.



The generated conversational output.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Generate conversational responses based on the given input text (i.e. chat with the API).


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> output = client.conversational("Hi, who are you?")
>>> output
{'generated_text': 'I am the one who knocks.', 'conversation': {'generated_responses': ['I am the one who knocks.'], 'past_user_inputs': ['Hi, who are you?']}, 'warnings': ['Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.']}
>>> client.conversational(
...     "Wow, that's scary!",
...     generated_responses=output["conversation"]["generated_responses"],
...     past_user_inputs=output["conversation"]["past_user_inputs"],
... )
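To keep a multi-turn conversation going, each call passes back the history stored in the previous output's `conversation` field, as the example above does by hand. The hypothetical `next_turn_kwargs` helper below (not a library function) packages that bookkeeping; it only builds keyword arguments and makes no network call.

```python
from typing import Any, Dict, List


def next_turn_kwargs(previous_output: Dict[str, Any]) -> Dict[str, List[str]]:
    """Build the history keyword arguments for the next conversational() call.

    `previous_output` is assumed to have the ConversationalOutput shape:
    {"generated_text": ..., "conversation": {"generated_responses": [...], "past_user_inputs": [...]}}.
    """
    conversation = previous_output["conversation"]
    responses = list(conversation["generated_responses"])
    inputs = list(conversation["past_user_inputs"])
    # The two lists describe alternating turns, so they must stay the same length.
    if len(responses) != len(inputs):
        raise ValueError("generated_responses and past_user_inputs must have the same length")
    return {"generated_responses": responses, "past_user_inputs": inputs}


output = {
    "generated_text": "I am the one who knocks.",
    "conversation": {
        "generated_responses": ["I am the one who knocks."],
        "past_user_inputs": ["Hi, who are you?"],
    },
}
kwargs = next_turn_kwargs(output)
# With a real client: client.conversational("Wow, that's scary!", **kwargs)
print(kwargs["past_user_inputs"])  # ['Hi, who are you?']
```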


feature_extraction

( text: str model: typing.Optional[str] = None ) np.ndarray


  • text (str) — The text to embed.
  • model (str, optional) — The model to use for feature extraction. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended feature extraction model will be used. Defaults to None.



The embedding representing the input text as a float32 numpy array.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Generate embeddings for a given text.


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.feature_extraction("Hi, who are you?")
array([[ 2.424802  ,  2.93384   ,  1.1750331 , ...,  1.240499, -0.13776633, -0.7889173 ],
[-0.42943227, -0.6364878 , -1.693462  , ...,  0.41978157, -2.4336355 ,  0.6162071 ],
[ 0.28552425, -0.928395  , -1.2077185 , ...,  0.76810825, -2.1069427 ,  0.6236161 ]], dtype=float32)
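The array above has several rows, which suggests this model returns one vector per token rather than a single sentence vector. If a model behaves this way and you need one fixed-size vector, mean pooling over the rows is a common choice. The sketch below assumes the output has been converted to nested Python lists (for a NumPy array, `array.mean(axis=0)` computes the same thing); `mean_pool` is a hypothetical helper.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a (num_tokens, dim) matrix over its rows into a single dim-sized vector."""
    if not token_vectors:
        raise ValueError("Need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(row) != dim for row in token_vectors):
        raise ValueError("All token vectors must have the same dimension")
    return [sum(row[i] for row in token_vectors) / len(token_vectors) for i in range(dim)]


# Toy 3 x 2 matrix standing in for a real feature_extraction() output.
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))  # [3.0, 4.0]
```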


image_classification

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None ) List[Dict]


  • image (Union[str, Path, bytes, BinaryIO]) — The image to classify. It can be raw bytes, an image file, or a URL to an online image.
  • model (str, optional) — The model to use for image classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for image classification will be used.



A list of dictionaries containing the predicted labels and their associated probabilities.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform image classification on the given image using the specified model.


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_classification("")
[{'score': 0.9779096841812134, 'label': 'Blenheim spaniel'}, ...]


image_segmentation

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None ) List[Dict]


  • image (Union[str, Path, bytes, BinaryIO]) — The image to segment. It can be raw bytes, an image file, or a URL to an online image.
  • model (str, optional) — The model to use for image segmentation. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for image segmentation will be used.



A list of dictionaries containing the segmented masks and associated attributes.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform image segmentation on the given image using the specified model.

You must have PIL installed if you want to work with images (pip install Pillow).


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_segmentation("cat.jpg")
[{'score': 0.989008, 'label': 'LABEL_184', 'mask': <PIL.PngImagePlugin.PngImageFile image mode=L size=400x300 at 0x7FDD2B129CC0>}, ...]
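Each item in the segmentation output carries a `score`, so low-confidence masks can be dropped before further processing. The hypothetical `confident_segments` helper below is a stdlib-only sketch; the 0.5 threshold is an arbitrary illustrative value, and the labels `LABEL_1` and `LABEL_2` and the string masks are made-up stand-ins for the PIL images the client returns, so the snippet runs without Pillow.

```python
from typing import Any, Dict, List


def confident_segments(segments: List[Dict[str, Any]], threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Keep segments whose score is at or above `threshold`, highest score first."""
    kept = [seg for seg in segments if seg["score"] >= threshold]
    return sorted(kept, key=lambda seg: seg["score"], reverse=True)


# Strings stand in for the PIL mask images returned by image_segmentation().
segments = [
    {"score": 0.3, "label": "LABEL_1", "mask": "mask-1"},
    {"score": 0.989008, "label": "LABEL_184", "mask": "mask-184"},
    {"score": 0.7, "label": "LABEL_2", "mask": "mask-2"},
]
print([seg["label"] for seg in confident_segments(segments)])  # ['LABEL_184', 'LABEL_2']
```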


image_to_image

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] prompt: typing.Optional[str] = None negative_prompt: typing.Optional[str] = None height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: typing.Optional[int] = None guidance_scale: typing.Optional[float] = None model: typing.Optional[str] = None **kwargs ) Image


  • image (Union[str, Path, bytes, BinaryIO]) — The input image for translation. It can be raw bytes, an image file, or a URL to an online image.
  • prompt (str, optional) — The text prompt to guide the image generation.
  • negative_prompt (str, optional) — A negative prompt to guide the translation process.
  • height (int, optional) — The height in pixels of the generated image.
  • width (int, optional) — The width in pixels of the generated image.
  • num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
  • guidance_scale (float, optional) — A higher guidance scale encourages the model to generate images closely linked to the text prompt, usually at the expense of lower image quality.
  • model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.



The translated image.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform image-to-image translation using a specified model.

You must have PIL installed if you want to work with images (pip install Pillow).


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> image = client.image_to_image("cat.jpg", prompt="turn the cat into a tiger")


image_to_text

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None ) str


  • image (Union[str, Path, bytes, BinaryIO]) — The input image to caption. It can be raw bytes, an image file, or a URL to an online image.
  • model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.



The generated text.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Take an input image and return text.

Models can have very different outputs depending on your use case (image captioning, optical character recognition (OCR), Pix2Struct, etc.). Please have a look at the model card to learn more about a model’s specificities.


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_to_text("cat.jpg")
'a cat standing in a grassy field '
>>> client.image_to_text("")
'a dog laying on the grass next to a flower pot '


post

( json: typing.Union[str, typing.Dict, typing.List, NoneType] = None data: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, NoneType] = None model: typing.Optional[str] = None task: typing.Optional[str] = None ) Response


  • json (Union[str, Dict, List], optional) — The JSON data to send in the request body. Defaults to None.
  • data (Union[str, Path, bytes, BinaryIO], optional) — The content to send in the request body. It can be raw bytes, a pointer to an opened file, a local file path, or a URL to an online resource (image, audio file,…). If both json and data are passed, data will take precedence. At least json or data must be provided. Defaults to None.
  • model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. Will override the model defined at the instance level. Defaults to None.
  • task (str, optional) — The task to perform. Used only to select a recommended model when model is not provided. At least model or task must be provided. Defaults to None.



The requests HTTP response.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Make a POST request to the inference server.
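The rules for `json` and `data` above (`data` wins when both are given, and at least one is required) can be summarized in a few lines. The function below is a hypothetical, stdlib-only sketch of how a request body could be chosen under those rules; it is not the client's internal code, and it only handles raw bytes for `data`, ignoring file paths, file objects, and URLs.

```python
import json as jsonlib
from typing import Any, Optional, Tuple


def build_body(json: Any = None, data: Optional[bytes] = None) -> Tuple[bytes, Optional[str]]:
    """Return (body, content_type) following the documented precedence rules.

    - If `data` is given, it is sent as-is and takes precedence over `json`.
    - Otherwise `json` is serialized into a JSON payload.
    - At least one of the two must be provided.
    """
    if data is not None:
        return data, None  # Content type of raw bytes is left unspecified in this sketch.
    if json is not None:
        return jsonlib.dumps(json).encode("utf-8"), "application/json"
    raise ValueError("At least `json` or `data` must be provided.")


body, content_type = build_body(json={"inputs": "The Eiffel tower..."})
print(content_type)  # application/json
```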


sentence_similarity

( sentence: str other_sentences: typing.List[str] model: typing.Optional[str] = None ) List[float]


  • sentence (str) — The main sentence to compare to others.
  • other_sentences (List[str]) — The list of sentences to compare to.
  • model (str, optional) — The model to use for sentence similarity. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended sentence similarity model will be used. Defaults to None.



The similarity scores between sentence and each item of other_sentences, in the same order.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Compute the semantic similarity between a sentence and a list of other sentences by comparing their embeddings.


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.sentence_similarity(
...     "Machine learning is so easy.",
...     other_sentences=[
...         "Deep learning is so straightforward.",
...         "This is so difficult, like rocket science.",
...         "I can't believe how much I struggled with this.",
...     ],
... )
[0.7785726189613342, 0.45876261591911316, 0.2906220555305481]
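The description above says similarity is computed by comparing embeddings but does not name the metric. Cosine similarity is a common choice for this, so the stdlib-only sketch below shows how such scores could be computed from embedding vectors, assuming cosine similarity; the service's exact metric is not specified here, and the vectors are toy values rather than real model outputs.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def similarities(main: Sequence[float], others: List[Sequence[float]]) -> List[float]:
    """Score the main embedding against each of the other embeddings, in order."""
    return [cosine_similarity(main, other) for other in others]


print(similarities([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # [1.0, 0.0]
```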


summarization

( text: str parameters: typing.Union[typing.Dict[str, typing.Any], NoneType] = None model: typing.Optional[str] = None ) str


  • text (str) — The input text to summarize.
  • parameters (Dict[str, Any], optional) — Additional parameters for summarization. Check out this page for more details.
  • model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.



The generated summary text.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Generate a summary of a given text using a specified model.


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.summarization("The Eiffel tower...")
'The Eiffel tower is one of the most famous landmarks in the world....'


text_to_image

( prompt: str negative_prompt: typing.Optional[str] = None height: typing.Optional[float] = None width: typing.Optional[float] = None num_inference_steps: typing.Optional[float] = None guidance_scale: typing.Optional[float] = None model: typing.Optional[str] = None **kwargs ) Image


  • prompt (str) — The prompt to generate an image from.
  • negative_prompt (str, optional) — An optional negative prompt for the image generation.
  • height (float, optional) — The height in pixels of the image to generate.
  • width (float, optional) — The width in pixels of the image to generate.
  • num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
  • guidance_scale (float, optional) — A higher guidance scale encourages the model to generate images closely linked to the text prompt, usually at the expense of lower image quality.
  • model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.



The generated image.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Generate an image based on a given text using a specified model.

You must have PIL installed if you want to work with images (pip install Pillow).


>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

>>> image = client.text_to_image("An astronaut riding a horse on the moon.")

>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     negative_prompt="low resolution, blurry",
...     model="stabilityai/stable-diffusion-2-1",
... )


text_to_speech

( text: str model: typing.Optional[str] = None ) bytes


  • text (str) — The text to synthesize.
  • model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.



The generated audio.


InferenceTimeoutError or HTTPError

  • InferenceTimeoutError — If the model is unavailable or the request times out.
  • HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Synthesize audio of a voice pronouncing the given text.


>>> from pathlib import Path
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

>>> audio = client.text_to_speech("Hello world")
>>> Path("hello_world.wav").write_bytes(audio)


class huggingface_hub.InferenceTimeoutError


( *args **kwargs )

Error raised when a model is unavailable or the request times out.

Return types

For most tasks, the return value has a built-in type (string, list, image…). Here is a list for the more complex types.

class huggingface_hub._inference_types.ClassificationOutput


( *args **kwargs )


  • label (str) — The label predicted by the model.
  • score (float) — The score of the label predicted by the model.

Dictionary containing the output of an audio_classification() or image_classification() task.

class huggingface_hub._inference_types.ConversationalOutputConversation


( *args **kwargs )


  • generated_responses (List[str]) — A list of the responses from the model.
  • past_user_inputs (List[str]) — A list of the inputs from the user. Must be the same length as generated_responses.

Dictionary containing the “conversation” part of a conversational() task.

class huggingface_hub._inference_types.ConversationalOutput


( *args **kwargs )


  • generated_text (str) — The last response from the model.
  • conversation (ConversationalOutputConversation) — The past conversation.
  • warnings (List[str]) — A list of warnings associated with the process.

Dictionary containing the output of a conversational() task.

class huggingface_hub._inference_types.ImageSegmentationOutput


( *args **kwargs )


  • label (str) — The label corresponding to the mask.
  • mask (Image) — An Image object representing the mask predicted by the model.
  • score (float) — The score associated with the label for this mask.

Dictionary containing information about an image_segmentation() task. In practice, image segmentation returns a list of ImageSegmentationOutput, with one item per mask.
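Because these return types are documented as dictionaries, one way to get static type checking in your own code is to declare matching `TypedDict`s. The definitions below are a local sketch written from the field lists above, not imported from `huggingface_hub`; `mask` is typed as `Any` so the snippet does not depend on Pillow.

```python
from typing import Any, List, TypedDict


class ClassificationOutput(TypedDict):
    label: str
    score: float


class ConversationalOutputConversation(TypedDict):
    generated_responses: List[str]
    past_user_inputs: List[str]


class ConversationalOutput(TypedDict):
    generated_text: str
    conversation: ConversationalOutputConversation
    warnings: List[str]


class ImageSegmentationOutput(TypedDict):
    label: str
    mask: Any  # A PIL image in the real client output.
    score: float


# TypedDicts are plain dicts at runtime, so client results can be annotated directly.
prediction: ClassificationOutput = {"label": "hap", "score": 0.4976358711719513}
print(prediction["label"])  # hap
```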


InferenceApi is the legacy way to call the Inference API. Its interface is simpler but requires knowing the input parameters and output format for each task. It also lacks the ability to connect to other services like Inference Endpoints or AWS SageMaker. InferenceApi will soon be deprecated, so we recommend using InferenceClient whenever possible. Check out this guide to learn how to switch from InferenceApi to InferenceClient in your scripts.

class huggingface_hub.InferenceApi


( repo_id: str task: typing.Optional[str] = None token: typing.Optional[str] = None gpu: bool = False )

Client to configure requests and make calls to the Hugging Face Inference API.


>>> from huggingface_hub.inference_api import InferenceApi

>>> # Mask-fill example
>>> inference = InferenceApi("bert-base-uncased")
>>> inference(inputs="The goal of life is [MASK].")
[{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}]

>>> # Question Answering example
>>> inference = InferenceApi("deepset/roberta-base-squad2")
>>> inputs = {
...     "question": "What's my name?",
...     "context": "My name is Clara and I live in Berkeley.",
... }
>>> inference(inputs)
{'score': 0.9326569437980652, 'start': 11, 'end': 16, 'answer': 'Clara'}

>>> # Zero-shot example
>>> inference = InferenceApi("typeform/distilbert-base-uncased-mnli")
>>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!"
>>> params = {"candidate_labels": ["refund", "legal", "faq"]}
>>> inference(inputs, params)
{'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]}

>>> # Overriding configured task
>>> inference = InferenceApi("bert-base-uncased", task="feature-extraction")

>>> # Text-to-image
>>> inference = InferenceApi("stabilityai/stable-diffusion-2-1")
>>> inference("cat")
<PIL.PngImagePlugin.PngImageFile image (...)>

>>> # Return as raw response to parse the output yourself
>>> inference = InferenceApi("mio/amadeus")
>>> response = inference("hello world", raw_response=True)
>>> response.headers
{"Content-Type": "audio/flac", ...}
>>> response.content # raw bytes from server


__init__

( repo_id: str task: typing.Optional[str] = None token: typing.Optional[str] = None gpu: bool = False )


  • repo_id (str) — Id of repository (e.g. user/bert-base-uncased).
  • task (str, optional, defaults to None) — Whether to force a task instead of using the task specified in the repository.
  • token (str, optional) — The API token to use as HTTP bearer authorization. This is not the authentication token. You can find the token in your Hugging Face account settings. Alternatively, you can find both your organizations' and personal API tokens using HfApi().whoami(token).
  • gpu (bool, optional, defaults to False) — Whether to use GPU instead of CPU for inference (requires at least a Startup plan).

Initialize headers and API call information.


__call__

( inputs: typing.Union[str, typing.Dict, typing.List[str], typing.List[typing.List[str]], NoneType] = None params: typing.Optional[typing.Dict] = None data: typing.Optional[bytes] = None raw_response: bool = False )


  • inputs (str or Dict or List[str] or List[List[str]], optional) — Inputs for the prediction.
  • params (Dict, optional) — Additional parameters for the models. Will be sent as parameters in the payload.
  • data (bytes, optional) — Bytes content of the request. In this case, leave inputs and params empty.
  • raw_response (bool, defaults to False) — If True, the raw Response object is returned. You can parse its content as preferred. By default, the content is parsed into a more practical format (json dictionary or PIL Image for example).

Make a call to the Inference API.
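Per the parameter descriptions above, `inputs` and `params` end up in the request payload, with `params` sent under a `parameters` key, while `data` replaces them with raw bytes. The hypothetical `build_payload` function below sketches that mapping with the standard library; the `inputs` key name is an assumption based on the Inference API's usual request format, and any other fields `InferenceApi` may add are not modeled here.

```python
import json
from typing import Any, Dict, Optional


def build_payload(inputs: Any = None, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize inputs and params into a JSON request body (sketch only)."""
    payload: Dict[str, Any] = {}
    if inputs is not None:
        payload["inputs"] = inputs
    if params:
        payload["parameters"] = params  # Documented as "sent as parameters in the payload".
    return json.dumps(payload)


# Mirrors the zero-shot example: text plus candidate labels.
body = build_payload(
    "I would like to get reimbursed!",
    {"candidate_labels": ["refund", "legal", "faq"]},
)
print(json.loads(body)["parameters"])  # {'candidate_labels': ['refund', 'legal', 'faq']}
```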