Inference
Inference is the process of using a trained model to make predictions on new data. Because this process can be compute-intensive, running it on a dedicated server can be an interesting option. The huggingface_hub library provides an easy way to call a service that runs inference for hosted models. There are several services you can connect to:
- Inference API: a service that allows you to run accelerated inference on Hugging Face’s infrastructure for free. This service is a fast way to get started, test different models, and prototype AI products.
- Inference Endpoints: a product to easily deploy models to production. Inference is run by Hugging Face in a dedicated, fully managed infrastructure on a cloud provider of your choice.
These services can be called with the InferenceClient object. Please refer to this guide for more information on how to use it.
Inference Client
class huggingface_hub.InferenceClient
< source >( model: typing.Optional[str] = None token: typing.Optional[str] = None timeout: typing.Optional[float] = None )
Parameters
- model (str, optional) — The model to run inference with. Can be a model ID hosted on the Hugging Face Hub, e.g. bigcode/starcoder, or a URL to a deployed Inference Endpoint. Defaults to None, in which case a recommended model is automatically selected for the task.
- token (str, optional) — Hugging Face token. Will default to the locally saved token.
- timeout (float, optional) — The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
Initialize a new Inference Client.
InferenceClient aims to provide a unified experience to perform inference. The client can be used seamlessly with either the (free) Inference API or self-hosted Inference Endpoints.
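Example (a minimal sketch; the model ID and timeout below are illustrative):
>>> from huggingface_hub import InferenceClient
>>> # Use the free Inference API with a recommended model for each task
>>> client = InferenceClient()
>>> # Or pin a specific model (or an Inference Endpoint URL) and set a timeout
>>> client = InferenceClient(model="bigcode/starcoder", timeout=60)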
audio_classification
< source >(
audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path]
model: typing.Optional[str] = None
)
→
List[Dict]
Parameters
- audio (Union[str, Path, bytes, BinaryIO]) — The audio content to classify. It can be raw audio bytes, a local audio file, or a URL pointing to an audio file.
- model (str, optional) — The model to use for audio classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for audio classification will be used.
Returns
List[Dict]
The classification output containing the predicted label and its confidence.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Perform audio classification on the provided audio content.
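Example (a minimal sketch; the audio file and the returned labels are illustrative):
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.audio_classification("speech.flac")
[{'label': 'happy', 'score': 0.87}, {'label': 'neutral', 'score': 0.09}]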
automatic_speech_recognition
< source >( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None ) → str
Parameters
- audio (Union[str, Path, bytes, BinaryIO]) — The content to transcribe. It can be raw audio bytes, a local audio file, or a URL to an audio file.
- model (str, optional) — The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for ASR will be used.
Returns
str
The transcribed text.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Perform automatic speech recognition (ASR or audio-to-text) on the given audio content.
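Example (a minimal sketch; the audio file and transcription are illustrative):
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.automatic_speech_recognition("interview.flac")
'hello welcome to the show'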
conversational
< source >(
text: str
generated_responses: typing.Optional[typing.List[str]] = None
past_user_inputs: typing.Optional[typing.List[str]] = None
parameters: typing.Union[typing.Dict[str, typing.Any], NoneType] = None
model: typing.Optional[str] = None
)
→
Dict
Parameters
- text (str) — The last input from the user in the conversation.
- generated_responses (List[str], optional) — A list of strings corresponding to the earlier replies from the model. Defaults to None.
- past_user_inputs (List[str], optional) — A list of strings corresponding to the earlier replies from the user. Should be the same length as generated_responses. Defaults to None.
- parameters (Dict[str, Any], optional) — Additional parameters for the conversational task. Defaults to None. For more details about the available parameters, please refer to this page.
- model (str, optional) — The model to use for the conversational task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended conversational model will be used. Defaults to None.
Returns
Dict
The generated conversational output.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Generate conversational responses based on the given input text (i.e. chat with the API).
Example:
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> output = client.conversational("Hi, who are you?")
>>> output
{'generated_text': 'I am the one who knocks.', 'conversation': {'generated_responses': ['I am the one who knocks.'], 'past_user_inputs': ['Hi, who are you?']}, 'warnings': ['Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.']}
>>> client.conversational(
... "Wow, that's scary!",
... generated_responses=output["conversation"]["generated_responses"],
... past_user_inputs=output["conversation"]["past_user_inputs"],
... )
feature_extraction
< source >(
text: str
model: typing.Optional[str] = None
)
→
np.ndarray
Parameters
- text (str) — The text to embed.
- model (str, optional) — The model to use for the feature extraction task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended feature extraction model will be used. Defaults to None.
Returns
np.ndarray
The embedding representing the input text as a float32 numpy array.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Generate embeddings for a given text.
Example:
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.feature_extraction("Hi, who are you?")
array([[ 2.424802 , 2.93384 , 1.1750331 , ..., 1.240499, -0.13776633, -0.7889173 ],
[-0.42943227, -0.6364878 , -1.693462 , ..., 0.41978157, -2.4336355 , 0.6162071 ],
...,
[ 0.28552425, -0.928395 , -1.2077185 , ..., 0.76810825, -2.1069427 , 0.6236161 ]], dtype=float32)
image_classification
< source >(
image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path]
model: typing.Optional[str] = None
)
→
List[Dict]
Parameters
- image (Union[str, Path, bytes, BinaryIO]) — The image to classify. It can be raw bytes, an image file, or a URL to an online image.
- model (str, optional) — The model to use for image classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for image classification will be used.
Returns
List[Dict]
A list of dictionaries containing the predicted label and associated probability.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Perform image classification on the given image using the specified model.
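Example (a minimal sketch; the predicted labels are illustrative):
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[{'label': 'Blenheim spaniel', 'score': 0.97}, {'label': 'Pekinese', 'score': 0.02}]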
image_segmentation
< source >(
image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path]
model: typing.Optional[str] = None
)
→
List[Dict]
Parameters
- image (Union[str, Path, bytes, BinaryIO]) — The image to segment. It can be raw bytes, an image file, or a URL to an online image.
- model (str, optional) — The model to use for image segmentation. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for image segmentation will be used.
Returns
List[Dict]
A list of dictionaries containing the segmented masks and associated attributes.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Perform image segmentation on the given image using the specified model.
You must have PIL installed if you want to work with images (pip install Pillow).
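Example (a minimal sketch; the image file is illustrative and the number of masks depends on the model and image):
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> masks = client.image_segmentation("cat.jpg")
>>> len(masks)  # one dictionary per detected mask
2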
image_to_image
< source >(
image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path]
prompt: typing.Optional[str] = None
negative_prompt: typing.Optional[str] = None
height: typing.Optional[int] = None
width: typing.Optional[int] = None
num_inference_steps: typing.Optional[int] = None
guidance_scale: typing.Optional[float] = None
model: typing.Optional[str] = None
**kwargs
)
→
Image
Parameters
- image (Union[str, Path, bytes, BinaryIO]) — The input image for translation. It can be raw bytes, an image file, or a URL to an online image.
- prompt (str, optional) — The text prompt to guide the image generation.
- negative_prompt (str, optional) — A negative prompt to guide the translation process.
- height (int, optional) — The height in pixels of the generated image.
- width (int, optional) — The width in pixels of the generated image.
- num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- guidance_scale (float, optional) — A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
- model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
Returns
Image
The translated image.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Perform image-to-image translation using a specified model.
You must have PIL installed if you want to work with images (pip install Pillow).
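Example (a minimal sketch; the input image and prompt are illustrative):
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> image = client.image_to_image("cat.jpg", prompt="turn the cat into a tiger")
>>> image.save("tiger.jpg")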
image_to_text
< source >(
image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path]
model: typing.Optional[str] = None
)
→
str
Parameters
- image (Union[str, Path, bytes, BinaryIO]) — The input image to caption. It can be raw bytes, an image file, or a URL to an online image.
- model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
Returns
str
The generated text.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Takes an input image and returns text.
Models can have very different outputs depending on your use case (image captioning, optical character recognition (OCR), Pix2Struct, etc.). Please have a look at the model card to learn more about a model’s specificities.
Example:
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_to_text("cat.jpg")
'a cat standing in a grassy field '
>>> client.image_to_text("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
'a dog laying on the grass next to a flower pot '
post
< source >( json: typing.Union[str, typing.Dict, typing.List, NoneType] = None data: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, NoneType] = None model: typing.Optional[str] = None task: typing.Optional[str] = None ) → Response
Parameters
- json (Union[str, Dict, List], optional) — The JSON data to send in the request body. Defaults to None.
- data (Union[str, Path, bytes, BinaryIO], optional) — The content to send in the request body. It can be raw bytes, a pointer to an opened file, a local file path, or a URL to an online resource (image, audio file, …). If both json and data are passed, data will take precedence. At least json or data must be provided. Defaults to None.
- model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. Will override the model defined at the instance level. Defaults to None.
- task (str, optional) — The task to perform on the inference. Used only to default to a recommended model if model is not provided. At least model or task must be provided. Defaults to None.
Returns
Response
The requests HTTP response.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Make a POST request to the inference server.
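Example (a minimal sketch; the payload and model are illustrative):
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> response = client.post(json={"inputs": "An astronaut riding a horse"}, model="stabilityai/stable-diffusion-2-1")
>>> response.content  # raw bytes, here an image generated by the model
b'...'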
sentence_similarity
< source >(
sentence: str
other_sentences: typing.List[str]
model: typing.Optional[str] = None
)
→
List[float]
Parameters
- sentence (str) — The main sentence to compare to others.
- other_sentences (List[str]) — The list of sentences to compare to.
- model (str, optional) — The model to use for the sentence similarity task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended sentence similarity model will be used. Defaults to None.
Returns
List[float]
The similarity scores between the main sentence and the other sentences, one per other sentence.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Compute the semantic similarity between a sentence and a list of other sentences by comparing their embeddings.
Example:
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.sentence_similarity(
... "Machine learning is so easy.",
... other_sentences=[
... "Deep learning is so straightforward.",
... "This is so difficult, like rocket science.",
... "I can't believe how much I struggled with this.",
... ],
... )
[0.7785726189613342, 0.45876261591911316, 0.2906220555305481]
summarization
< source >(
text: str
parameters: typing.Union[typing.Dict[str, typing.Any], NoneType] = None
model: typing.Optional[str] = None
)
→
str
Parameters
- text (str) — The input text to summarize.
- parameters (Dict[str, Any], optional) — Additional parameters for summarization. Check out this page for more details.
- model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
Returns
str
The generated summary text.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Generate a summary of a given text using a specified model.
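Example (a minimal sketch; the input text and generated summary are illustrative):
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.summarization("The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower.")
'The Eiffel Tower is a wrought-iron lattice tower in Paris, France, named after the engineer Gustave Eiffel.'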
text_to_image
< source >(
prompt: str
negative_prompt: typing.Optional[str] = None
height: typing.Optional[float] = None
width: typing.Optional[float] = None
num_inference_steps: typing.Optional[float] = None
guidance_scale: typing.Optional[float] = None
model: typing.Optional[str] = None
**kwargs
)
→
Image
Parameters
- prompt (str) — The prompt to generate an image from.
- negative_prompt (str, optional) — An optional negative prompt for the image generation.
- height (float, optional) — The height in pixels of the image to generate.
- width (float, optional) — The width in pixels of the image to generate.
- num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- guidance_scale (float, optional) — A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
- model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
Returns
Image
The generated image.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Generate an image based on a given text using a specified model.
You must have PIL installed if you want to work with images (pip install Pillow).
Example:
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> image = client.text_to_image("An astronaut riding a horse on the moon.")
>>> image.save("astronaut.png")
>>> image = client.text_to_image(
... "An astronaut riding a horse on the moon.",
... negative_prompt="low resolution, blurry",
... model="stabilityai/stable-diffusion-2-1",
... )
>>> image.save("better_astronaut.png")
text_to_speech
< source >(
text: str
model: typing.Optional[str] = None
)
→
bytes
Parameters
- text (str) — The text to synthesize.
- model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
Returns
bytes
The generated audio.
Raises
InferenceTimeoutError or HTTPError
- InferenceTimeoutError — If the model is unavailable or the request times out.
- HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
Synthesize audio of a voice pronouncing a given text.
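Example (a minimal sketch; the output is raw audio bytes whose format depends on the model, so the .flac extension is only an assumption):
>>> from pathlib import Path
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> audio = client.text_to_speech("Hello world")
>>> Path("hello_world.flac").write_bytes(audio)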
InferenceTimeoutError
Error raised when a model is unavailable or the request times out.
Return types
For most tasks, the return value has a built-in type (string, list, image…). Here is a list for the more complex types.
class huggingface_hub._inference_types.ClassificationOutput
< source >( *args **kwargs )
Dictionary containing the output of an audio_classification() or image_classification() task.
class huggingface_hub._inference_types.ConversationalOutputConversation
< source >( *args **kwargs )
Dictionary containing the “conversation” part of a conversational() task.
class huggingface_hub._inference_types.ConversationalOutput
< source >( *args **kwargs )
Dictionary containing the output of a conversational() task.
class huggingface_hub._inference_types.ImageSegmentationOutput
< source >( *args **kwargs )
Dictionary containing information about an image_segmentation() task. In practice, image segmentation returns a list of ImageSegmentationOutput with 1 item per mask.
InferenceAPI
InferenceAPI is the legacy way to call the Inference API. The interface is more simplistic and requires knowing the input parameters and output format for each task. It also lacks the ability to connect to other services like Inference Endpoints or AWS SageMaker. InferenceAPI will soon be deprecated, so we recommend using InferenceClient whenever possible. Check out this guide to learn how to switch from InferenceAPI to InferenceClient in your scripts.
class huggingface_hub.InferenceApi
< source >( repo_id: str task: typing.Optional[str] = None token: typing.Optional[str] = None gpu: bool = False )
Client to configure requests and make calls to the HuggingFace Inference API.
Example:
>>> from huggingface_hub.inference_api import InferenceApi
>>> # Mask-fill example
>>> inference = InferenceApi("bert-base-uncased")
>>> inference(inputs="The goal of life is [MASK].")
[{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}]
>>> # Question Answering example
>>> inference = InferenceApi("deepset/roberta-base-squad2")
>>> inputs = {
... "question": "What's my name?",
... "context": "My name is Clara and I live in Berkeley.",
... }
>>> inference(inputs)
{'score': 0.9326569437980652, 'start': 11, 'end': 16, 'answer': 'Clara'}
>>> # Zero-shot example
>>> inference = InferenceApi("typeform/distilbert-base-uncased-mnli")
>>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!"
>>> params = {"candidate_labels": ["refund", "legal", "faq"]}
>>> inference(inputs, params)
{'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]}
>>> # Overriding configured task
>>> inference = InferenceApi("bert-base-uncased", task="feature-extraction")
>>> # Text-to-image
>>> inference = InferenceApi("stabilityai/stable-diffusion-2-1")
>>> inference("cat")
<PIL.PngImagePlugin.PngImageFile image (...)>
>>> # Return as raw response to parse the output yourself
>>> inference = InferenceApi("mio/amadeus")
>>> response = inference("hello world", raw_response=True)
>>> response.headers
{"Content-Type": "audio/flac", ...}
>>> response.content # raw bytes from server
b'(...)'
__init__
< source >( repo_id: str task: typing.Optional[str] = None token: typing.Optional[str] = None gpu: bool = False )
Parameters
- repo_id (str) — ID of the repository (e.g. user/bert-base-uncased).
- task (str, optional, defaults to None) — Whether to force a task instead of using the task specified in the repository.
- token (str, optional) — The API token to use as HTTP bearer authorization. This is not the authentication token. You can find the token in https://huggingface.co/settings/token. Alternatively, you can find both your organizations and personal API tokens using HfApi().whoami(token).
- gpu (bool, optional, defaults to False) — Whether to use GPU instead of CPU for inference (requires Startup plan at least).
Initializes headers and API call information.
__call__
< source >( inputs: typing.Union[str, typing.Dict, typing.List[str], typing.List[typing.List[str]], NoneType] = None params: typing.Optional[typing.Dict] = None data: typing.Optional[bytes] = None raw_response: bool = False )
Parameters
- inputs (str or Dict or List[str] or List[List[str]], optional) — Inputs for the prediction.
- params (Dict, optional) — Additional parameters for the models. Will be sent as parameters in the payload.
- data (bytes, optional) — Bytes content of the request. In this case, leave inputs and params empty.
- raw_response (bool, defaults to False) — If True, the raw Response object is returned. You can parse its content as preferred. By default, the content is parsed into a more practical format (a JSON dictionary or a PIL Image, for example).
Make a call to the Inference API.