Transformers documentation

Pipelines for inference

Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Pipelines for inference

The pipeline() makes it simple to use any model from the Hub for inference on any language, computer vision, speech, and multimodal tasks. Even if you don’t have experience with a specific modality or aren’t familiar with the underlying code behind the models, you can still use them for inference with the pipeline()! This tutorial will teach you to:

  • Use a pipeline() for inference.
  • Use a specific tokenizer or model.
  • Use a pipeline() for audio, vision, and multimodal tasks.

Take a look at the pipeline() documentation for a complete list of supported tasks and available parameters.

Pipeline usage

While each task has an associated pipeline(), it is simpler to use the general pipeline() abstraction which contains all the task-specific pipelines. The pipeline() automatically loads a default model and a preprocessing class capable of inference for your task.

  1. Start by creating a pipeline() and specify an inference task:
>>> from transformers import pipeline

>>> generator = pipeline(task="text-generation")
  1. Pass your input text to the pipeline():
>>> generator(
...     "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone"
... )  # doctest: +SKIP
[{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Iron-priests at the door to the east, and thirteen for the Lord Kings at the end of the mountain'}]

If you have more than one input, pass your input as a list:

>>> generator(
...     [
...         "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
...         "Nine for Mortal Men, doomed to die, One for the Dark Lord on his dark throne",
...     ]
... )  # doctest: +SKIP

Any additional parameters for your task can also be included in the pipeline(). The text-generation task has a generate() method with several parameters for controlling the output. For example, if you want to generate more than one output, set the num_return_sequences parameter:

>>> generator(
...     "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
...     num_return_sequences=2,
... )  # doctest: +SKIP

Choose a model and tokenizer

The pipeline() accepts any model from the Hub. There are tags on the Hub that allow you to filter for a model you’d like to use for your task. Once you’ve picked an appropriate model, load it with the corresponding AutoModelFor and AutoTokenizer class. For example, load the AutoModelForCausalLM class for a causal language modeling task:

>>> from transformers import AutoTokenizer, AutoModelForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")

Create a pipeline() for your task, and specify the model and tokenizer you’ve loaded:

>>> from transformers import pipeline

>>> generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

Pass your input text to the pipeline() to generate some text:

>>> generator(
...     "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone"
... )  # doctest: +SKIP
[{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Dragon-lords (for them to rule in a world ruled by their rulers, and all who live within the realm'}]

Audio pipeline

The pipeline() also supports audio tasks like audio classification and automatic speech recognition.

For example, let’s classify the emotion in this audio clip:

>>> from datasets import load_dataset
>>> import torch

>>> torch.manual_seed(42)
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> audio_file = ds[0]["audio"]["path"]

Find an audio classification model on the Model Hub for emotion recognition and load it in the pipeline():

>>> from transformers import pipeline

>>> audio_classifier = pipeline(
...     task="audio-classification", model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
... )

Pass the audio file to the pipeline():

>>> preds = audio_classifier(audio_file)
>>> preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
>>> preds
[{'score': 0.1315, 'label': 'calm'}, {'score': 0.1307, 'label': 'neutral'}, {'score': 0.1274, 'label': 'sad'}, {'score': 0.1261, 'label': 'fearful'}, {'score': 0.1242, 'label': 'happy'}]

Vision pipeline

Using a pipeline() for vision tasks is practically identical.

Specify your task and pass your image to the classifier. The image can be a link or a local path to the image. For example, what species of cat is shown below?

pipeline-cat-chonk

>>> from transformers import pipeline

>>> vision_classifier = pipeline(task="image-classification")
>>> preds = vision_classifier(
...     images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
... )
>>> preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
>>> preds
[{'score': 0.4335, 'label': 'lynx, catamount'}, {'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'}, {'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'}, {'score': 0.0239, 'label': 'Egyptian cat'}, {'score': 0.0229, 'label': 'tiger cat'}]

Multimodal pipeline

The pipeline() supports more than one modality. For example, a visual question answering (VQA) task combines text and image. Feel free to use any image link you like and a question you want to ask about the image. The image can be a URL or a local path to the image.

For example, if you use the same image from the vision pipeline above:

>>> image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
>>> question = "Where is the cat?"

Create a pipeline for vqa and pass it the image and question:

>>> from transformers import pipeline

>>> vqa = pipeline(task="vqa")
>>> preds = vqa(image=image, question=question)
>>> preds = [{"score": round(pred["score"], 4), "answer": pred["answer"]} for pred in preds]
>>> preds
[{'score': 0.911, 'answer': 'snow'}, {'score': 0.8786, 'answer': 'in snow'}, {'score': 0.6714, 'answer': 'outside'}, {'score': 0.0293, 'answer': 'on ground'}, {'score': 0.0272, 'answer': 'ground'}]