Pipelines¶

Pipelines are an easy way to use models for inference. They are objects that abstract away most of the library's complex code and offer a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the task summary for examples of use.

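There are two categories of pipeline abstractions to be aware of: the general pipeline abstraction, which wraps all the other pipelines, and the task-specific pipelines.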

The pipeline abstraction¶

The pipeline abstraction is a wrapper around all the other available pipelines. It is instantiated like any other pipeline but requires one additional argument: the task.

transformers.pipeline(task: str, model: Optional = None, config: Optional[Union[str, transformers.configuration_utils.PretrainedConfig]] = None, tokenizer: Optional[Union[str, transformers.tokenization_utils.PreTrainedTokenizer]] = None, framework: Optional[str] = None, revision: Optional[str] = None, use_fast: bool = False, **kwargs) → transformers.pipelines.Pipeline[source]¶

Utility factory method to build a Pipeline.

Pipelines are made of:

  • A tokenizer in charge of mapping raw textual input to tokens.

  • A model to make predictions from the inputs.

  • Some (optional) post-processing to enhance the model’s output.
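The way these three stages chain together can be pictured with a toy, library-free sketch. Everything in it is hypothetical (the `ToyPipeline` class, the whitespace tokenizer, the word-counting "model"); it only illustrates the flow of data and is not how the library implements a pipeline.

```python
from typing import Callable, Dict, List


class ToyPipeline:
    """Hypothetical stand-in showing tokenizer -> model -> post-processing."""

    def __init__(
        self,
        tokenizer: Callable[[str], List[str]],
        model: Callable[[List[str]], Dict[str, float]],
        postprocess: Callable[[Dict[str, float]], Dict[str, object]],
    ) -> None:
        self.tokenizer = tokenizer
        self.model = model
        self.postprocess = postprocess

    def __call__(self, text: str) -> Dict[str, object]:
        tokens = self.tokenizer(text)     # 1. raw text -> tokens
        scores = self.model(tokens)       # 2. tokens -> raw predictions
        return self.postprocess(scores)   # 3. raw predictions -> readable output


def toy_tokenizer(text: str) -> List[str]:
    # Lowercase and split on whitespace; real tokenizers are far more involved.
    return text.lower().split()


def toy_model(tokens: List[str]) -> Dict[str, float]:
    # Counts hits in two tiny hand-made lexicons; a real model is a neural network.
    positive = {"great", "good", "love"}
    negative = {"bad", "awful", "hate"}
    return {
        "POSITIVE": float(sum(t in positive for t in tokens)),
        "NEGATIVE": float(sum(t in negative for t in tokens)),
    }


def toy_postprocess(scores: Dict[str, float]) -> Dict[str, object]:
    # Keep only the best-scoring label.
    label = max(scores, key=scores.get)
    return {"label": label, "score": scores[label]}


toy = ToyPipeline(toy_tokenizer, toy_model, toy_postprocess)
result = toy("I love this great library")
```

Keeping each stage as a separate callable mirrors the description above: any stage can be swapped without touching the other two.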

Parameters
  • task (str) –

    The task defining which pipeline will be returned. Currently accepted tasks are:

  • model (str or PreTrainedModel or TFPreTrainedModel, optional) –

    The model that will be used by the pipeline to make predictions. This can be a model identifier or an actual instance of a pretrained model inheriting from PreTrainedModel (for PyTorch) or TFPreTrainedModel (for TensorFlow).

    If not provided, the default for the task will be loaded.

  • config (str or PretrainedConfig, optional) –

    The configuration that will be used by the pipeline to instantiate the model. This can be a model identifier or an actual pretrained model configuration inheriting from PretrainedConfig.

    If not provided, the default configuration file for the requested model will be used. That means that if model is given, its default configuration will be used. However, if model is not supplied, this task’s default model’s config is used instead.

  • tokenizer (str or PreTrainedTokenizer, optional) –

    The tokenizer that will be used by the pipeline to encode data for the model. This can be a model identifier or an actual pretrained tokenizer inheriting from PreTrainedTokenizer.

    If not provided, the default tokenizer for the given model will be loaded (if it is a string). If model is not specified or not a string, then the default tokenizer for config is loaded (if it is a string). However, if config is also not given or not a string, then the default tokenizer for the given task will be loaded.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • revision (str, optional, defaults to "main") – When passing a task name or a string model identifier: the specific model version to use. Because models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git: a branch name, a tag name, or a commit id.

  • use_fast (bool, optional, defaults to False) – Whether or not to use a Fast tokenizer if possible (a PreTrainedTokenizerFast).

  • kwargs – Additional keyword arguments passed along to the specific pipeline init (see the documentation for the corresponding pipeline class for possible values).
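The fallback order described for the tokenizer argument can be summarized in a few lines. This is only a sketch of the documented rules, not the library's code: `resolve_tokenizer_name` and its `task_default` argument are hypothetical names introduced for illustration.

```python
from typing import Optional


def resolve_tokenizer_name(
    tokenizer: Optional[object],
    model: Optional[object],
    config: Optional[object],
    task_default: str,
) -> object:
    """Hypothetical helper mirroring the documented tokenizer fallback order."""
    if tokenizer is not None:
        return tokenizer      # an explicit tokenizer always wins
    if isinstance(model, str):
        return model          # reuse the model identifier
    if isinstance(config, str):
        return config         # otherwise reuse the config identifier
    return task_default       # otherwise fall back to the task's default


# A model passed as an object (not a string) cannot name a tokenizer,
# so the string config identifier is used instead.
chosen = resolve_tokenizer_name(None, object(), "bert-base-cased", "task-default")
```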

Returns

A suitable pipeline for the task.

Return type

Pipeline

Examples:

>>> from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

>>> # Sentiment analysis pipeline
>>> pipeline('sentiment-analysis')

>>> # Question answering pipeline, specifying the checkpoint identifier
>>> pipeline('question-answering', model='distilbert-base-cased-distilled-squad', tokenizer='bert-base-cased')

>>> # Named entity recognition pipeline, passing in a specific model and tokenizer
>>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
>>> pipeline('ner', model=model, tokenizer=tokenizer)

The task specific pipelines¶

ConversationalPipeline¶

class transformers.Conversation(text: str = None, conversation_id: uuid.UUID = None)[source]¶

Utility class containing a conversation and its history. This class is meant to be used as an input to the ConversationalPipeline. The conversation contains a number of utility functions to manage the addition of new user inputs and generated model responses. A conversation needs to contain an unprocessed user input before being passed to the ConversationalPipeline. This user input is either created when the class is instantiated, or added by calling conversation.add_user_input("input") after a conversation turn.

Parameters
  • text (str, optional) – The initial user input to start the conversation. If not provided, a user input needs to be provided manually using the add_user_input() method before the conversation can begin.

  • conversation_id (uuid.UUID, optional) – Unique identifier for the conversation. If not provided, a random UUID4 id will be assigned to the conversation.
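The lifecycle of a user input (pending, then moved to the history once processed) can be pictured with a simplified stand-in. `ToyConversation` and its attribute names are hypothetical and chosen for illustration; they are not the library's implementation.

```python
from typing import List, Optional


class ToyConversation:
    """Hypothetical, simplified model of the state a conversation keeps."""

    def __init__(self, text: Optional[str] = None) -> None:
        self.new_user_input: Optional[str] = text   # unprocessed input, if any
        self.past_user_inputs: List[str] = []
        self.generated_responses: List[str] = []

    def add_user_input(self, text: str) -> None:
        # Queue a new input for the next turn.
        self.new_user_input = text

    def mark_processed(self) -> None:
        # Move the pending input into the history.
        if self.new_user_input is not None:
            self.past_user_inputs.append(self.new_user_input)
        self.new_user_input = None

    def append_response(self, response: str) -> None:
        self.generated_responses.append(response)


conv = ToyConversation("Going to the movies tonight - any suggestions?")
conv.mark_processed()
conv.append_response("The Big Lebowski.")
conv.add_user_input("Is it good?")
```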

Usage:

from transformers import Conversation

conversation = Conversation("Going to the movies tonight - any suggestions?")

# Steps usually performed by the model when generating a response:
# 1. Mark the user input as processed (moved to the history)
conversation.mark_processed()
# 2. Append a model response
conversation.append_response("The Big Lebowski.")

conversation.add_user_input("Is it good?")

class transformers.ConversationalPipeline(min_length_for_response=32, *args, **kwargs)[source]¶

Multi-turn conversational pipeline.

This conversational pipeline can currently be loaded from pipeline() using the following task identifier: "conversational".

The models that this pipeline can use are models that have been fine-tuned on a multi-turn conversational task, currently: ‘microsoft/DialoGPT-small’, ‘microsoft/DialoGPT-medium’, ‘microsoft/DialoGPT-large’. See the up-to-date list of available models on huggingface.co/models.

Usage:

from transformers import Conversation, pipeline

conversational_pipeline = pipeline("conversational")

conversation_1 = Conversation("Going to the movies tonight - any suggestions?")
conversation_2 = Conversation("What's the last book you have read?")

# First turn: generate a response for each conversation
conversational_pipeline([conversation_1, conversation_2])

# Second turn: add new user inputs, then generate again
conversation_1.add_user_input("Is it an action movie?")
conversation_2.add_user_input("What is the genre of this book?")

conversational_pipeline([conversation_1, conversation_2])

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on CPU; a non-negative integer runs the model on the associated CUDA device id.

  • binary_output (bool, optional, defaults to False) – Flag indicating whether the output of the pipeline should be in a binary format (i.e., pickle) or raw text.

  • min_length_for_response (int, optional, defaults to 32) – The minimum length (in number of tokens) for a response.

__call__(conversations: Union[transformers.pipelines.Conversation, List[transformers.pipelines.Conversation]], clean_up_tokenization_spaces=True, **generate_kwargs)[source]¶

Generate responses for the conversation(s) given as inputs.

Parameters
  • conversations (a Conversation or a list of Conversation) – Conversations to generate responses for.

  • clean_up_tokenization_spaces (bool, optional, defaults to True) – Whether or not to clean up the potential extra spaces in the text output.

  • generate_kwargs – Additional keyword arguments to pass along to the generate method of the model (see the generate method corresponding to your framework here).

Returns

Conversation(s) with updated generated responses for those containing a new user input.

Return type

Conversation or a list of Conversation

FeatureExtractionPipeline¶

class transformers.FeatureExtractionPipeline(model: Union[PreTrainedModel, TFPreTrainedModel], tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, modelcard: Optional[transformers.modelcard.ModelCard] = None, framework: Optional[str] = None, args_parser: transformers.pipelines.ArgumentHandler = None, device: int = - 1, task: str = '')[source]¶

Feature extraction pipeline using no model head. This pipeline extracts the hidden states from the base transformer, which can be used as features in downstream tasks.

This feature extraction pipeline can currently be loaded from pipeline() using the task identifier: "feature-extraction".

All models may be used for this pipeline. See a list of all models, including community-contributed models on huggingface.co/models.

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on CPU; a non-negative integer runs the model on the associated CUDA device id.
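As a rough illustration of the device ordinal convention, the sketch below translates an ordinal into a readable device name. `device_label` is a hypothetical helper, and the `cuda:N` naming, with ordinal 0 denoting the first CUDA device, is assumed from PyTorch's device numbering.

```python
def device_label(device: int = -1) -> str:
    """Hypothetical helper: translate a device ordinal into a device name."""
    if device < 0:
        return "cpu"           # -1 (the default) selects the CPU
    return f"cuda:{device}"    # any other ordinal selects that CUDA device


labels = [device_label(), device_label(0), device_label(1)]
```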

__call__(*args, **kwargs)[source]¶

Extract the features of the input(s).

Parameters

args (str or List[str]) – One or several texts (or one list of texts) to get the features of.

Returns

The features computed by the model.

Return type

A nested list of float
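Because the features come back as plain nested lists, downstream code often reduces them to a single vector per input. The sketch below shows mean pooling, which is one common choice and not something the pipeline does itself. It assumes the features of one input form a list of per-token vectors; the exact nesting depends on how the pipeline was called, so inspect the output first. `mean_pool` is a hypothetical helper and the numbers are made up.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average per-token vectors into one fixed-size vector."""
    if not token_vectors:
        raise ValueError("expected at least one token vector")
    hidden_size = len(token_vectors[0])
    return [
        sum(vec[i] for vec in token_vectors) / len(token_vectors)
        for i in range(hidden_size)
    ]


# Made-up stand-in for the features of one input: 3 tokens, hidden size 2.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sentence_vector = mean_pool(features)
```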

FillMaskPipeline¶

class transformers.FillMaskPipeline(model: Union[PreTrainedModel, TFPreTrainedModel], tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, modelcard: Optional[transformers.modelcard.ModelCard] = None, framework: Optional[str] = None, args_parser: transformers.pipelines.ArgumentHandler = None, device: int = - 1, top_k=5, task: str = '', **kwargs)[source]¶

Masked language modeling prediction pipeline using any ModelWithLMHead. See the masked language modeling examples for more information.

This mask filling pipeline can currently be loaded from pipeline() using the following task identifier: "fill-mask".

The models that this pipeline can use are models that have been trained with a masked language modeling objective, which includes the bi-directional models in the library. See the up-to-date list of available models on huggingface.co/models.

Note

This pipeline only works for inputs with exactly one token masked.

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on CPU; a non-negative integer runs the model on the associated CUDA device id.

  • binary_output (bool, optional, defaults to False) – Flag indicating whether the output of the pipeline should be in a binary format (i.e., pickle) or raw text.

  • top_k (int, defaults to 5) – The number of predictions to return.

__call__(*args, targets=None, top_k: Optional[int] = None, **kwargs)[source]¶

Fill the masked token in the text(s) given as inputs.

Parameters
  • args (str or List[str]) – One or several texts (or one list of prompts) with masked tokens.

  • targets (str or List[str], optional) – When passed, the model will return the scores for the passed token or tokens rather than the top k predictions in the entire vocabulary. If the provided targets are not in the model vocab, they will be tokenized and the first resulting token will be used (with a warning).

  • top_k (int, optional) – When passed, overrides the number of predictions to return.
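The difference between ranking the whole vocabulary and scoring only the targets can be sketched without a model. The helper below is hypothetical and works on a made-up three-word vocabulary. It assumes that, when targets is given, every requested token is scored and none is cut off by top_k, which is one reading of the description above.

```python
import math
from typing import Dict, List, Optional, Tuple


def rank_mask_candidates(
    logits: Dict[str, float],
    top_k: int = 5,
    targets: Optional[List[str]] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: softmax over a toy vocabulary, then rank candidates."""
    max_logit = max(logits.values())
    exps = {tok: math.exp(val - max_logit) for tok, val in logits.items()}
    total = sum(exps.values())
    probs = {tok: val / total for tok, val in exps.items()}
    if targets is not None:
        # Score only the requested tokens instead of the whole vocabulary.
        candidates = [(tok, probs[tok]) for tok in targets if tok in probs]
    else:
        candidates = list(probs.items())
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates if targets is not None else candidates[:top_k]


# Made-up scores for the masked position.
toy_logits = {"paris": 2.0, "london": 1.0, "rome": 0.0}
top_two = rank_mask_candidates(toy_logits, top_k=2)
only_rome = rank_mask_candidates(toy_logits, targets=["rome"])
```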

Returns

Each result comes as a list of dictionaries with the following keys:

  • sequence (str) – The corresponding input with the mask token prediction.

  • score (float) – The corresponding probability.

  • token (int) – The predicted token id (to replace the masked one).

  • token_str (str) – The predicted token (to replace the masked one).

Return type

A list or a list of list of dict

NerPipeline¶

This class is an alias of the TokenClassificationPipeline defined below. Please refer to that pipeline for documentation and usage examples.

QuestionAnsweringPipeline¶

class transformers.QuestionAnsweringPipeline(model: Union[PreTrainedModel, TFPreTrainedModel], tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, modelcard: Optional[transformers.modelcard.ModelCard] = None, framework: Optional[str] = None, device: int = - 1, task: str = '', **kwargs)[source]¶

Question Answering pipeline using any ModelForQuestionAnswering. See the question answering examples for more information.

This question answering pipeline can currently be loaded from pipeline() using the following task identifier: "question-answering".

The models that this pipeline can use are models that have been fine-tuned on a question answering task. See the up-to-date list of available models on huggingface.co/models.

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on CPU; a non-negative integer runs the model on the associated CUDA device id.

  • binary_output (bool, optional, defaults to False) – Flag indicating whether the output of the pipeline should be in a binary format (i.e., pickle) or raw text.

__call__(*args, **kwargs)[source]¶

Answer the question(s) given as inputs by using the context(s).

Parameters
  • args (SquadExample or a list of SquadExample) – One or several SquadExample containing the question and context.

  • X (SquadExample or a list of SquadExample, optional) – One or several SquadExample containing the question and context (will be treated the same way as if passed as the first positional argument).

  • data (SquadExample or a list of SquadExample, optional) – One or several SquadExample containing the question and context (will be treated the same way as if passed as the first positional argument).

  • question (str or List[str]) – One or several question(s) (must be used in conjunction with the context argument).

  • context (str or List[str]) – One or several context(s) associated with the question(s) (must be used in conjunction with the question argument).

  • topk (int, optional, defaults to 1) – The number of answers to return (will be chosen by order of likelihood).

  • doc_stride (int, optional, defaults to 128) – If the context is too long to fit with the question for the model, it will be split in several chunks with some overlap. This argument controls the size of that overlap.

  • max_answer_len (int, optional, defaults to 15) – The maximum length of predicted answers (i.e., only answers with a shorter length are considered).

  • max_seq_len (int, optional, defaults to 384) – The maximum length of the total sentence (context + question) after tokenization. The context will be split in several chunks (using doc_stride) if needed.

  • max_question_len (int, optional, defaults to 64) – The maximum length of the question after tokenization. It will be truncated if needed.

  • handle_impossible_answer (bool, optional, defaults to False) – Whether or not we accept impossible as an answer.
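How max_seq_len, max_question_len and doc_stride interact can be sketched as a sliding window over the context tokens. This is a simplified illustration, not the library's tokenization code: `split_context` is a hypothetical helper, token ids are plain integers, and special tokens are ignored.

```python
from typing import List


def split_context(
    context_tokens: List[int],
    question_len: int,
    max_seq_len: int = 384,
    doc_stride: int = 128,
) -> List[List[int]]:
    """Hypothetical helper: cut a long context into overlapping windows.

    Special tokens are ignored for simplicity, so each window may hold
    `max_seq_len - question_len` context tokens.
    """
    window = max_seq_len - question_len
    if window <= doc_stride:
        raise ValueError("doc_stride must be smaller than the space left for context")
    step = window - doc_stride  # consecutive windows share doc_stride tokens
    chunks = []
    start = 0
    while True:
        chunks.append(context_tokens[start:start + window])
        if start + window >= len(context_tokens):
            break
        start += step
    return chunks


# Tiny numbers so the overlap is easy to see: windows of 4 tokens, overlap of 2.
chunks = split_context(list(range(10)), question_len=2, max_seq_len=6, doc_stride=2)
```

With these values, each chunk repeats the last two tokens of the previous one, so an answer that straddles a chunk boundary still appears whole in at least one chunk.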

Returns

Each result comes as a dictionary with the following keys:

  • score (float) – The probability associated with the answer.

  • start (int) – The start index of the answer (in the tokenized version of the input).

  • end (int) – The end index of the answer (in the tokenized version of the input).

  • answer (str) – The answer to the question.

Return type

A dict or a list of dict

static create_sample(question: Union[str, List[str]], context: Union[str, List[str]]) → Union[transformers.data.processors.squad.SquadExample, List[transformers.data.processors.squad.SquadExample]][source]¶

QuestionAnsweringPipeline leverages the SquadExample internally. This helper method encapsulates all the logic for converting question(s) and context(s) to SquadExample.

We currently support extractive question answering.

Parameters
  • question (str or List[str]) – The question(s) asked.

  • context (str or List[str]) – The context(s) in which we will look for the answer.

Returns

The corresponding SquadExample grouping question and context.

Return type

One or a list of SquadExample

decode(start: numpy.ndarray, end: numpy.ndarray, topk: int, max_answer_len: int) → Tuple[source]¶

Takes the output of any ModelForQuestionAnswering and generates the probability of each span being the actual answer.

In addition, it filters out unwanted or impossible cases, such as an answer longer than max_answer_len or an answer whose end position precedes its start position. The method supports returning the k best answers through the topk argument.

Parameters
  • start (np.ndarray) – Individual start probabilities for each token.

  • end (np.ndarray) – Individual end probabilities for each token.

  • topk (int) – Indicates how many possible answer span(s) to extract from the model output.

  • max_answer_len (int) – Maximum size of the answer to extract from the model’s output.
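The span selection described above can be sketched with plain lists instead of NumPy arrays. `best_spans` is a hypothetical helper, and scoring a span as the product of its start and end probabilities is an assumption of this sketch; the probabilities themselves are made up.

```python
from typing import List, Tuple


def best_spans(
    start: List[float],
    end: List[float],
    topk: int,
    max_answer_len: int,
) -> List[Tuple[int, int, float]]:
    """Hypothetical, list-based sketch of span selection."""
    candidates = []
    for s, p_start in enumerate(start):
        for e, p_end in enumerate(end):
            if e < s:
                continue  # the end cannot come before the start
            if e - s + 1 > max_answer_len:
                continue  # the span is longer than allowed
            candidates.append((s, e, p_start * p_end))
    # Highest-scoring spans first, keep the k best.
    candidates.sort(key=lambda span: span[2], reverse=True)
    return candidates[:topk]


# Made-up per-token probabilities for a three-token input.
start_probs = [0.1, 0.7, 0.2]
end_probs = [0.2, 0.1, 0.7]
top_span = best_spans(start_probs, end_probs, topk=1, max_answer_len=2)[0]
```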

span_to_answer(text: str, start: int, end: int) → Dict[str, Union[str, int]][source]¶

When decoding from token probabilities, this method maps token indexes to actual words in the initial context.

Parameters
  • text (str) – The actual context to extract the answer from.

  • start (int) – The answer starting token index.

  • end (int) – The answer end token index.

Returns

Dictionary like {'answer': str, 'start': int, 'end': int}

SummarizationPipeline¶

class transformers.SummarizationPipeline(*args, **kwargs)[source]¶

Summarize news articles and other documents.

This summarizing pipeline can currently be loaded from pipeline() using the following task identifier: "summarization".

The models that this pipeline can use are models that have been fine-tuned on a summarization task, currently: ‘bart-large-cnn’, ‘t5-small’, ‘t5-base’, ‘t5-large’, ‘t5-3b’, ‘t5-11b’. See the up-to-date list of available models on huggingface.co/models.

Usage:

from transformers import pipeline

# use bart in pytorch
summarizer = pipeline("summarization")
summarizer("Sam Shleifer writes the best docstring examples in the whole world.", min_length=5, max_length=20)

# use t5 in tf
summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")
summarizer("Sam Shleifer writes the best docstring examples in the whole world.", min_length=5, max_length=20)

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on CPU; a non-negative integer runs the model on the associated CUDA device id.

  • binary_output (bool, optional, defaults to False) – Flag indicating whether the output of the pipeline should be in a binary format (i.e., pickle) or raw text.

__call__(*documents, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, **generate_kwargs)[source]¶

Summarize the text(s) given as inputs.

Parameters
  • documents (str or List[str]) – One or several articles (or one list of articles) to summarize.

  • return_text (bool, optional, defaults to True) – Whether or not to include the decoded texts in the outputs.

  • return_tensors (bool, optional, defaults to False) – Whether or not to include the tensors of predictions (as token indices) in the outputs.

  • clean_up_tokenization_spaces (bool, optional, defaults to False) – Whether or not to clean up the potential extra spaces in the text output.

  • generate_kwargs – Additional keyword arguments to pass along to the generate method of the model (see the generate method corresponding to your framework here).

Returns

Each result comes as a dictionary with the following keys:

  • summary_text (str, present when return_text=True) – The summary of the corresponding input.

  • summary_token_ids (torch.Tensor or tf.Tensor, present when return_tensors=True) – The token ids of the summary.

Return type

A list or a list of list of dict

TextClassificationPipeline¶

class transformers.TextClassificationPipeline(return_all_scores: bool = False, **kwargs)[source]¶

Text classification pipeline using any ModelForSequenceClassification. See the sequence classification examples for more information.

This text classification pipeline can currently be loaded from pipeline() using the following task identifier: "sentiment-analysis" (for classifying sequences according to positive or negative sentiments).

If multiple classification labels are available (model.config.num_labels >= 2), the pipeline will run a softmax over the results. If there is a single label, the pipeline will run a sigmoid over the result.
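The rule above (sigmoid for a single label, softmax otherwise) can be sketched in a few lines of standard-library Python. `scores_from_logits` is a hypothetical helper operating on a plain list of logits; it is not the pipeline's own code.

```python
import math
from typing import List


def scores_from_logits(logits: List[float]) -> List[float]:
    """Hypothetical helper: sigmoid for a single label, softmax otherwise."""
    if len(logits) == 1:
        return [1.0 / (1.0 + math.exp(-logits[0]))]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


single = scores_from_logits([0.0])            # one label: independent probability
multi = scores_from_logits([2.0, -1.0, 0.5])  # several labels: scores sum to 1
```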

The models that this pipeline can use are models that have been fine-tuned on a sequence classification task. See the up-to-date list of available models on huggingface.co/models.

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on CPU; a non-negative integer runs the model on the associated CUDA device id.

  • binary_output (bool, optional, defaults to False) – Flag indicating whether the output of the pipeline should be in a binary format (i.e., pickle) or raw text.

  • return_all_scores (bool, optional, defaults to False) – Whether to return all prediction scores or just the one of the predicted class.

__call__(*args, **kwargs)[source]¶

Classify the text(s) given as inputs.

Parameters

args (str or List[str]) – One or several texts (or one list of prompts) to classify.

Returns

Each result comes as a list of dictionaries with the following keys:

  • label (str) – The label predicted.

  • score (float) – The corresponding probability.

If self.return_all_scores=True, one such dictionary is returned per label.

Return type

A list or a list of list of dict

TextGenerationPipeline¶

class transformers.TextGenerationPipeline(*args, **kwargs)[source]¶

Language generation pipeline using any ModelWithLMHead. This pipeline predicts the words that will follow a specified text prompt.

This language generation pipeline can currently be loaded from pipeline() using the following task identifier: "text-generation".

The models that this pipeline can use are models that have been trained with an autoregressive language modeling objective, which includes the uni-directional models in the library (e.g. gpt2). See the list of available community models on huggingface.co/models.

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on CPU; a non-negative integer runs the model on the associated CUDA device id.

  • binary_output (bool, optional, defaults to False) – Flag indicating whether the output of the pipeline should be in a binary format (i.e., pickle) or raw text.

__call__(text_inputs, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, prefix=None, **generate_kwargs)[source]¶

Complete the prompt(s) given as inputs.

Parameters
  • text_inputs (str or List[str]) – One or several prompts (or one list of prompts) to complete.

  • return_tensors (bool, optional, defaults to False) – Whether or not to include the tensors of predictions (as token indices) in the outputs.

  • return_text (bool, optional, defaults to True) – Whether or not to include the decoded texts in the outputs.

  • clean_up_tokenization_spaces (bool, optional, defaults to False) – Whether or not to clean up the potential extra spaces in the text output.

  • prefix (str, optional) – Prefix added to prompt.

  • generate_kwargs – Additional keyword arguments to pass along to the generate method of the model (see the generate method corresponding to your framework here).

Returns

Each result comes as a dictionary with the following keys:

  • generated_text (str, present when return_text=True) – The generated text.

  • generated_token_ids (torch.Tensor or tf.Tensor, present when return_tensors=True) – The token ids of the generated text.

Return type

A list or a list of list of dict

Text2TextGenerationPipeline¶

class transformers.Text2TextGenerationPipeline(*args, **kwargs)[source]¶

Pipeline for text to text generation using seq2seq models.

This Text2TextGenerationPipeline pipeline can currently be loaded from pipeline() using the following task identifier: "text2text-generation".

The models that this pipeline can use are sequence-to-sequence models, for example models that have been fine-tuned on a translation task. See the up-to-date list of available models on huggingface.co/models.

Usage:

from transformers import pipeline

text2text_generator = pipeline("text2text-generation")
text2text_generator("question: What is 42 ? context: 42 is the answer to life, the universe and everything")

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 will use the CPU; a non-negative integer will run the model on the associated CUDA device id.

  • binary_output (bool, optional, defaults to False) – Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or raw text.

__call__(*args, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, **generate_kwargs)[source]¶

Generate the output text(s) using text(s) given as inputs.

Parameters
  • args (str or List[str]) – Input text for the encoder.

  • return_tensors (bool, optional, defaults to False) – Whether or not to include the tensors of predictions (as token indices) in the outputs.

  • return_text (bool, optional, defaults to True) – Whether or not to include the decoded texts in the outputs.

  • clean_up_tokenization_spaces (bool, optional, defaults to False) – Whether or not to clean up the potential extra spaces in the text output.

  • generate_kwargs – Additional keyword arguments to pass along to the generate method of the model (see the generate method corresponding to your framework here).

Returns

Each result comes as a dictionary with the following keys:

  • generated_text (str, present when return_text=True) – The generated text.

  • generated_token_ids (torch.Tensor or tf.Tensor, present when return_tensors=True) – The token ids of the generated text.

Return type

A list of dict, or a list of lists of dict

TokenClassificationPipeline¶

class transformers.TokenClassificationPipeline(model: Union[PreTrainedModel, TFPreTrainedModel], tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, modelcard: Optional[transformers.modelcard.ModelCard] = None, framework: Optional[str] = None, args_parser: transformers.pipelines.ArgumentHandler = <transformers.pipelines.TokenClassificationArgumentHandler object>, device: int = -1, binary_output: bool = False, ignore_labels=['O'], task: str = '', grouped_entities: bool = False, ignore_subwords: bool = False)[source]¶

Named Entity Recognition pipeline using any ModelForTokenClassification. See the named entity recognition examples for more information.

This token recognition pipeline can currently be loaded from pipeline() using the following task identifier: "ner" (for predicting the classes of tokens in a sequence: person, organisation, location or miscellaneous).

The models that this pipeline can use are models that have been fine-tuned on a token classification task. See the up-to-date list of available models on huggingface.co/models.

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 will use the CPU; a non-negative integer will run the model on the associated CUDA device id.

  • binary_output (bool, optional, defaults to False) – Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or raw text.

  • ignore_labels (List[str], defaults to ["O"]) – A list of labels to ignore.

  • grouped_entities (bool, optional, defaults to False) – Whether or not to group the tokens corresponding to the same entity together in the predictions.
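To make the effect of ignore_labels concrete, here is a minimal sketch of filtering per-token predictions by label. The function drop_ignored is a hypothetical helper, not the pipeline's own code, and the prediction values (including the "I-LOC" label) are hand-written for illustration.

```python
from typing import Dict, List


def drop_ignored(predictions: List[Dict], ignore_labels: List[str]) -> List[Dict]:
    """Keep only the predictions whose entity is not in ignore_labels."""
    ignored = set(ignore_labels)
    return [p for p in predictions if p["entity"] not in ignored]


# Hand-written per-token predictions (illustrative values only).
predictions = [
    {"word": "Paris", "score": 0.99, "entity": "I-LOC", "index": 1},
    {"word": "is", "score": 0.98, "entity": "O", "index": 2},
]

kept = drop_ignored(predictions, ignore_labels=["O"])
```

With the default ["O"], tokens that fall outside any entity are dropped; passing an empty list keeps every token.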

__call__(inputs: Union[str, List[str]], **kwargs)[source]¶

Classify each token of the text(s) given as inputs.

Parameters

inputs (str or List[str]) – One or several texts (or one list of texts) for token classification.

Returns

Each result comes as a list of dictionaries (one for each token in the corresponding input, or each entity if this pipeline was instantiated with grouped_entities=True) with the following keys:

  • word (str) – The token/word classified.

  • score (float) – The probability assigned to the predicted entity.

  • entity (str) – The entity predicted for that token/word.

  • index (int, only present when self.grouped_entities=False) – The index of the corresponding token in the sentence.

Return type

A list of dict, or a list of lists of dict

group_entities(entities: List[dict]) → List[dict][source]¶

Find and group together the adjacent tokens with the same entity predicted.

Parameters

entities (List[dict]) – The entities predicted by the pipeline.
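The idea of grouping adjacent tokens can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: tokens count as adjacent when their index values are consecutive, the group score is the plain mean of the token scores, and words are joined with a space. The function name group_adjacent and the sample tokens are hypothetical.

```python
from typing import Dict, List


def group_adjacent(entities: List[Dict]) -> List[Dict]:
    """Merge runs of consecutive tokens that share the same entity label."""
    runs: List[List[Dict]] = []
    for token in entities:
        last = runs[-1][-1] if runs else None
        if (
            last is not None
            and last["entity"] == token["entity"]
            and token["index"] == last["index"] + 1
        ):
            runs[-1].append(token)  # extend the current run
        else:
            runs.append([token])  # start a new run
    return [
        {
            "entity": run[0]["entity"],
            "score": sum(t["score"] for t in run) / len(run),
            "word": " ".join(t["word"] for t in run),
        }
        for run in runs
    ]


# Hand-written token predictions (illustrative values only).
tokens = [
    {"word": "New", "score": 0.875, "entity": "I-LOC", "index": 1},
    {"word": "York", "score": 0.625, "entity": "I-LOC", "index": 2},
    {"word": "Alice", "score": 0.5, "entity": "I-PER", "index": 5},
]

grouped = group_adjacent(tokens)
```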

group_sub_entities(entities: List[dict]) → dict[source]¶

Group together the adjacent tokens with the same entity predicted.

Parameters

entities (List[dict]) – The entities predicted by the pipeline.

ZeroShotClassificationPipeline¶

class transformers.ZeroShotClassificationPipeline(args_parser=<transformers.pipelines.ZeroShotClassificationArgumentHandler object>, *args, **kwargs)[source]¶

NLI-based zero-shot classification pipeline using a ModelForSequenceClassification trained on NLI (natural language inference) tasks.

Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model. Then, the logit for entailment is taken as the logit for the candidate label being valid. Any NLI model can be used, but the id of the entailment label must be included in the model config’s label2id.
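The premise/hypothesis pairing described above can be sketched in plain Python. The helper build_nli_pairs is hypothetical and only illustrates how every sequence is combined with every candidate label through the hypothesis template; it does not call a model. Splitting a comma-separated label string is an assumption based on the accepted forms of candidate_labels.

```python
from itertools import product
from typing import List, Tuple, Union


def build_nli_pairs(
    sequences: Union[str, List[str]],
    candidate_labels: Union[str, List[str]],
    hypothesis_template: str = "This example is {}.",
) -> List[Tuple[str, str]]:
    """Return one (premise, hypothesis) pair per sequence/label combination."""
    if isinstance(sequences, str):
        sequences = [sequences]
    if isinstance(candidate_labels, str):
        # A single label or a string of comma-separated labels.
        candidate_labels = [
            label.strip() for label in candidate_labels.split(",") if label.strip()
        ]
    return [
        (sequence, hypothesis_template.format(label))
        for sequence, label in product(sequences, candidate_labels)
    ]


pairs = build_nli_pairs("Who won the match?", "sports, politics")
```

Each pair would then be scored by the NLI model, with the entailment logit read as support for the label.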

This NLI pipeline can currently be loaded from pipeline() using the following task identifier: "zero-shot-classification".

The models that this pipeline can use are models that have been fine-tuned on an NLI task. See the up-to-date list of available models on huggingface.co/models.

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 will use the CPU; a non-negative integer will run the model on the associated CUDA device id.

  • binary_output (bool, optional, defaults to False) – Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or raw text.

__call__(sequences: Union[str, List[str]], candidate_labels, hypothesis_template='This example is {}.', multi_class=False)[source]¶

Classify the sequence(s) given as inputs. See the ZeroShotClassificationPipeline documentation for more information.

Parameters
  • sequences (str or List[str]) – The sequence(s) to classify. They will be truncated if the model input is too large.

  • candidate_labels (str or List[str]) – The set of possible class labels to classify each sequence into. Can be a single label, a string of comma-separated labels, or a list of labels.

  • hypothesis_template (str, optional, defaults to "This example is {}.") – The template used to turn each label into an NLI-style hypothesis. This template must include a {} or similar syntax for the candidate label to be inserted into the template. For example, the default template is "This example is {}." With the candidate label "sports", this would be fed into the model like "<cls> sequence to classify <sep> This example is sports . <sep>". The default template works well in many cases, but it may be worthwhile to experiment with different templates depending on the task setting.

  • multi_class (bool, optional, defaults to False) – Whether or not multiple candidate labels can be true. If False, the scores are normalized such that the sum of the label likelihoods for each sequence is 1. If True, the labels are considered independent and probabilities are normalized for each candidate by doing a softmax of the entailment score vs. the contradiction score.

Returns

Each result comes as a dictionary with the following keys:

  • sequence (str) – The sequence for which this is the output.

  • labels (List[str]) – The labels sorted by order of likelihood.

  • scores (List[float]) – The probabilities for each of the labels.

Return type

A dict or a list of dict
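The two normalization modes controlled by multi_class can be illustrated with a small sketch. It assumes the entailment and contradiction logits for each candidate label are already available as plain floats; the function score_labels and the logit values are hypothetical, chosen only to show the arithmetic.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_labels(
    entailment_logits: List[float],
    contradiction_logits: List[float],
    multi_class: bool = False,
) -> List[float]:
    """Turn per-label NLI logits into label scores."""
    if multi_class:
        # Labels are independent: softmax over (contradiction, entailment)
        # for each label, keeping the entailment probability.
        return [
            softmax([contra, entail])[1]
            for entail, contra in zip(entailment_logits, contradiction_logits)
        ]
    # Labels compete: softmax over the entailment logits of all labels.
    return softmax(entailment_logits)


# Made-up logits for two candidate labels.
single = score_labels([2.0, 0.0], [0.0, 0.0])
multi = score_labels([2.0, 0.0], [0.0, 0.0], multi_class=True)
```

With multi_class=False the scores sum to 1 across labels; with multi_class=True each score lies between 0 and 1 on its own, so several labels can score highly at once.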

Parent class: Pipeline¶

class transformers.Pipeline(model: Union[PreTrainedModel, TFPreTrainedModel], tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, modelcard: Optional[transformers.modelcard.ModelCard] = None, framework: Optional[str] = None, task: str = '', args_parser: transformers.pipelines.ArgumentHandler = None, device: int = -1, binary_output: bool = False)[source]¶

The Pipeline class is the class from which all pipelines inherit. Refer to this class for methods shared across different pipelines.

Base class implementing pipelined operations. Pipeline workflow is defined as a sequence of the following operations:

Input -> Tokenization -> Model Inference -> Post-Processing (task dependent) -> Output
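The sequence of operations above can be pictured with a toy stand-in. The sketch below is not the Pipeline class: ToyPipeline and its three stage functions are hypothetical, and the "model" simply counts characters. It only shows how the stages are chained on each call.

```python
from typing import Callable, Dict, List


class ToyPipeline:
    """Chains tokenization, inference and post-processing on each call."""

    def __init__(
        self,
        tokenize: Callable[[str], List[str]],
        infer: Callable[[List[str]], List[int]],
        postprocess: Callable[[List[int]], Dict[str, int]],
    ) -> None:
        self.tokenize = tokenize
        self.infer = infer
        self.postprocess = postprocess

    def __call__(self, text: str) -> Dict[str, int]:
        tokens = self.tokenize(text)  # Input -> Tokenization
        raw = self.infer(tokens)  # Tokenization -> Model Inference
        return self.postprocess(raw)  # Post-Processing -> Output


pipe = ToyPipeline(
    tokenize=lambda text: text.split(),
    infer=lambda tokens: [len(token) for token in tokens],
    postprocess=lambda raw: {"num_tokens": len(raw), "longest": max(raw)},
)

result = pipe("a bb ccc")
```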

Pipeline supports running on CPU or GPU through the device argument (see below).

Some pipelines, such as FeatureExtractionPipeline ('feature-extraction'), output large tensor objects as nested lists. To avoid dumping such large structures as textual data, we provide the binary_output constructor argument. If set to True, the output will be stored in the pickle format.
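As a rough illustration of the trade-off, the sketch below serializes a small nested list both as text and with pickle from the standard library. The features value is a hand-written stand-in for a pipeline output, and the snippet does not show how the pipeline itself writes its results.

```python
import pickle

# Hand-written stand-in for a nested-list tensor output.
features = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Textual form: human-readable, but must be parsed back into numbers.
as_text = repr(features)

# Binary form: bytes that round-trip to the original Python object.
as_binary = pickle.dumps(features)
restored = pickle.loads(as_binary)
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.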

Parameters
  • model (PreTrainedModel or TFPreTrainedModel) – The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow.

  • tokenizer (PreTrainedTokenizer) – The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.

  • modelcard (str or ModelCard, optional) – Model card attributed to the model for this pipeline.

  • framework (str, optional) –

    The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.

    If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the model, or to PyTorch if no model is provided.

  • task (str, defaults to "") – A task-identifier for the pipeline.

  • args_parser (ArgumentHandler, optional) – Reference to the object in charge of parsing supplied pipeline parameters.

  • device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 will use the CPU; a non-negative integer will run the model on the associated CUDA device id.

  • binary_output (bool, optional, defaults to False) – Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or raw text.
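The device ordinal convention can be summarized with a small helper. This is a sketch only: describe_device is a hypothetical function, and the "cuda:N" strings follow the PyTorch device naming convention rather than anything the pipeline returns.

```python
def describe_device(device: int = -1) -> str:
    """Map a pipeline device ordinal to a readable device name."""
    # Negative ordinals select the CPU; 0 and above select a CUDA device id.
    return "cpu" if device < 0 else f"cuda:{device}"


default_device = describe_device()
first_gpu = describe_device(0)
```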

check_model_type(supported_models: Union[List[str], dict])[source]¶

Check if the model class is supported by the pipeline.

Parameters

supported_models (List[str] or dict) – The list of models supported by the pipeline, or a dictionary with model class values.
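A check of this kind can be sketched as a comparison of class names. The function is_supported and the dummy classes are hypothetical, and the real method may report an unsupported model differently (for example by raising an error instead of returning a boolean).

```python
from typing import Dict, List, Union


def is_supported(
    model: object, supported_models: Union[List[str], Dict[object, type]]
) -> bool:
    """Return True if the model's class name is among the supported ones."""
    if isinstance(supported_models, dict):
        # Dictionary form: the values are model classes.
        names = [cls.__name__ for cls in supported_models.values()]
    else:
        # List form: the entries are already class names.
        names = list(supported_models)
    return type(model).__name__ in names


# Dummy classes standing in for real model classes.
class DummyModelA:
    pass


class DummyModelB:
    pass


by_list = is_supported(DummyModelA(), ["DummyModelA"])
by_dict = is_supported(DummyModelB(), {"config_a": DummyModelA})
```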

device_placement()[source]¶

Context manager allowing tensor allocation on the user-specified device in a framework-agnostic way.

Returns

Context manager

Examples:

# Explicitly ask for tensor allocation on CUDA device :0
pipe = pipeline(..., device=0)
with pipe.device_placement():
    # Every framework-specific tensor allocation will be done on the requested device
    output = pipe(...)

ensure_tensor_on_device(**inputs)[source]¶

Ensure PyTorch tensors are on the specified device.

Parameters

inputs (keyword arguments that should be torch.Tensor) – The tensors to place on self.device.

Returns

The same as inputs but on the proper device.

Return type

Dict[str, torch.Tensor]

predict(X)[source]¶

Scikit / Keras interface to transformers’ pipelines. This method will forward to __call__().

save_pretrained(save_directory: str)[source]¶

Save the pipeline’s model and tokenizer.

Parameters

save_directory (str) – A path to the directory where the pipeline will be saved. It will be created if it doesn’t exist.

transform(X)[source]¶

Scikit / Keras interface to transformers’ pipelines. This method will forward to __call__().