Utilities for pipelines

This page lists all the utility functions the library provides for pipelines.

Most of these are only useful if you are studying the code of the pipelines in the library.

Argument handling

class transformers.pipelines.ArgumentHandler[source]

Base interface for handling arguments for each Pipeline.

class transformers.pipelines.ZeroShotClassificationArgumentHandler[source]

Handles arguments for zero-shot for text classification by turning each possible label into an NLI premise/hypothesis pair.
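The label-to-hypothesis construction can be sketched as follows. This is an illustrative reimplementation, not the library's code; the template string `"This example is {}."` is modeled on the pipeline's default, and `build_premise_hypothesis_pairs` is a hypothetical name.

```python
# Hypothetical sketch of how candidate labels become NLI premise/hypothesis
# pairs. The default template string is an assumption.
def build_premise_hypothesis_pairs(sequence, labels,
                                   hypothesis_template="This example is {}."):
    """Pair the input sequence (premise) with one hypothesis per label."""
    return [(sequence, hypothesis_template.format(label)) for label in labels]


pairs = build_premise_hypothesis_pairs(
    "The team won the championship last night.",
    ["sports", "politics"],
)
# Each pair is fed to an NLI model; the entailment score for a
# hypothesis becomes the score for the corresponding label.
```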

class transformers.pipelines.QuestionAnsweringArgumentHandler[source]

QuestionAnsweringPipeline requires the user to provide multiple arguments (i.e. question & context) to be mapped to internal SquadExample.

QuestionAnsweringArgumentHandler manages all the possible ways to create a SquadExample from the command-line supplied arguments.

Data format

class transformers.pipelines.PipelineDataFormat(output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite: bool = False)[source]

Base class for all pipeline-supported data formats, both for reading and writing. Supported data formats currently include: - JSON - CSV - stdin/stdout (pipe)

PipelineDataFormat also includes some utilities to work with multi-column data, like mapping dataset columns to pipeline keyword arguments through the dataset_kwarg_1=dataset_column_1 format.

Parameters
  • output_path (str, optional) – Where to save the outgoing data.

  • input_path (str, optional) – Where to look for the input data.

  • column (str, optional) – The column to read.

  • overwrite (bool, optional, defaults to False) – Whether or not to overwrite the output_path.

static from_str(format: str, output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite=False) → transformers.pipelines.PipelineDataFormat[source]

Creates an instance of the right subclass of PipelineDataFormat depending on format.

Parameters
  • format (str) – The format of the desired pipeline. Acceptable values are "json", "csv" or "pipe".

  • output_path (str, optional) – Where to save the outgoing data.

  • input_path (str, optional) – Where to look for the input data.

  • column (str, optional) – The column to read.

  • overwrite (bool, optional, defaults to False) – Whether or not to overwrite the output_path.

Returns

The proper data format.

Return type

PipelineDataFormat
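The dispatch performed by from_str can be sketched like this. The stand-in classes below are empty placeholders mirroring the subclasses documented on this page; the real method also forwards output_path, input_path, column and overwrite to the chosen subclass, which is omitted here for brevity.

```python
# Empty stand-ins for the subclasses documented on this page.
class JsonPipelineDataFormat: ...
class CsvPipelineDataFormat: ...
class PipedPipelineDataFormat: ...


def from_str(format: str):
    """Illustrative dispatch: map the format string to a subclass."""
    formats = {
        "json": JsonPipelineDataFormat,
        "csv": CsvPipelineDataFormat,
        "pipe": PipedPipelineDataFormat,
    }
    if format not in formats:
        raise KeyError(
            f"Unknown format {format!r}; expected 'json', 'csv' or 'pipe'."
        )
    return formats[format]
```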

abstract save(data: Union[dict, List[dict]])[source]

Save the provided data object with the representation for the current PipelineDataFormat.

Parameters

data (dict or list of dict) – The data to store.

save_binary(data: Union[dict, List[dict]]) → str[source]

Save the provided data object as pickle-formatted binary data on disk.

Parameters

data (dict or list of dict) – The data to store.

Returns

Path where the data has been saved.

Return type

str

class transformers.pipelines.CsvPipelineDataFormat(output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite=False)[source]

Support for pipelines using CSV data format.

Parameters
  • output_path (str, optional) – Where to save the outgoing data.

  • input_path (str, optional) – Where to look for the input data.

  • column (str, optional) – The column to read.

  • overwrite (bool, optional, defaults to False) – Whether or not to overwrite the output_path.

save(data: List[dict])[source]

Save the provided data object with the representation for the current PipelineDataFormat.

Parameters

data (List[dict]) – The data to store.
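The CSV save behavior described above can be sketched with the standard library's csv module. This is an illustrative version, assuming the header row is taken from the keys of the first record; `save_csv` is a hypothetical function name.

```python
import csv

# Sketch of saving a list of dicts as CSV: field names come from the
# first record, then a header row and one row per dict are written.
def save_csv(data, output_path):
    if not data:
        return
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(data[0].keys()))
        writer.writeheader()
        writer.writerows(data)
```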

class transformers.pipelines.JsonPipelineDataFormat(output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite=False)[source]

Support for pipelines using JSON file format.

Parameters
  • output_path (str, optional) – Where to save the outgoing data.

  • input_path (str, optional) – Where to look for the input data.

  • column (str, optional) – The column to read.

  • overwrite (bool, optional, defaults to False) – Whether or not to overwrite the output_path.

save(data: dict)[source]

Save the provided data object in a json file.

Parameters

data (dict) – The data to store.
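The JSON save behavior amounts to serializing the data object to the configured output path, which can be sketched with the standard library; `save_json` is a hypothetical name for illustration.

```python
import json

# Sketch of JsonPipelineDataFormat.save as described above:
# serialize the data object to the configured output path.
def save_json(data, output_path):
    with open(output_path, "w") as f:
        json.dump(data, f)
```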

class transformers.pipelines.PipedPipelineDataFormat(output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite: bool = False)[source]

Read data from piped input to the python process. For multi-column data, columns should be separated by \t.

If columns are provided, the output will be a dictionary with {column_x: value_x}.

Parameters
  • output_path (str, optional) – Where to save the outgoing data.

  • input_path (str, optional) – Where to look for the input data.

  • column (str, optional) – The column to read.

  • overwrite (bool, optional, defaults to False) – Whether or not to overwrite the output_path.

save(data: dict)[source]

Print the data.

Parameters

data (dict) – The data to store.

save_binary(data: Union[dict, List[dict]]) → str[source]

Save the provided data object as pickle-formatted binary data on disk.

Parameters

data (dict or list of dict) – The data to store.

Returns

Path where the data has been saved.

Return type

str
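The piped reading behavior can be sketched as follows. This is a minimal illustration, assuming tab-separated columns as described above; `iter_piped` is a hypothetical name, and the real class reads from sys.stdin rather than an arbitrary iterable.

```python
# Sketch of reading piped, tab-separated input. With column names
# configured, each line becomes a {column_x: value_x} dict; a line
# without tabs is yielded as-is.
def iter_piped(lines, columns=None):
    for line in lines:
        line = line.rstrip("\n")
        if "\t" in line:
            values = line.split("\t")
            if columns:
                yield dict(zip(columns, values))
            else:
                yield tuple(values)
        else:
            yield line
```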

Utilities

transformers.pipelines.get_framework(model=None)[source]

Select framework (TensorFlow or PyTorch) to use.

Parameters

model (str, PreTrainedModel or TFPreTrainedModel, optional) – If both frameworks are installed, picks the one corresponding to the model passed (either a model class or the model name). If no specific model is provided, defaults to using PyTorch.
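The selection logic can be sketched as below. This is an illustration, not the library's implementation: real selection also inspects the passed model's class, which is omitted here, and `pick_framework` is a hypothetical name. The availability flags are exposed as parameters so the logic can be exercised without either framework installed.

```python
from importlib.util import find_spec

# Illustrative sketch of framework selection: prefer PyTorch ("pt")
# when both frameworks are available, otherwise use whichever exists.
def pick_framework(torch_available=None, tf_available=None):
    if torch_available is None:
        torch_available = find_spec("torch") is not None
    if tf_available is None:
        tf_available = find_spec("tensorflow") is not None
    if not torch_available and not tf_available:
        raise RuntimeError("Neither PyTorch nor TensorFlow is installed.")
    return "pt" if torch_available else "tf"
```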

class transformers.pipelines.PipelineException(task: str, model: str, reason: str)[source]

Raised by a Pipeline when handling __call__.

Parameters
  • task (str) – The task of the pipeline.

  • model (str) – The model used by the pipeline.

  • reason (str) – The error message to display.
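An exception carrying this context can be sketched as follows, mirroring the three parameters documented above; the model name in the usage example is hypothetical.

```python
# Sketch of an exception that records which pipeline task and model
# were in use when the error occurred.
class PipelineException(Exception):
    def __init__(self, task: str, model: str, reason: str):
        super().__init__(reason)
        self.task = task
        self.model = model


# Usage: a pipeline would raise this from __call__ when input
# validation fails, keeping task and model attached for debugging.
try:
    raise PipelineException("question-answering", "my-qa-model",
                            "Missing context argument")
except PipelineException as e:
    caught = e
```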