Image Processor

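An image processor is in charge of preparing input features for vision models and post-processing their outputs. This includes transformations such as resizing, normalization, and conversion to PyTorch, TensorFlow, Flax, and NumPy tensors. It may also include model-specific post-processing, such as converting logits into segmentation masks.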
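The tensor-conversion step can be illustrated with a hypothetical NumPy-only helper, to_batch_sketch. It stacks equally sized channels-last uint8 images into one float32 batch in channels-first layout. This is not the library's implementation. Real image processors also resize, rescale, and normalize first, and can return PyTorch, TensorFlow, or Flax tensors instead of NumPy arrays.

```python
import numpy as np


def to_batch_sketch(images):
    """Hypothetical helper: stack (H, W, C) images into a (N, C, H, W) float32 batch."""
    # All images must already share the same height and width (e.g. after resizing/cropping).
    arrays = [np.asarray(img, dtype=np.float32) for img in images]
    # Move the channel axis from last to first: (H, W, C) -> (C, H, W).
    channels_first = [arr.transpose(2, 0, 1) for arr in arrays]
    # Stack along a new leading batch axis.
    return np.stack(channels_first, axis=0)


images = [np.zeros((4, 6, 3), dtype=np.uint8), np.full((4, 6, 3), 255, dtype=np.uint8)]
batch = to_batch_sketch(images)
print(batch.shape, batch.dtype)  # (2, 3, 4, 6) float32
```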


class transformers.ImageProcessingMixin


( **kwargs )

This is an image processor mixin used to provide saving/loading functionality for sequential and image feature extractors.


from_pretrained

( pretrained_model_name_or_path: Union[str, os.PathLike], cache_dir: Union[str, os.PathLike, None] = None, force_download: bool = False, local_files_only: bool = False, token: Union[str, bool, None] = None, revision: str = 'main', **kwargs )


  • pretrained_model_name_or_path (str or os.PathLike) — This can be either:

    • a string, the model id of a pretrained image_processor hosted inside a model repo on the Hugging Face Hub.
    • a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
    • a path or url to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
  • force_download (bool, optional, defaults to False) — Whether or not to force (re-)downloading the image processor files, overriding the cached versions if they exist.
  • resume_download (bool, optional, defaults to False) — Whether or not to delete incompletely received files. Attempts to resume the download if such a file exists.
  • proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': '', 'http://hostname': ''}. The proxies are used on each request.
  • token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Because models and other artifacts are stored in a git-based system, revision can be any identifier allowed by git.

Instantiate a type of ImageProcessingMixin from an image processor.


from transformers import CLIPImageProcessor

# The base class *ImageProcessingMixin* can't be instantiated directly, so these examples
# use a derived class: *CLIPImageProcessor*
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32"
)  # Download image_processing_config from the Hugging Face Hub and cache it.
image_processor = CLIPImageProcessor.from_pretrained(
    "./test/saved_model/"
)  # E.g. image processor (or model) was saved using *save_pretrained('./test/saved_model/')*
image_processor = CLIPImageProcessor.from_pretrained("./test/saved_model/preprocessor_config.json")
# Keyword arguments that match an attribute override the loaded value.
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False
)
assert image_processor.do_normalize is False
# With return_unused_kwargs=True, keyword arguments that matched no attribute are returned.
image_processor, unused_kwargs = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False, return_unused_kwargs=True
)
assert image_processor.do_normalize is False
assert unused_kwargs == {"foo": False}


save_pretrained

( save_directory: Union[str, os.PathLike], push_to_hub: bool = False, **kwargs )


  • save_directory (str or os.PathLike) — Directory where the image processor JSON file will be saved (will be created if it does not exist).
  • push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
  • kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.

Save an image processor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.
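The save/load round trip can be sketched with the standard library alone, using a hypothetical TinyImageProcessor class. Its save_pretrained() writes the instance attributes to preprocessor_config.json. Its from_pretrained() reads them back, lets keyword arguments that match an attribute override the saved value, and, when return_unused_kwargs=True, returns the others as unused, mirroring the CLIPImageProcessor example. This is an illustration under those assumptions, not the mixin's code, which also handles Hub downloads, caching, and authentication.

```python
import json
import os
import tempfile


class TinyImageProcessor:
    """Hypothetical stand-in that mimics an image processor's save/load round trip."""

    def __init__(self, do_normalize=True, image_mean=(0.5, 0.5, 0.5), image_std=(0.5, 0.5, 0.5)):
        self.do_normalize = do_normalize
        self.image_mean = list(image_mean)
        self.image_std = list(image_std)

    def save_pretrained(self, save_directory):
        # Create the directory if needed and write the attributes as JSON.
        os.makedirs(save_directory, exist_ok=True)
        path = os.path.join(save_directory, "preprocessor_config.json")
        with open(path, "w") as f:
            json.dump(vars(self), f, indent=2)
        return path

    @classmethod
    def from_pretrained(cls, save_directory, return_unused_kwargs=False, **kwargs):
        with open(os.path.join(save_directory, "preprocessor_config.json")) as f:
            config = json.load(f)
        processor = cls(**config)
        # Keyword arguments that match an attribute override the saved value;
        # the rest are collected as unused.
        unused = {}
        for key, value in kwargs.items():
            if hasattr(processor, key):
                setattr(processor, key, value)
            else:
                unused[key] = value
        return (processor, unused) if return_unused_kwargs else processor


with tempfile.TemporaryDirectory() as tmp:
    TinyImageProcessor(do_normalize=True).save_pretrained(tmp)
    reloaded, unused = TinyImageProcessor.from_pretrained(
        tmp, do_normalize=False, foo=False, return_unused_kwargs=True
    )
    print(reloaded.do_normalize, unused)  # False {'foo': False}
```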


class transformers.BatchFeature


( data: Optional[Dict[str, Any]] = None, tensor_type: Union[None, str, TensorType] = None )


  • data (dict, optional) — Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask', etc.).
  • tensor_type (Union[None, str, TensorType], optional) — You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/NumPy tensors at initialization.

Holds the output of the pad() and feature extractor specific __call__ methods.

This class is derived from a Python dictionary and can be used as a dictionary.


convert_to_tensors

( tensor_type: Union[str, TensorType, None] = None )


  • tensor_type (str or TensorType, optional) — The type of tensors to use. If str, should be one of the values of the enum TensorType. If None, no modification is done.

Convert the inner content to tensors.
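BatchFeature is a dict whose values can be converted to tensors, and a hypothetical, NumPy-only MiniBatchFeature can imitate that behavior. It accepts only the "np" tensor type and leaves values untouched when tensor_type is None. It is a sketch of the idea, not the class's implementation, which also supports PyTorch, TensorFlow, and Flax.

```python
import numpy as np


class MiniBatchFeature(dict):
    """Hypothetical, NumPy-only imitation of a dict-based batch container."""

    def __init__(self, data=None, tensor_type=None):
        super().__init__(data or {})
        self.convert_to_tensors(tensor_type)

    def convert_to_tensors(self, tensor_type=None):
        # None means "leave the values untouched".
        if tensor_type is None:
            return self
        if tensor_type != "np":
            raise ValueError(f"this sketch only supports 'np', got {tensor_type!r}")
        # Replace each value with a NumPy array; iterate over a snapshot of the items.
        for key, value in list(self.items()):
            self[key] = np.asarray(value)
        return self


batch = MiniBatchFeature({"pixel_values": [[1, 2], [3, 4]]}, tensor_type="np")
raw = MiniBatchFeature({"input_ids": [1, 2, 3]})
print(type(batch["pixel_values"]).__name__, batch["pixel_values"].shape)  # ndarray (2, 2)
```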


to

( *args, **kwargs ) → BatchFeature


  • args (Tuple) — Will be passed to the to(...) function of the tensors.
  • kwargs (Dict, optional) — Will be passed to the to(...) function of the tensors.



Returns: the same instance after modification.

Send all values to a device by calling .to(*args, **kwargs) on each of them (PyTorch only). This should support casting to different dtypes and sending the BatchFeature to a different device.


class transformers.image_processing_utils.BaseImageProcessor


( **kwargs )


center_crop

( image: np.ndarray, size: Dict[str, int], data_format: Union[str, ChannelDimension, None] = None, input_data_format: Union[str, ChannelDimension, None] = None, **kwargs )


  • image (np.ndarray) — Image to center crop.
  • size (Dict[str, int]) — Size of the output image.
  • data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  • input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.

Center crop an image to (size["height"], size["width"]). If the input is smaller than the requested size along any edge, the image is first padded with zeros and then center cropped.
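Pad-then-crop behavior can be sketched in NumPy with a hypothetical center_crop_sketch function. It assumes a channels-last (H, W, C) array and takes the target height and width directly instead of a size dict. When padding is odd, this sketch puts the extra row or column at the bottom or right; the library's exact placement may differ.

```python
import numpy as np


def center_crop_sketch(image, height, width):
    """Hypothetical center crop for an (H, W, C) array; zero-pads edges that are too short."""
    in_h, in_w = image.shape[:2]
    # Amount of zero padding needed along each spatial axis (0 if already large enough).
    pad_h = max(height - in_h, 0)
    pad_w = max(width - in_w, 0)
    if pad_h or pad_w:
        image = np.pad(
            image,
            ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2), (0, 0)),
            mode="constant",  # constant_values defaults to 0
        )
        in_h, in_w = image.shape[:2]
    # Take the central height x width window.
    top = (in_h - height) // 2
    left = (in_w - width) // 2
    return image[top : top + height, left : left + width]


image = np.arange(5 * 5, dtype=np.float32).reshape(5, 5, 1)
cropped = center_crop_sketch(image, 3, 3)       # central 3x3 of a 5x5 image
padded = center_crop_sketch(np.ones((2, 2, 1)), 4, 4)  # 2x2 image padded to 4x4
print(cropped[..., 0].tolist())  # [[6.0, 7.0, 8.0], [11.0, 12.0, 13.0], [16.0, 17.0, 18.0]]
```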


normalize

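( image: np.ndarray, mean: Union[float, Iterable[float]], std: Union[float, Iterable[float]], data_format: Union[str, ChannelDimension, None] = None, input_data_format: Union[str, ChannelDimension, None] = None, **kwargs ) → np.ndarray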


  • image (np.ndarray) — Image to normalize.
  • mean (float or Iterable[float]) — Image mean to use for normalization.
  • std (float or Iterable[float]) — Image standard deviation to use for normalization.
  • data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  • input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.



Returns: the normalized image.

Normalize an image: image = (image - mean) / std. When mean and std are iterables, they hold one value per channel.
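Per-channel mean and std must be broadcast along the channel axis, which is why the channel dimension format matters. The sketch below uses a hypothetical normalize_sketch function with a simple channels_first flag in place of the data_format/input_data_format parameters. A (C,) vector already broadcasts over the last axis of an (H, W, C) image, but must be reshaped to (C, 1, 1) for a (C, H, W) image.

```python
import numpy as np


def normalize_sketch(image, mean, std, channels_first=False):
    """Hypothetical per-channel normalization: (image - mean) / std."""
    image = np.asarray(image, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    if channels_first and mean.ndim == 1:
        # (C,) -> (C, 1, 1) so the values broadcast over (C, H, W).
        mean = mean[:, None, None]
        std = std[:, None, None]
    # For (H, W, C) images, a (C,) vector broadcasts over the last axis as-is.
    return (image - mean) / std


# A 2x2 channels-last image whose three channels are constant 0.5, 1.0 and 0.0.
hwc = np.stack([np.full((2, 2), 0.5), np.full((2, 2), 1.0), np.zeros((2, 2))], axis=-1)
out_last = normalize_sketch(hwc, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
out_first = normalize_sketch(
    hwc.transpose(2, 0, 1), mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], channels_first=True
)
print(out_last[0, 0].tolist())  # [0.0, 1.0, -1.0]
```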


rescale

( image: np.ndarray, scale: float, data_format: Union[str, ChannelDimension, None] = None, input_data_format: Union[str, ChannelDimension, None] = None, **kwargs ) → np.ndarray


  • image (np.ndarray) — Image to rescale.
  • scale (float) — The scaling factor to rescale pixel values by.
  • data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  • input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.



Returns: the rescaled image.

Rescale an image by a scale factor. image = image * scale.
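A common use of rescaling is mapping 8-bit pixel values in [0, 255] to [0, 1] with scale = 1/255. The hypothetical rescale_sketch function below multiplies every value by the scale factor in float32. It is an illustration of the formula, not the library's method, and ignores the channel-format arguments because an elementwise multiply does not depend on them.

```python
import numpy as np


def rescale_sketch(image, scale):
    """Hypothetical rescale: multiply every pixel value by scale."""
    # Work in float32 so the result is a floating-point image with a predictable dtype.
    return np.asarray(image, dtype=np.float32) * np.float32(scale)


pixels = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = rescale_sketch(pixels, 1 / 255)
print(scaled.dtype)  # float32
```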