Transformers documentation

You are viewing v4.24.0 version. A newer version v4.53.3 is available.

Utilities for Image Processors

This page lists all the utility functions used by the image processors, mainly the functional transformations used to process the images.

Most of those are only useful if you are studying the code of the image processors in the library.

Image Transformations

transformers.image_transforms.center_crop

( image: ndarray size: typing.Tuple[int, int] data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None return_numpy: typing.Optional[bool] = None ) → np.ndarray

Parameters

image (np.ndarray) — The image to crop.
size (Tuple[int, int]) — The target size for the cropped image.
data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. If unset, the inferred format of the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
return_numpy (bool, optional) — Whether or not to return the cropped image as a numpy array. Used for backwards compatibility with the previous ImageFeatureExtractionMixin method.
- Unset: will return the same type as the input image.
- True: will return a numpy array.
- False: will return a PIL.Image.Image object.

Returns

np.ndarray

The cropped image.

Crops the image to the specified size using a center crop. If the image is too small to be cropped to the given size, it is padded, so the returned image always has the height and width given by size.
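The crop-or-pad behaviour described above can be illustrated with a short NumPy sketch. This is not the library's implementation: the function name center_crop_sketch is hypothetical, the input is assumed to be in (num_channels, height, width) format, and zero padding is an assumption about the padding value.

```python
import numpy as np


def center_crop_sketch(image: np.ndarray, size: tuple) -> np.ndarray:
    """Center-crop a (num_channels, height, width) array to size=(height, width).

    Hypothetical helper: zero-pads first when the image is smaller than the target.
    """
    crop_h, crop_w = size
    _, orig_h, orig_w = image.shape

    # Pad symmetrically so both spatial dimensions are at least the target size.
    pad_h = max(crop_h - orig_h, 0)
    pad_w = max(crop_w - orig_w, 0)
    padded = np.pad(
        image,
        ((0, 0), (pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)),
    )

    # Take the centered window from the (possibly padded) image.
    _, h, w = padded.shape
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return padded[:, top : top + crop_h, left : left + crop_w]


large = np.arange(3 * 6 * 8).reshape(3, 6, 8)
small = np.ones((3, 2, 2))

cropped = center_crop_sketch(large, (4, 4))      # plain crop
padded_crop = center_crop_sketch(small, (4, 4))  # image too small, so it is padded
print(cropped.shape, padded_crop.shape)
```

Both calls return an array whose spatial dimensions match size, regardless of whether the input was larger or smaller than the target.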

transformers.image_transforms.normalize

( image: ndarray mean: typing.Union[float, typing.Iterable[float]] std: typing.Union[float, typing.Iterable[float]] data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None )

Parameters

image (np.ndarray) — The image to normalize.
mean (float or Iterable[float]) — The mean to use for normalization.
std (float or Iterable[float]) — The standard deviation to use for normalization.
data_format (ChannelDimension, optional) — The channel dimension format of the output image. If None, will use the inferred format from the input.

Normalizes image using the mean and standard deviation specified by mean and std.

image = (image - mean) / std
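As a rough illustration of the formula above, the following NumPy sketch applies per-channel statistics to an image. The name normalize_sketch is hypothetical, and a (num_channels, height, width) layout is assumed; the library function also handles other layouts.

```python
import numpy as np


def normalize_sketch(image: np.ndarray, mean, std) -> np.ndarray:
    """Apply (image - mean) / std to a (num_channels, height, width) array."""
    image = image.astype(np.float32)
    # Reshape the statistics to (num_channels, 1, 1), or (1, 1, 1) for scalars,
    # so that they broadcast over the height and width axes.
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (image - mean) / std


image = np.full((3, 2, 2), 0.5, dtype=np.float32)
normalized = normalize_sketch(image, mean=[0.5, 0.25, 0.0], std=[0.5, 0.25, 1.0])
print(normalized[:, 0, 0])
```

Passing a single float for mean and std applies the same statistics to every channel, which matches the float or Iterable[float] types accepted above.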

transformers.image_transforms.rescale

( image: ndarray scale: float data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None dtype = <class 'numpy.float32'> ) → np.ndarray

Parameters

image (np.ndarray) — The image to rescale.
scale (float) — The scale to use for rescaling the image.
data_format (ChannelDimension, optional) — The channel dimension format of the image. If not provided, it will be the same as the input image.
dtype (np.dtype, optional, defaults to np.float32) — The dtype of the output image. Used for backwards compatibility with feature extractors.

Returns

np.ndarray

The rescaled image.

Rescales image by scale.
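A common use of rescaling is mapping 8-bit pixel values into the range [0, 1]. The sketch below shows the idea in plain NumPy; rescale_sketch is a hypothetical name and the code is not taken from the library.

```python
import numpy as np


def rescale_sketch(image: np.ndarray, scale: float, dtype=np.float32) -> np.ndarray:
    """Multiply every pixel by scale and cast the result to dtype."""
    return (image * scale).astype(dtype)


pixels = np.array([[0, 128, 255]], dtype=np.uint8)
rescaled = rescale_sketch(pixels, 1 / 255)
print(rescaled.dtype, rescaled.min(), rescaled.max())
```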

transformers.image_transforms.resize

( image size: typing.Tuple[int, int] resample = <Resampling.BILINEAR: 2> data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None return_numpy: bool = True ) → np.ndarray

Parameters

image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to resize.
size (Tuple[int, int]) — The size to use for resizing the image.
resample (int, optional, defaults to PIL.Image.Resampling.BILINEAR) — The filter to use for resampling.
data_format (ChannelDimension, optional) — The channel dimension format of the output image. If None, will use the inferred format from the input.
return_numpy (bool, optional, defaults to True) — Whether or not to return the resized image as a numpy array. If False a PIL.Image.Image object is returned.

Returns

np.ndarray

The resized image.

Resizes image to (h, w) specified by size using the PIL library.

transformers.image_transforms.to_pil_image

( image: typing.Union[numpy.ndarray, PIL.Image.Image, ForwardRef('torch.Tensor'), ForwardRef('tf.Tensor'), ForwardRef('jnp.Tensor')] do_rescale: typing.Optional[bool] = None ) → PIL.Image.Image

Parameters

image (PIL.Image.Image or numpy.ndarray or torch.Tensor or tf.Tensor) — The image to convert to the PIL.Image format.
do_rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to True if the image type is a floating type, False otherwise.

Returns

PIL.Image.Image

The converted image.

Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.
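The two preparation steps mentioned above (optional rescaling and moving the channel axis last) can be sketched in NumPy alone. The helper name to_uint8_channels_last is hypothetical, the input is assumed to be channels-first, and floating-point inputs are assumed to lie in [0, 1]; the final conversion to a PIL image is left out.

```python
import numpy as np


def to_uint8_channels_last(image: np.ndarray, do_rescale=None) -> np.ndarray:
    """Prepare a (num_channels, height, width) array for conversion to a PIL image."""
    # Mirror the documented default: rescale only floating-point inputs.
    if do_rescale is None:
        do_rescale = np.issubdtype(image.dtype, np.floating)
    if do_rescale:
        image = image * 255
    # Move the channel axis (assumed to be first) to the last position.
    image = np.transpose(image, (1, 2, 0))
    return image.astype(np.uint8)


float_image = np.zeros((3, 2, 4), dtype=np.float32)
float_image[0] = 1.0  # fully saturated first channel
converted = to_uint8_channels_last(float_image)
print(converted.shape, converted.dtype)
```

An array in this (height, width, num_channels) uint8 layout is the kind of input that PIL.Image.fromarray accepts.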

ImageProcessorMixin

class transformers.FeatureExtractionMixin

( **kwargs )

This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.

from_dict

( feature_extractor_dict: typing.Dict[str, typing.Any] **kwargs ) → FeatureExtractionMixin

Parameters

feature_extractor_dict (Dict[str, Any]) — Dictionary that will be used to instantiate the feature extractor object. Such a dictionary can be retrieved from a pretrained checkpoint by leveraging the to_dict() method.
kwargs (Dict[str, Any]) — Additional parameters from which to initialize the feature extractor object.

Returns

FeatureExtractionMixin

The feature extractor object instantiated from those parameters.

Instantiates a type of FeatureExtractionMixin from a Python dictionary of parameters.

from_json_file

( json_file: typing.Union[str, os.PathLike] ) → A feature extractor of type FeatureExtractionMixin

Parameters

json_file (str or os.PathLike) — Path to the JSON file containing the parameters.

Returns

A feature extractor of type FeatureExtractionMixin

The feature_extractor object instantiated from that JSON file.

Instantiates a feature extractor of type FeatureExtractionMixin from the path to a JSON file of parameters.
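The relationship between to_dict(), from_dict and from_json_file can be illustrated with a toy class that uses only the standard library. ToyFeatureExtractor is a hypothetical stand-in, not a transformers class, and letting extra keyword arguments override dictionary values is an assumption of this sketch.

```python
import json
import os
import tempfile


class ToyFeatureExtractor:
    """Hypothetical stand-in that mimics the dict/JSON round trip."""

    def __init__(self, **kwargs):
        # Store every keyword argument as an attribute.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def to_dict(self):
        return dict(self.__dict__)

    @classmethod
    def from_dict(cls, feature_extractor_dict, **kwargs):
        # Assumption of this sketch: extra kwargs override dictionary values.
        return cls(**{**feature_extractor_dict, **kwargs})

    @classmethod
    def from_json_file(cls, json_file):
        with open(json_file, "r", encoding="utf-8") as reader:
            return cls.from_dict(json.load(reader))


original = ToyFeatureExtractor(do_normalize=True, image_mean=[0.5, 0.5, 0.5])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "preprocessor_config.json")
    with open(path, "w", encoding="utf-8") as writer:
        json.dump(original.to_dict(), writer)
    restored = ToyFeatureExtractor.from_json_file(path)

overridden = ToyFeatureExtractor.from_dict(original.to_dict(), do_normalize=False)
print(restored.to_dict(), overridden.do_normalize)
```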

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] **kwargs )

Parameters

pretrained_model_name_or_path (str or os.PathLike) — This can be either:
- a string, the model id of a pretrained feature_extractor hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
- a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
- a path or url to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) — Whether or not to force to (re-)download the feature extractor files and override the cached versions if they exist.
resume_download (bool, optional, defaults to False) — Whether or not to resume an interrupted download. If True and an incompletely received file exists, the download attempts to resume from it; otherwise the incomplete file is deleted.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
use_auth_token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.

Instantiate a type of FeatureExtractionMixin from a feature extractor, e.g. a derived class of SequenceFeatureExtractor.

Examples:

from transformers import Wav2Vec2FeatureExtractor

# We can't directly instantiate the base class *FeatureExtractionMixin* or *SequenceFeatureExtractor*, so the examples use a
# derived class: *Wav2Vec2FeatureExtractor*
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h"
)  # Download feature_extraction_config from huggingface.co and cache.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "./test/saved_model/"
)  # E.g. feature_extractor (or model) was saved using *save_pretrained('./test/saved_model/')*
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("./test/saved_model/preprocessor_config.json")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False
)
assert feature_extractor.return_attention_mask is False
feature_extractor, unused_kwargs = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False, return_unused_kwargs=True
)
assert feature_extractor.return_attention_mask is False
assert unused_kwargs == {"foo": False}

get_feature_extractor_dict

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] **kwargs ) → Tuple[Dict, Dict]

Parameters

pretrained_model_name_or_path (str or os.PathLike) — The identifier of the pre-trained checkpoint from which we want the dictionary of parameters.

Returns

Tuple[Dict, Dict]

The dictionary(ies) that will be used to instantiate the feature extractor object.

From a pretrained_model_name_or_path, resolve to a dictionary of parameters, to be used for instantiating a feature extractor of type FeatureExtractionMixin using from_dict.

push_to_hub

( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None use_auth_token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '10GB' create_pr: bool = False **deprecated_kwargs )

Parameters

repo_id (str) — The name of the repository you want to push your feature extractor to. It should contain your organization name when pushing to a given organization.
use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
commit_message (str, optional) — Message to commit while pushing. Will default to "Upload feature extractor".
private (bool, optional) — Whether or not the repository created should be private (requires a paying subscription).
use_auth_token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
max_shard_size (int or str, optional, defaults to "10GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, it needs to be digits followed by a unit (like "5MB").
create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.

Upload the feature extractor file to the 🤗 Model Hub while synchronizing a local clone of the repo in repo_path_or_name.

Examples:

from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("bert-base-cased")

# Push the feature extractor to your namespace with the name "my-finetuned-bert".
feature_extractor.push_to_hub("my-finetuned-bert")

# Push the feature extractor to an organization with the name "my-finetuned-bert".
feature_extractor.push_to_hub("huggingface/my-finetuned-bert")

register_for_auto_class

( auto_class = 'AutoFeatureExtractor' )

Parameters

auto_class (str or type, optional, defaults to "AutoFeatureExtractor") — The auto class to register this new feature extractor with.

Register this class with a given auto class. This should only be used for custom feature extractors as the ones in the library are already mapped with AutoFeatureExtractor.

This API is experimental and may have some slight breaking changes in the next releases.

save_pretrained

( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )

Parameters

save_directory (str or os.PathLike) — Directory where the feature extractor JSON file will be saved (will be created if it does not exist).
push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
kwargs — Additional keyword arguments passed along to the push_to_hub() method.