Feature Extractor
A feature extractor is in charge of preparing input features for a multi-modal model. This includes feature extraction from sequences, e.g., pre-processing audio files to Log-Mel Spectrogram features, feature extraction from images, e.g., cropping image files, but also padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow tensors.
FeatureExtractionMixin
class transformers.feature_extraction_utils.FeatureExtractionMixin(**kwargs)
This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.
classmethod from_pretrained(pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) → SequenceFeatureExtractor
Instantiate a type of FeatureExtractionMixin from a feature extractor, e.g. a derived class of SequenceFeatureExtractor.

Parameters:
- pretrained_model_name_or_path (str or os.PathLike) – This can be either:
  - a string, the model id of a pretrained feature_extractor hosted inside a model repo on huggingface.co. Valid model ids can be located at the root level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
  - a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - a path or URL to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
- cache_dir (str or os.PathLike, optional) – Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) – Whether or not to force the (re-)download of the feature extractor files, overriding the cached versions if they exist.
- resume_download (bool, optional, defaults to False) – Whether or not to attempt to resume the download if an incompletely received file exists, instead of deleting that file.
- proxies (Dict[str, str], optional) – A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- use_auth_token (str or bool, optional) – The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running transformers-cli login (stored in huggingface).
- revision (str, optional, defaults to "main") – The specific model version to use. It can be a branch name, a tag name, or a commit id: since huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.
- return_unused_kwargs (bool, optional, defaults to False) – If False, this function returns just the final feature extractor object. If True, it returns a Tuple(feature_extractor, unused_kwargs), where unused_kwargs is a dictionary of the key/value pairs whose keys are not feature extractor attributes, i.e., the part of kwargs that was not used to update feature_extractor and is otherwise ignored.
- kwargs (Dict[str, Any], optional) – The values in kwargs of any keys that are feature extractor attributes will be used to override the loaded values. The handling of key/value pairs whose keys are not feature extractor attributes is controlled by the return_unused_kwargs keyword parameter.
Note: Passing use_auth_token=True is required when you want to use a private model.

Returns: A feature extractor of type FeatureExtractionMixin.
Examples:

    # The base classes `FeatureExtractionMixin` and `SequenceFeatureExtractor` can't be instantiated
    # directly, so these examples use a derived class: `Wav2Vec2FeatureExtractor`.
    from transformers import Wav2Vec2FeatureExtractor

    # Download the feature extractor configuration from huggingface.co and cache it.
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h')
    # E.g. the feature extractor (or model) was saved using `save_pretrained('./test/saved_model/')`.
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('./test/saved_model/')
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('./test/saved_model/preprocessor_config.json')
    # Known attributes are overridden; unknown keys such as `foo` are ignored.
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h', return_attention_mask=False, foo=False)
    assert feature_extractor.return_attention_mask is False
    # With return_unused_kwargs=True, the ignored keys are returned as well.
    feature_extractor, unused_kwargs = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h', return_attention_mask=False, foo=False, return_unused_kwargs=True)
    assert feature_extractor.return_attention_mask is False
    assert unused_kwargs == {'foo': False}
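The attribute-override rule behind kwargs and return_unused_kwargs can be modeled with plain dictionaries. This is a simplified sketch under stated assumptions: the loaded feature extractor is represented as a dict of attributes, and apply_kwargs is a hypothetical helper, not a transformers function.

```python
from typing import Any, Dict, Tuple, Union


def apply_kwargs(
    attributes: Dict[str, Any],
    kwargs: Dict[str, Any],
    return_unused_kwargs: bool = False,
) -> Union[Dict[str, Any], Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Hypothetical helper: override loaded attributes with matching kwargs."""
    updated = dict(attributes)  # copy so the loaded values stay untouched
    unused = {}
    for key, value in kwargs.items():
        if key in updated:
            updated[key] = value  # known attribute: override the loaded value
        else:
            unused[key] = value  # unknown key: collected, otherwise ignored
    if return_unused_kwargs:
        return updated, unused
    return updated


loaded = {"return_attention_mask": True, "sampling_rate": 16000}
config, unused = apply_kwargs(
    loaded, {"return_attention_mask": False, "foo": False}, return_unused_kwargs=True
)
print(config["return_attention_mask"], unused)  # False {'foo': False}
```

Returning a tuple only on request keeps the common call simple while still letting callers detect misspelled or unsupported options.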
save_pretrained(save_directory: Union[str, os.PathLike])
Save a feature_extractor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.

Parameters:
- save_directory (str or os.PathLike) – Directory where the feature extractor JSON file will be saved (will be created if it does not exist).
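The save/load contract (one JSON file per feature extractor, reloadable from either its directory or the file path) can be sketched with the standard library alone. ToyFeatureExtractor is hypothetical: it assumes all attributes are JSON-serializable and stores them under the preprocessor_config.json name used in the examples above, and it does not reproduce the hub download, caching, or authentication logic of the real from_pretrained().

```python
import json
import os
import tempfile

CONFIG_NAME = "preprocessor_config.json"  # file name taken from the documented example path


class ToyFeatureExtractor:
    """Hypothetical stand-in that mimics the documented save/load contract."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def save_pretrained(self, save_directory):
        os.makedirs(save_directory, exist_ok=True)  # created if it does not exist
        path = os.path.join(save_directory, CONFIG_NAME)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(vars(self), f, indent=2, sort_keys=True)
        return path

    @classmethod
    def from_pretrained(cls, directory_or_file):
        # Accept either a directory containing the JSON file or the file itself.
        path = directory_or_file
        if os.path.isdir(path):
            path = os.path.join(path, CONFIG_NAME)
        with open(path, encoding="utf-8") as f:
            return cls(**json.load(f))


with tempfile.TemporaryDirectory() as tmp:
    saved = ToyFeatureExtractor(feature_size=1, sampling_rate=16000, padding_value=0.0)
    target = os.path.join(tmp, "my_model_directory")
    config_path = saved.save_pretrained(target)
    reloaded = ToyFeatureExtractor.from_pretrained(target)
    from_file = ToyFeatureExtractor.from_pretrained(config_path)
    print(reloaded.sampling_rate)  # 16000
```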
SequenceFeatureExtractor
class transformers.SequenceFeatureExtractor(feature_size: int, sampling_rate: int, padding_value: float, **kwargs)
This is a general feature extraction class for speech recognition.
Parameters:
- feature_size (int) – The feature dimension of the extracted features.
- sampling_rate (int) – The sampling rate at which the audio files should be digitized, expressed in hertz (Hz).
- padding_value (float) – The value that is used to fill the padding values / vectors.
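Because sampling_rate is expressed in hertz (samples per second), converting between clip duration and sequence length is a single multiplication or division. The helper names below are hypothetical, and the 16000 Hz rate is only an illustrative value.

```python
def num_samples(duration_seconds: float, sampling_rate: int) -> int:
    """Number of samples in a clip of the given duration (sampling_rate in Hz)."""
    return round(duration_seconds * sampling_rate)


def duration_seconds(n_samples: int, sampling_rate: int) -> float:
    """Duration in seconds of n_samples recorded at sampling_rate Hz."""
    return n_samples / sampling_rate


print(num_samples(2.5, 16000))  # 40000
print(duration_seconds(8000, 16000))  # 0.5
```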
pad(processed_features: Union[transformers.feature_extraction_utils.BatchFeature, List[transformers.feature_extraction_utils.BatchFeature], Dict[str, transformers.feature_extraction_utils.BatchFeature], Dict[str, List[transformers.feature_extraction_utils.BatchFeature]], List[Dict[str, transformers.feature_extraction_utils.BatchFeature]]], padding: Union[bool, str, transformers.file_utils.PaddingStrategy] = True, max_length: Optional[int] = None, pad_to_multiple_of: Optional[int] = None, return_attention_mask: Optional[bool] = None, return_tensors: Optional[Union[str, transformers.file_utils.TensorType]] = None) → transformers.feature_extraction_utils.BatchFeature
Pad input values / input vectors or a batch of input values / input vectors up to a predefined length or to the max sequence length in the batch.
The padding side (left/right) and padding values are defined at the feature extractor level (with self.padding_side and self.padding_value).

Note: If the processed_features passed are a dictionary of NumPy arrays, PyTorch tensors or TensorFlow tensors, the result will use the same type unless you provide a different tensor type with return_tensors. In the case of PyTorch tensors, however, you will lose the specific device of your tensors.

Parameters:
- processed_features (BatchFeature, list of BatchFeature, Dict[str, List[float]], Dict[str, List[List[float]]] or List[Dict[str, List[float]]]) – Processed inputs. Can represent one input (BatchFeature or Dict[str, List[float]]) or a batch of input values / vectors (list of BatchFeature, Dict[str, List[List[float]]] or List[Dict[str, List[float]]]), so you can use this method during preprocessing as well as in a PyTorch DataLoader collate function. Instead of List[float] you can have tensors (NumPy arrays, PyTorch tensors or TensorFlow tensors); see the note above for the return type.
- padding (bool, str or PaddingStrategy, optional, defaults to True) – Select a strategy to pad the returned sequences (according to the model's padding side and padding index) among:
  - True or 'longest': Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).
  - 'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided.
  - False or 'do_not_pad': No padding (i.e., can output a batch with sequences of different lengths).
- max_length (int, optional) – Maximum length of the returned list and optionally padding length (see above).
- pad_to_multiple_of (int, optional) – If set, will pad the sequence to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.0 (Volta), or on TPUs, which benefit from having sequence lengths be a multiple of 128.
- return_attention_mask (bool, optional) – Whether to return the attention mask. If left to the default, will return the attention mask according to the specific feature_extractor's default.
- return_tensors (str or TensorType, optional) – If set, will return tensors instead of lists of Python numbers. Acceptable values are:
  - 'tf': Return TensorFlow tf.constant objects.
  - 'pt': Return PyTorch torch.Tensor objects.
  - 'np': Return NumPy np.ndarray objects.
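The padding rules above can be sketched in pure Python for a batch of 1-D sequences. This is a hypothetical re-implementation for illustration, not the library's code: it only accepts the string strategies ('longest', 'max_length', 'do_not_pad'; the real API also maps True and False to the first and last), requires max_length explicitly instead of falling back to the model's maximum input length, never truncates, and returns plain lists with an attention mask in which 1 marks real values and 0 marks padding.

```python
from typing import Dict, List, Optional


def pad_batch(
    sequences: List[List[float]],
    padding: str = "longest",  # "longest", "max_length" or "do_not_pad"
    max_length: Optional[int] = None,
    pad_to_multiple_of: Optional[int] = None,
    padding_value: float = 0.0,
    padding_side: str = "right",
) -> Dict[str, List[List[float]]]:
    """Hypothetical sketch of the documented padding strategies."""
    if padding == "do_not_pad":
        return {
            "input_values": [list(s) for s in sequences],
            "attention_mask": [[1] * len(s) for s in sequences],
        }
    if padding == "longest":
        target = max(len(s) for s in sequences)
    elif padding == "max_length":
        if max_length is None:
            raise ValueError("this sketch requires max_length with padding='max_length'")
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {padding!r}")
    if pad_to_multiple_of is not None and target % pad_to_multiple_of != 0:
        # Round the target length up to the next multiple.
        target = (target // pad_to_multiple_of + 1) * pad_to_multiple_of

    values, masks = [], []
    for seq in sequences:
        n_pad = max(target - len(seq), 0)  # longer sequences are left as-is here
        pad, mask_pad = [padding_value] * n_pad, [0] * n_pad
        if padding_side == "right":
            values.append(list(seq) + pad)
            masks.append([1] * len(seq) + mask_pad)
        else:  # "left"
            values.append(pad + list(seq))
            masks.append(mask_pad + [1] * len(seq))
    return {"input_values": values, "attention_mask": masks}


batch = pad_batch([[0.1, 0.2, 0.3], [0.5]], pad_to_multiple_of=4)
print(batch["input_values"])  # [[0.1, 0.2, 0.3, 0.0], [0.5, 0.0, 0.0, 0.0]]
print(batch["attention_mask"])  # [[1, 1, 1, 0], [1, 0, 0, 0]]
```

Here the longest sequence has 3 values, so pad_to_multiple_of=4 raises the target length to 4.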
BatchFeature
class transformers.BatchFeature(data: Optional[Dict[str, Any]] = None, tensor_type: Optional[Union[str, transformers.file_utils.TensorType]] = None)
Holds the output of the pad() and feature extractor specific __call__ methods.

This class is derived from a Python dictionary and can be used as a dictionary.
Parameters:
- data (dict) – Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask', etc.).
- tensor_type (Union[None, str, TensorType], optional) – You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/NumPy tensors at initialization.
convert_to_tensors(tensor_type: Optional[Union[str, transformers.file_utils.TensorType]] = None)
Convert the inner content to tensors.

Parameters:
- tensor_type (str or TensorType, optional) – The type of tensors to use. If str, should be one of the values of the enum TensorType. If None, no modification is done.
to(device: Union[str, torch.device]) → BatchFeature
Send all values to device by calling v.to(device) (PyTorch only).

Parameters:
- device (str or torch.device) – The device to put the tensors on.

Returns: The same instance after modification.

Return type: BatchFeature
ImageFeatureExtractionMixin
class transformers.image_utils.ImageFeatureExtractionMixin
Mixin that contains utilities for preparing image features.
center_crop(image, size)
Crops image to the given size using a center crop. Note that if the image is too small to be cropped to the given size, it will be padded (so the returned result has the requested size).

Parameters:
- image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to crop.
- size (int or Tuple[int, int]) – The size to which to crop the image.
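The crop-or-pad behavior of center_crop can be illustrated on a single-channel image stored as a list of rows. In this hypothetical sketch, an int size means a square output, a tuple is read as (height, width), and padded pixels are filled with 0; these conventions, and how an odd leftover pixel is split between the two sides, are assumptions that may differ from the library.

```python
from typing import List, Tuple, Union


def center_crop_2d(
    image: List[List[int]], size: Union[int, Tuple[int, int]], fill: int = 0
) -> List[List[int]]:
    """Hypothetical sketch: center-crop a list of rows, padding with `fill` if too small."""
    if isinstance(size, int):
        size = (size, size)  # assumption: an int means a square output
    target_h, target_w = size
    h, w = len(image), len(image[0])
    # Top-left corner of the crop window; negative offsets mean padding is needed.
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    out = []
    for i in range(target_h):
        row = []
        for j in range(target_w):
            y, x = i + top, j + left
            inside = 0 <= y < h and 0 <= x < w
            row.append(image[y][x] if inside else fill)
        out.append(row)
    return out


img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4 x 4 image
print(center_crop_2d(img, 2))  # [[5, 6], [9, 10]]
print(center_crop_2d([[7]], 3))  # [[0, 0, 0], [0, 7, 0], [0, 0, 0]]
```

Computing each output pixel from an offset source position handles cropping and padding with the same loop.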
normalize(image, mean, std)
Normalizes image with mean and std. Note that this will trigger a conversion of image to a NumPy array if it's a PIL Image.

Parameters:
- image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to normalize.
- mean (List[float] or np.ndarray or torch.Tensor) – The mean (per channel) to use for normalization.
- std (List[float] or np.ndarray or torch.Tensor) – The standard deviation (per channel) to use for normalization.
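Per-channel normalization computes (x - mean[c]) / std[c] for every pixel value x in channel c. The sketch below assumes a channel-first image stored as nested lists (image[c][y][x]); normalize_channels is a hypothetical helper, not the library method.

```python
from typing import List, Sequence


def normalize_channels(
    image: List[List[List[float]]], mean: Sequence[float], std: Sequence[float]
) -> List[List[List[float]]]:
    """Hypothetical sketch: per-channel (x - mean) / std on image[c][y][x]."""
    if not (len(image) == len(mean) == len(std)):
        raise ValueError("need one mean and one std value per channel")
    return [
        [[(value - m) / s for value in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


image = [[[0.5, 1.0]], [[0.0, 0.25]]]  # 2 channels, 1 x 2 pixels
normalized = normalize_channels(image, mean=[0.5, 0.25], std=[0.5, 0.25])
print(normalized)  # [[[0.0, 1.0]], [[-1.0, 0.0]]]
```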
resize(image, size, resample=2)
Resizes image. Note that this will trigger a conversion of image to a PIL Image.

Parameters:
- image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to resize.
- size (int or Tuple[int, int]) – The size to use for resizing the image.
- resample (int, optional, defaults to PIL.Image.BILINEAR) – The filter to use for resampling.
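Resizing maps each output pixel back to a source position. The hypothetical sketch below uses plain nearest-neighbor index mapping on a single-channel image, with size given as (height, width) by assumption; it is deliberately simpler than the default PIL.Image.BILINEAR filter, which blends neighboring pixels, and it does not reproduce PIL's exact sampling grid. Note that PIL's own Image.resize takes its size as (width, height).

```python
from typing import List, Tuple


def resize_nearest(image: List[List[int]], size: Tuple[int, int]) -> List[List[int]]:
    """Hypothetical sketch: nearest-neighbor resize of a list of rows to (height, width)."""
    out_h, out_w = size
    in_h, in_w = len(image), len(image[0])
    # Output pixel (i, j) copies source pixel (i * in_h // out_h, j * in_w // out_w).
    return [
        [image[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


print(resize_nearest([[1, 2], [3, 4]], (4, 4)))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```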
to_numpy_array(image, rescale=None, channel_first=True)
Converts image to a NumPy array. Optionally rescales it and puts the channel dimension as the first dimension.

Parameters:
- image (PIL.Image.Image or np.ndarray or torch.Tensor) – The image to convert to a NumPy array.
- rescale (bool, optional) – Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Will default to True if the image is a PIL Image or an array/tensor of integers, False otherwise.
- channel_first (bool, optional, defaults to True) – Whether or not to permute the dimensions of the image to put the channel dimension first.
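The two optional steps of to_numpy_array, rescaling integer pixels into [0, 1] and moving the channel axis to the front, can be sketched on nested lists shaped height x width x channels. to_nested_array is hypothetical; it mirrors the documented defaults for integer and float inputs but does not handle PIL Images or tensors.

```python
from typing import List, Optional, Union

Number = Union[int, float]


def to_nested_array(
    image_hwc: List[List[List[Number]]],
    rescale: Optional[bool] = None,
    channel_first: bool = True,
) -> List[List[List[Number]]]:
    """Hypothetical sketch: H x W x C nested lists in, optionally rescaled and permuted."""
    if rescale is None:
        # Documented default: rescale when the input holds integers.
        rescale = all(isinstance(v, int) for row in image_hwc for px in row for v in px)
    # Divide by 255 so 0..255 integers become floats between 0.0 and 1.0.
    scaled = [[[v / 255 if rescale else v for v in px] for px in row] for row in image_hwc]
    if not channel_first:
        return scaled
    channels = len(scaled[0][0])
    # Move the channel axis to the front: result[c][y][x] == scaled[y][x][c].
    return [[[px[c] for px in row] for row in scaled] for c in range(channels)]


rgb = [[[255, 0, 51], [0, 255, 102]]]  # 1 x 2 pixels, 3 channels
print(to_nested_array(rgb))  # [[[1.0, 0.0]], [[0.0, 1.0]], [[0.2, 0.4]]]
```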
to_pil_image(image, rescale=None)
Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.

Parameters:
- image (PIL.Image.Image or numpy.ndarray or torch.Tensor) – The image to convert to the PIL Image format.
- rescale (bool, optional) – Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to True if the image type is a floating type, False otherwise.