Optimum documentation

Quantization

You are viewing the main version, which requires installation from source. If you'd like a regular pip install, check out the latest stable version (v1.19.0).

ORTQuantizer

class optimum.onnxruntime.ORTQuantizer

( onnx_model_path: Path config: typing.Optional[ForwardRef('PretrainedConfig')] = None )

Handles the ONNX Runtime quantization process for models shared on huggingface.co/models.

compute_ranges

( )

Computes the quantization ranges.

fit

( dataset: Dataset calibration_config: CalibrationConfig onnx_augmented_model_name: typing.Union[str, pathlib.Path] = 'augmented_model.onnx' operators_to_quantize: typing.Optional[typing.List[str]] = None batch_size: int = 1 use_external_data_format: bool = False use_gpu: bool = False force_symmetric_range: bool = False )

Parameters

  • dataset (Dataset) — The dataset to use when performing the calibration step.
  • calibration_config (CalibrationConfig) — The configuration containing the parameters related to the calibration step.
  • onnx_augmented_model_name (Union[str, Path], defaults to "augmented_model.onnx") — The path used to save the augmented model used to collect the quantization ranges.
  • operators_to_quantize (Optional[List[str]], defaults to None) — List of the operator types to quantize.
  • batch_size (int, defaults to 1) — The batch size to use when collecting the quantization range values.
  • use_external_data_format (bool, defaults to False) — Whether to use the external data format to store a model whose size is >= 2 GB.
  • use_gpu (bool, defaults to False) — Whether to use the GPU when collecting the quantization range values.
  • force_symmetric_range (bool, defaults to False) — Whether to make the quantization ranges symmetric.

Performs the calibration step and computes the quantization ranges.
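
As a rough illustration of what a quantization range is, the sketch below computes a simple min-max range for one tensor and then widens it to be symmetric around zero, which is the effect force_symmetric_range describes. It is library-independent: the helper names minmax_range and make_symmetric are hypothetical and not part of Optimum, and the actual calibration is delegated to ONNX Runtime, which may use other methods than plain min-max.

```python
from typing import Iterable, Tuple


def minmax_range(values: Iterable[float]) -> Tuple[float, float]:
    """Return the (min, max) of the observed activation values."""
    values = list(values)
    return min(values), max(values)


def make_symmetric(value_range: Tuple[float, float]) -> Tuple[float, float]:
    """Widen a range so that it is centered on zero."""
    low, high = value_range
    bound = max(abs(low), abs(high))
    return -bound, bound


# Hypothetical activation values observed during calibration.
observed = [-0.5, 0.25, 2.0, 1.5]
asymmetric = minmax_range(observed)
symmetric = make_symmetric(asymmetric)
print(asymmetric)
print(symmetric)
```

A symmetric range wastes part of the integer grid when the observed values are skewed, as they are here, but it lets the zero point sit at zero.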

from_pretrained

( model_or_path: typing.Union[ForwardRef('ORTModel'), str, pathlib.Path] file_name: typing.Optional[str] = None )

Parameters

  • model_or_path (Union[ORTModel, str, Path]) — Can be either:
    • A path to a saved exported ONNX Intermediate Representation (IR) model, e.g., ./my_model_directory/.
    • Or an instance of an ORTModelForXX class, e.g., ORTModelForQuestionAnswering.
  • file_name (Optional[str], defaults to None) — Overwrites the default model file name from "model.onnx" to file_name. This allows you to load different model files from the same repository or directory.

Instantiates an ORTQuantizer from an ONNX model file or an ORTModel.

get_calibration_dataset

( dataset_name: str num_samples: int = 100 dataset_config_name: typing.Optional[str] = None dataset_split: typing.Optional[str] = None preprocess_function: typing.Optional[typing.Callable] = None preprocess_batch: bool = True seed: int = 2016 use_auth_token: bool = False )

Parameters

  • dataset_name (str) — The dataset repository name on the Hugging Face Hub, or the path to a local directory containing the data files to load for the calibration step.
  • num_samples (int, defaults to 100) — The maximum number of samples composing the calibration dataset.
  • dataset_config_name (Optional[str], defaults to None) — The name of the dataset configuration.
  • dataset_split (Optional[str], defaults to None) — Which split of the dataset to use to perform the calibration step.
  • preprocess_function (Optional[Callable], defaults to None) — Processing function to apply to each example after loading the dataset.
  • preprocess_batch (bool, defaults to True) — Whether the preprocess_function should be batched.
  • seed (int, defaults to 2016) — The random seed to use when shuffling the calibration dataset.
  • use_auth_token (bool, defaults to False) — Whether to use the token generated when running transformers-cli login (necessary for some datasets like ImageNet).

Creates the calibration datasets.Dataset to use for the post-training static quantization calibration step.
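
The interplay of num_samples and seed can be sketched without any library: shuffle deterministically, then keep at most num_samples examples. This is a simplified stand-in that assumes a plain Python sequence instead of a datasets.Dataset; the function name sample_calibration_examples is hypothetical and not part of Optimum.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_calibration_examples(
    examples: Sequence[T], num_samples: int = 100, seed: int = 2016
) -> List[T]:
    """Shuffle deterministically, then keep at most num_samples examples."""
    shuffled = list(examples)
    # A dedicated Random instance avoids touching the global random state.
    random.Random(seed).shuffle(shuffled)
    return shuffled[:num_samples]


corpus = [f"sentence {i}" for i in range(1000)]
first = sample_calibration_examples(corpus, num_samples=100)
second = sample_calibration_examples(corpus, num_samples=100)
# Asking for more samples than exist returns everything that is available.
small = sample_calibration_examples(corpus[:10], num_samples=100)
print(len(first), first == second, len(small))
```

Because the seed is fixed, two calls with the same arguments select the same examples, which keeps calibration runs reproducible.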

partial_fit

( dataset: Dataset calibration_config: CalibrationConfig onnx_augmented_model_name: typing.Union[str, pathlib.Path] = 'augmented_model.onnx' operators_to_quantize: typing.Optional[typing.List[str]] = None batch_size: int = 1 use_external_data_format: bool = False use_gpu: bool = False force_symmetric_range: bool = False )

Parameters

  • dataset (Dataset) — The dataset to use when performing the calibration step.
  • calibration_config (CalibrationConfig) — The configuration containing the parameters related to the calibration step.
  • onnx_augmented_model_name (Union[str, Path], defaults to "augmented_model.onnx") — The path used to save the augmented model used to collect the quantization ranges.
  • operators_to_quantize (Optional[List[str]], defaults to None) — List of the operator types to quantize.
  • batch_size (int, defaults to 1) — The batch size to use when collecting the quantization range values.
  • use_external_data_format (bool, defaults to False) — Whether to use the external data format to store a model whose size is >= 2 GB.
  • use_gpu (bool, defaults to False) — Whether to use the GPU when collecting the quantization range values.
  • force_symmetric_range (bool, defaults to False) — Whether to make the quantization ranges symmetric.

Performs the calibration step and collects the quantization ranges without computing them.
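
The split between collecting and computing can be pictured with a toy collector: each partial_fit call only records what it saw, and a separate compute_ranges call reduces everything collected so far. The RangeCollector class below is a hypothetical, library-independent sketch that assumes min-max ranges; it borrows the method names for readability and does not reflect how ORTQuantizer is implemented.

```python
from typing import Dict, Iterable, List, Tuple


class RangeCollector:
    """Toy stand-in for a calibrator: collect first, compute later."""

    def __init__(self) -> None:
        self._observed: Dict[str, List[float]] = {}

    def partial_fit(self, batch: Dict[str, Iterable[float]]) -> None:
        """Record the values seen for each tensor; no range is computed yet."""
        for name, values in batch.items():
            self._observed.setdefault(name, []).extend(values)

    def compute_ranges(self) -> Dict[str, Tuple[float, float]]:
        """Reduce everything collected so far to one (min, max) per tensor."""
        return {name: (min(v), max(v)) for name, v in self._observed.items()}


collector = RangeCollector()
# Feed the calibration data in two separate shards.
collector.partial_fit({"logits": [0.1, -1.0], "hidden": [3.0]})
collector.partial_fit({"logits": [2.5], "hidden": [-0.5, 1.0]})
ranges = collector.compute_ranges()
print(ranges)
```

The resulting dictionary maps tensor names to (min, max) tuples, the same shape as the Dict[str, Tuple[float, float]] type used for calibration ranges in this class.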

quantize

( quantization_config: QuantizationConfig save_dir: typing.Union[str, pathlib.Path] file_suffix: typing.Optional[str] = 'quantized' calibration_tensors_range: typing.Union[typing.Dict[str, typing.Tuple[float, float]], NoneType] = None use_external_data_format: bool = False preprocessor: typing.Optional[optimum.onnxruntime.preprocessors.quantization.QuantizationPreprocessor] = None )

Parameters

  • quantization_config (QuantizationConfig) — The configuration containing the parameters related to quantization.
  • save_dir (Union[str, Path]) — The directory where the quantized model should be saved.
  • file_suffix (Optional[str], defaults to "quantized") — The suffix added to the file name of the saved quantized model.
  • calibration_tensors_range (Optional[Dict[str, Tuple[float, float]]], defaults to None) — The dictionary mapping node names to their quantization ranges, used and required only when applying static quantization.
  • use_external_data_format (bool, defaults to False) — Whether to use the external data format to store a model whose size is >= 2 GB.
  • preprocessor (Optional[QuantizationPreprocessor], defaults to None) — The preprocessor to use to collect the nodes to include or exclude from quantization.

Quantizes a model given the optimization specifications defined in quantization_config.
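
Two of the parameters above lend themselves to a small sketch: how save_dir and file_suffix might combine into an output path, and why calibration_tensors_range matters only for static quantization. Both helpers are hypothetical and not part of Optimum. In particular, joining the suffix to the file stem with an underscore is an assumption about the naming scheme, and the is_static flag stands in for whatever the quantization configuration reports.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple, Union


def quantized_model_path(
    onnx_model_path: Union[str, Path],
    save_dir: Union[str, Path],
    file_suffix: Optional[str] = "quantized",
) -> Path:
    """Build the output path; the "_" separator is an assumed convention."""
    stem = Path(onnx_model_path).stem
    suffix = f"_{file_suffix}" if file_suffix else ""
    return Path(save_dir) / f"{stem}{suffix}.onnx"


def check_ranges(
    is_static: bool,
    calibration_tensors_range: Optional[Dict[str, Tuple[float, float]]],
) -> None:
    """Static quantization needs calibration ranges; dynamic does not."""
    if is_static and calibration_tensors_range is None:
        raise ValueError("Static quantization requires calibration ranges.")


output_path = quantized_model_path("model.onnx", "out")
unsuffixed_path = quantized_model_path("model.onnx", "out", file_suffix=None)
check_ranges(is_static=False, calibration_tensors_range=None)  # dynamic: fine
check_ranges(is_static=True, calibration_tensors_range={"logits": (-1.0, 2.5)})
print(output_path)
```

Dynamic quantization computes activation ranges at inference time, so no ranges have to be supplied up front; static quantization bakes them into the model, so they must come from a prior calibration step.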
