optimum-onnx documentation
Quantization
ORTQuantizer
class optimum.onnxruntime.ORTQuantizer
( onnx_model_path: Path, config: PretrainedConfig | None = None )
Handles the ONNX Runtime quantization process for models shared on huggingface.co/models.
Computes the quantization ranges.
fit
( dataset: Dataset, calibration_config: CalibrationConfig, onnx_augmented_model_name: str | Path = 'augmented_model.onnx', operators_to_quantize: list[str] | None = None, batch_size: int = 1, use_external_data_format: bool = False, use_gpu: bool = False, force_symmetric_range: bool = False )
Parameters
-  dataset (Dataset) — The dataset to use when performing the calibration step.
-  calibration_config (CalibrationConfig) — The configuration containing the parameters related to the calibration step.
-  onnx_augmented_model_name (Union[str, Path], defaults to "augmented_model.onnx") — The path used to save the augmented model used to collect the quantization ranges.
-  operators_to_quantize (Optional[List[str]], defaults to None) — List of the operator types to quantize.
-  batch_size (int, defaults to 1) — The batch size to use when collecting the quantization range values.
-  use_external_data_format (bool, defaults to False) — Whether to use the external data format to store a model whose size is >= 2 GB.
-  use_gpu (bool, defaults to False) — Whether to use the GPU when collecting the quantization range values.
-  force_symmetric_range (bool, defaults to False) — Whether to make the quantization ranges symmetric.
Performs the calibration step and computes the quantization ranges.
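To make "quantization ranges" concrete, the following is a minimal, library-independent sketch of min-max range collection. It is not the ONNX Runtime or optimum implementation: the `minmax_ranges` helper, the tensor names, and the sample values are hypothetical, and the interpretation of `force_symmetric_range` as widening each range to `(-m, m)` is an assumption.

```python
from typing import Dict, Iterable, List, Tuple

Range = Tuple[float, float]


def minmax_ranges(
    batches: Iterable[Dict[str, List[float]]],
    force_symmetric_range: bool = False,
) -> Dict[str, Range]:
    """Track the running (min, max) of every named tensor across batches."""
    ranges: Dict[str, Range] = {}
    for batch in batches:
        for name, values in batch.items():
            lo, hi = min(values), max(values)
            if name in ranges:
                # Merge with the range seen in earlier batches.
                lo, hi = min(lo, ranges[name][0]), max(hi, ranges[name][1])
            ranges[name] = (lo, hi)
    if force_symmetric_range:
        # Assumed meaning of "symmetric": widen each range to (-m, m),
        # where m is the largest absolute bound.
        ranges = {
            name: (-max(abs(lo), abs(hi)), max(abs(lo), abs(hi)))
            for name, (lo, hi) in ranges.items()
        }
    return ranges


# Hypothetical activations observed for two tensors over two calibration batches.
batches = [
    {"attn_out": [-0.5, 1.5], "ffn_out": [0.0, 2.0]},
    {"attn_out": [-2.0, 0.25], "ffn_out": [0.5, 3.0]},
]
ranges = minmax_ranges(batches)
symmetric_ranges = minmax_ranges(batches, force_symmetric_range=True)
```

The returned mapping has the same shape as the `dict[str, tuple[float, float]]` that `quantize` accepts as `calibration_tensors_range`.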
from_pretrained
( model_or_path: ORTModel | str | Path, file_name: str | None = None )
Parameters
-  model_or_path (Union[ORTModel, str, Path]) — Can be either:
   - A path to a saved exported ONNX Intermediate Representation (IR) model, e.g., `./my_model_directory/`.
   - Or an ORTModelForXX class, e.g., ORTModelForQuestionAnswering.
-  file_name (Optional[str], defaults to None) — Overrides the default model file name, "model.onnx", with file_name. This allows you to load different model files from the same repository or directory.
Instantiates an ORTQuantizer from an ONNX model file or an ORTModel.
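A short sketch of loading a quantizer from a directory follows. The `load_quantizer` wrapper and the `model_optimized.onnx` file name are hypothetical; the wrapper only forwards its arguments to `ORTQuantizer.from_pretrained` as documented above, and the import is deferred so the function can be defined without optimum installed.

```python
from pathlib import Path
from typing import Optional, Union


def load_quantizer(model_dir: Union[str, Path], file_name: Optional[str] = None):
    """Build an ORTQuantizer from a directory holding an exported ONNX model."""
    # Imported lazily: calling this function requires optimum with ONNX Runtime support.
    from optimum.onnxruntime import ORTQuantizer

    # file_name=None keeps the documented default file name; pass a value to
    # select another ONNX file stored in the same directory.
    return ORTQuantizer.from_pretrained(model_dir, file_name=file_name)


# Usage (illustrative paths, not executed here):
# quantizer = load_quantizer("./my_model_directory/")
# quantizer = load_quantizer("./my_model_directory/", file_name="model_optimized.onnx")
```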
get_calibration_dataset
( dataset_name: str, num_samples: int = 100, dataset_config_name: str | None = None, dataset_split: str | None = None, preprocess_function: Callable | None = None, preprocess_batch: bool = True, seed: int = 2016, token: bool | str | None = None )
Parameters
-  dataset_name (str) — The dataset repository name on the Hugging Face Hub, or the path to a local directory containing the data files to load for the calibration step.
-  num_samples (int, defaults to 100) — The maximum number of samples composing the calibration dataset.
-  dataset_config_name (Optional[str], defaults to None) — The name of the dataset configuration.
-  dataset_split (Optional[str], defaults to None) — Which split of the dataset to use to perform the calibration step.
-  preprocess_function (Optional[Callable], defaults to None) — Processing function to apply to each example after loading the dataset.
-  preprocess_batch (bool, defaults to True) — Whether the preprocess_function should be batched.
-  seed (int, defaults to 2016) — The random seed to use when shuffling the calibration dataset.
-  token (Optional[Union[bool, str]], defaults to None) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in huggingface_hub.constants.HF_TOKEN_PATH).
Creates the calibration datasets.Dataset to use for the post-training static quantization calibration step.
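The sketch below shows one way a calibration dataset might be requested for a text classification model. The dataset name `"glue"`, the `"sst2"` configuration, the `"sentence"` column, the tokenizer settings, and the helper names `preprocess_fn` and `build_calibration_dataset` are illustrative assumptions; `quantizer` is expected to be an ORTQuantizer and `tokenizer` any callable tokenizer.

```python
from functools import partial


def preprocess_fn(examples, tokenizer):
    """Tokenize the "sentence" column.

    With preprocess_batch=True (the default), `examples` is a batch, so
    examples["sentence"] is a list of strings.
    """
    return tokenizer(
        examples["sentence"], padding="max_length", max_length=128, truncation=True
    )


def build_calibration_dataset(quantizer, tokenizer, num_samples=50):
    """Ask the quantizer for a shuffled, preprocessed calibration subset."""
    return quantizer.get_calibration_dataset(
        "glue",
        dataset_config_name="sst2",
        dataset_split="train",
        preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
        num_samples=num_samples,
    )
```

Binding the tokenizer with `functools.partial` keeps `preprocess_function` a single-argument callable, which is the shape the parameter description implies.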
partial_fit
( dataset: Dataset, calibration_config: CalibrationConfig, onnx_augmented_model_name: str | Path = 'augmented_model.onnx', operators_to_quantize: list[str] | None = None, batch_size: int = 1, use_external_data_format: bool = False, use_gpu: bool = False, force_symmetric_range: bool = False )
Parameters
-  dataset (Dataset) — The dataset to use when performing the calibration step.
-  calibration_config (CalibrationConfig) — The configuration containing the parameters related to the calibration step.
-  onnx_augmented_model_name (Union[str, Path], defaults to "augmented_model.onnx") — The path used to save the augmented model used to collect the quantization ranges.
-  operators_to_quantize (Optional[List[str]], defaults to None) — List of the operator types to quantize.
-  batch_size (int, defaults to 1) — The batch size to use when collecting the quantization range values.
-  use_external_data_format (bool, defaults to False) — Whether to use the external data format to store a model whose size is >= 2 GB.
-  use_gpu (bool, defaults to False) — Whether to use the GPU when collecting the quantization range values.
-  force_symmetric_range (bool, defaults to False) — Whether to make the quantization ranges symmetric.
Performs the calibration step and collects the quantization ranges without computing them.
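Because `partial_fit` only collects statistics, it can be called repeatedly on slices of the calibration data before the ranges are computed once at the end. The sketch below assumes the calibration dataset is a `datasets.Dataset` (so `shard(num_shards, index)` is available) and that the final step is a `compute_ranges()` method on the quantizer, which is inferred from the "Computes the quantization ranges" description on this page rather than from a documented signature. The `calibrate_in_shards` helper is hypothetical.

```python
def calibrate_in_shards(
    quantizer, calibration_dataset, calibration_config, num_shards=4, **fit_kwargs
):
    """Collect calibration statistics shard by shard, then compute the ranges once."""
    for index in range(num_shards):
        # Each shard is a disjoint slice of the calibration dataset.
        shard = calibration_dataset.shard(num_shards, index)
        quantizer.partial_fit(
            dataset=shard,
            calibration_config=calibration_config,
            **fit_kwargs,  # e.g. operators_to_quantize=..., batch_size=...
        )
    # Assumption: compute_ranges() turns the collected statistics into ranges.
    return quantizer.compute_ranges()
```

Processing the data in shards bounds how much is held in memory at once, which may matter for large calibration sets.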
quantize
( quantization_config: QuantizationConfig, save_dir: str | Path, file_suffix: str | None = 'quantized', calibration_tensors_range: dict[str, tuple[float, float]] | None = None, use_external_data_format: bool = False, preprocessor: QuantizationPreprocessor | None = None )
Parameters
-  quantization_config (QuantizationConfig) — The configuration containing the parameters related to quantization.
-  save_dir (Union[str, Path]) — The directory where the quantized model should be saved.
-  file_suffix (Optional[str], defaults to "quantized") — The suffix used when saving the quantized model.
-  calibration_tensors_range (Optional[Dict[str, Tuple[float, float]]], defaults to None) — The dictionary mapping node names to their quantization ranges. Used, and required, only when applying static quantization.
-  use_external_data_format (bool, defaults to False) — Whether to use the external data format to store a model whose size is >= 2 GB.
-  preprocessor (Optional[QuantizationPreprocessor], defaults to None) — The preprocessor to use to collect the nodes to include or exclude from quantization.
Quantizes a model according to the quantization specifications defined in quantization_config.
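As a sketch of how the pieces fit together, the hypothetical `quantize_model` wrapper below forwards to `quantize` and fails early when static quantization is requested without calibration ranges. It assumes the configuration object exposes an `is_static` attribute; the commented usage additionally assumes `AutoQuantizationConfig.avx512_vnni` from `optimum.onnxruntime.configuration` and illustrative paths.

```python
def quantize_model(quantizer, quantization_config, save_dir, calibration_tensors_range=None):
    """Call quantizer.quantize, checking the static-quantization requirement first."""
    # Assumption: the config carries an `is_static` flag.
    if quantization_config.is_static and calibration_tensors_range is None:
        raise ValueError(
            "Static quantization requires calibration_tensors_range; "
            "run the calibration step first."
        )
    return quantizer.quantize(
        quantization_config=quantization_config,
        save_dir=save_dir,
        calibration_tensors_range=calibration_tensors_range,
    )


# Usage for dynamic quantization (requires optimum; not executed here):
# from optimum.onnxruntime import ORTQuantizer
# from optimum.onnxruntime.configuration import AutoQuantizationConfig
# quantizer = ORTQuantizer.from_pretrained("./my_model_directory/")
# qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
# quantize_model(quantizer, qconfig, save_dir="./quantized_model")
```

For static quantization, the ranges returned by the calibration step would be passed as `calibration_tensors_range`.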