Benchmarking

Run

class optimum.runs_base.Run

( run_config: dict )

__init__

( run_config: dict )

Parameters

  • run_config (dict) — Parameters to use for the run. See RunConfig for the expected keys.

Initialize the Run class holding methods to perform inference and evaluation given a config.

A run compares a transformers model and an optimized model on latency/throughput, model size, and provided metrics.

launch

( ) → dict

Returns

dict

Finalized run data with metrics stored in the “evaluation” key.

Launch inference to compare metrics between the original and optimized model.

These metrics are latency, throughput, model size, and user-provided metrics.
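For illustration, a minimal sketch of a run configuration and launch. The model name, dataset fields, and framework_args values are assumptions for this example, and in practice Run is instantiated through a backend-specific subclass:

```python
from optimum.runs_base import Run

# Illustrative config; see RunConfig below for the full list of expected keys.
run_config = {
    "model_name_or_path": "distilbert-base-uncased-finetuned-sst-2-english",
    "task": "text-classification",
    "quantization_approach": "dynamic",
    "metrics": ["accuracy"],
    "framework": "onnxruntime",
    "framework_args": {"opset": 15, "optimization_level": 1},  # assumed FrameworkArgs keys
    "dataset": {
        "path": "glue",
        "name": "sst2",
        "eval_split": "validation",
        "data_keys": {"primary": "sentence", "secondary": None},
        "ref_keys": ["label"],
    },
}

run = Run(run_config)   # in practice, a backend-specific subclass of Run
results = run.launch()  # metrics are stored under the "evaluation" key
print(results["evaluation"])
```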

load_datasets

( )

Load the evaluation dataset and, if needed, the calibration dataset for static quantization.

get_calibration_dataset

( ) → datasets.Dataset

Returns

datasets.Dataset

Calibration dataset.

Get calibration dataset. The dataset needs to be loaded first with load_datasets().

get_eval_dataset

( ) → datasets.Dataset

Returns

datasets.Dataset

Evaluation dataset.

Get evaluation dataset. The dataset needs to be loaded first with load_datasets().
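As a short sketch, reusing the run instance from the example above, both getters assume the datasets were loaded first:

```python
run.load_datasets()
eval_dataset = run.get_eval_dataset()
calibration_dataset = run.get_calibration_dataset()  # only relevant for static quantization
```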

RunConfig

class optimum.utils.runs.RunConfig

( metrics: typing.List[str] model_name_or_path: str task: str quantization_approach: QuantizationApproach dataset: DatasetArgs framework: Frameworks framework_args: FrameworkArgs operators_to_quantize: typing.Optional[typing.List[str]] = <factory> node_exclusion: typing.Optional[typing.List[str]] = <factory> per_channel: typing.Optional[bool] = False calibration: typing.Optional[optimum.utils.runs.Calibration] = None task_args: typing.Optional[optimum.utils.runs.TaskArgs] = None aware_training: typing.Optional[bool] = False max_eval_samples: typing.Optional[int] = None time_benchmark_args: typing.Optional[optimum.utils.runs.BenchmarkTimeArgs] = BenchmarkTimeArgs(duration=30, warmup_runs=10) batch_sizes: typing.Optional[typing.List[int]] = <factory> input_lengths: typing.Optional[typing.List[int]] = <factory> )

Parameters

  • metrics (List[str]) — List of metrics to evaluate on.
  • model_name_or_path (str) — Name of the model hosted on the Hub to use for the run.
  • task (str) — Task performed by the model.
  • quantization_approach (QuantizationApproach) — Whether to use dynamic or static quantization.
  • dataset (DatasetArgs) — Dataset to use. Several keys must be set in addition to the dataset name.
  • framework (Frameworks) — Name of the framework used (e.g. “onnxruntime”).
  • framework_args (FrameworkArgs) — Framework-specific arguments.
  • operators_to_quantize (Union[List[str], NoneType]) — Operators to quantize, leaving the others unmodified (default: ["Add", "MatMul"]).
  • node_exclusion (Union[List[str], NoneType]) — Specific nodes to exclude from being quantized (default: []).
  • per_channel (Union[bool, NoneType]) — Whether to quantize per channel (default: False).
  • calibration (Union[Calibration, NoneType]) — Calibration parameters, in case static quantization is used.
  • task_args (Union[TaskArgs, NoneType]) — Task-specific arguments (default: None).
  • aware_training (Union[bool, NoneType]) — Whether the quantization is to be done with Quantization-Aware Training (not supported).
  • max_eval_samples (Union[int, NoneType]) — Maximum number of samples to use from the evaluation dataset for evaluation.
  • time_benchmark_args (Union[BenchmarkTimeArgs, NoneType]) — Parameters related to the time benchmark.
  • batch_sizes (Union[List[int], NoneType]) — Batch sizes to include in the run to measure time metrics.
  • input_lengths (Union[List[int], NoneType]) — Input lengths to include in the run to measure time metrics.

Class holding the parameters to launch a run.
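A sketch of building a configuration with the dataclasses directly; the values are illustrative, and string values for the enum-typed fields (quantization_approach, framework) are assumed to be coerced on validation:

```python
from optimum.utils.runs import DatasetArgs, RunConfig

config = RunConfig(
    metrics=["accuracy"],
    model_name_or_path="distilbert-base-uncased-finetuned-sst-2-english",
    task="text-classification",
    quantization_approach="dynamic",
    dataset=DatasetArgs(
        path="glue",
        name="sst2",
        eval_split="validation",
        data_keys={"primary": "sentence", "secondary": None},
        ref_keys=["label"],
    ),
    framework="onnxruntime",
    framework_args={"opset": 15, "optimization_level": 1},  # assumed FrameworkArgs keys
    batch_sizes=[4, 8],
    input_lengths=[128],
)
```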

class optimum.utils.runs.Calibration

( method: CalibrationMethods num_calibration_samples: int calibration_histogram_percentile: typing.Optional[float] = None calibration_moving_average: typing.Optional[bool] = None calibration_moving_average_constant: typing.Optional[float] = None )

Parameters

  • method (CalibrationMethods) — Calibration method used, either “minmax”, “entropy” or “percentile”.
  • num_calibration_samples (int) — Number of examples to use for the calibration step required by static quantization.
  • calibration_histogram_percentile (Union[float, NoneType]) — The percentile used for the percentile calibration method.
  • calibration_moving_average (Union[bool, NoneType]) — Whether to compute the moving average of the minimum and maximum values for the minmax calibration method.
  • calibration_moving_average_constant (Union[float, NoneType]) — Constant smoothing factor to use when computing the moving average of the minimum and maximum values. Effective only when the selected calibration method is minmax and calibration_moving_average is set to True.

Parameters for post-training calibration with static quantization.
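For example, a sketch of a static-quantization calibration using the minmax method with a moving average; the sample count and smoothing constant are illustrative, and the string method name is assumed to be accepted:

```python
from optimum.utils.runs import Calibration

calibration = Calibration(
    method="minmax",
    num_calibration_samples=100,
    calibration_moving_average=True,
    calibration_moving_average_constant=0.01,
)
```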

class optimum.utils.runs.DatasetArgs

( path: str eval_split: str data_keys: typing.Dict[str, typing.Union[NoneType, str]] ref_keys: typing.List[str] name: typing.Optional[str] = None calibration_split: typing.Optional[str] = None )

Parameters

  • path (str) — Path to the dataset, as in datasets.load_dataset(path).
  • eval_split (str) — Dataset split used for evaluation (e.g. “test”).
  • data_keys (Dict[str, Union[NoneType, str]]) — Dataset columns used as input data. At most two, indicated with “primary” and “secondary”.
  • ref_keys (List[str]) — Dataset columns used as references during evaluation.
  • name (Union[str, NoneType]) — Name of the dataset, as in datasets.load_dataset(path, name).
  • calibration_split (Union[str, NoneType]) — Dataset split used for calibration (e.g. “train”).

Parameters related to the dataset.
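For example, a sketch of a two-input dataset (illustrative GLUE MNLI columns) with a calibration split for static quantization:

```python
from optimum.utils.runs import DatasetArgs

dataset_args = DatasetArgs(
    path="glue",
    name="mnli",
    eval_split="validation_matched",
    calibration_split="train",
    data_keys={"primary": "premise", "secondary": "hypothesis"},
    ref_keys=["label"],
)
```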

class optimum.utils.runs.TaskArgs

( is_regression: typing.Optional[bool] = None )

Parameters

  • is_regression (Union[bool, NoneType]) — Text classification only. Whether the task is regression (the model outputs a single float).

Task-specific parameters.
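For example, for a text classification regression task such as GLUE STS-B:

```python
from optimum.utils.runs import TaskArgs

task_args = TaskArgs(is_regression=True)
```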

Processing utility methods

class optimum.utils.preprocessing.base.DatasetProcessing

( dataset_path: str dataset_name: str preprocessor: typing.Union[transformers.feature_extraction_utils.FeatureExtractionMixin, transformers.tokenization_utils_base.PreTrainedTokenizerBase] eval_split: str static_quantization: bool data_keys: typing.Dict[str, str] ref_keys: typing.List[str] config: PretrainedConfig task_args: typing.Optional[typing.Dict] = None num_calibration_samples: typing.Optional[int] = None calibration_split: typing.Optional[str] = None max_eval_samples: typing.Optional[int] = None )

__init__

( dataset_path: str dataset_name: str preprocessor: typing.Union[transformers.feature_extraction_utils.FeatureExtractionMixin, transformers.tokenization_utils_base.PreTrainedTokenizerBase] eval_split: str static_quantization: bool data_keys: typing.Dict[str, str] ref_keys: typing.List[str] config: PretrainedConfig task_args: typing.Optional[typing.Dict] = None num_calibration_samples: typing.Optional[int] = None calibration_split: typing.Optional[str] = None max_eval_samples: typing.Optional[int] = None )

Parameters

  • dataset_path (str) — Dataset path (https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/loading_methods#datasets.load_dataset.path)
  • dataset_name (str) — Dataset name (https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/loading_methods#datasets.load_dataset.name)
  • preprocessor (Union[FeatureExtractionMixin, PreTrainedTokenizerBase]) — Preprocessor used for evaluation.
  • eval_split (str) — Dataset split used for evaluation (e.g. “test”).
  • static_quantization (bool) — Whether static quantization is used.
  • data_keys (Dict[str, str]) — Mapping from “primary” and “secondary” to the input data column names.
  • ref_keys (List[str]) — Names of the reference columns.
  • config (PretrainedConfig) — Model configuration, useful for some tasks.
  • task_args (Dict, optional) — Task-specific arguments.
  • num_calibration_samples (int, optional) — Number of calibration samples for static quantization. Defaults to None.
  • calibration_split (str, optional) — Calibration split (e.g. “train”) for static quantization. Defaults to None.
  • max_eval_samples (int, optional) — Maximum number of samples to use from the evaluation dataset for evaluation.

Initialize the class in charge of loading datasets, running inference and evaluation.

Subclasses of this class are task-dependent but backend-independent.

load_datasets

( ) → Dict

Returns

Dict

Dictionary holding the datasets.

Load the evaluation dataset and, if needed, the calibration dataset.

The evaluation dataset is meant to be used by a pipeline and is therefore not preprocessed. The calibration dataset is preprocessed.

run_inference

( eval_dataset: Dataset pipeline: Pipeline ) → Tuple[List, List] comprising labels and predictions

Parameters

  • eval_dataset (Dataset) — Raw dataset to run inference on.
  • pipeline (Pipeline) — Pipeline used for inference. Should be initialized beforehand.

Returns

Tuple[List, List] comprising labels and predictions

  • labels are the references for evaluation.
  • predictions are the predictions on the dataset using the pipeline.

Run inference on the provided dataset using a pipeline, and return the labels and predictions.

get_metrics

( predictions references metric ) → Dict

Parameters

  • predictions (List) — Predictions.
  • references (List) — References.
  • metric (Metric) — Pre-loaded metric to run evaluation on.

Returns

Dict

Computed metrics.

Compute a metric given pre-formatted predictions and references.

get_pipeline_kwargs

( ) → Dict

Returns

Dict

Task-specific kwargs to initialize the pipeline.

Get task-specific kwargs to initialize the pipeline.
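Taken together, a hedged sketch of the intended flow of these methods. Here `processing` stands for an already-initialized, task-specific DatasetProcessing subclass instance, the dictionary keys and pipeline task are assumptions, and `load_metric` is the datasets API of this documentation's era:

```python
from datasets import load_metric
from transformers import pipeline

# `processing` is an instance of a task-specific DatasetProcessing subclass.
datasets_dict = processing.load_datasets()  # assumed keys: "eval" (and "calibration" if needed)

pipe = pipeline(
    "text-classification",  # illustrative task
    model="distilbert-base-uncased-finetuned-sst-2-english",
    **processing.get_pipeline_kwargs(),
)

labels, predictions = processing.run_inference(datasets_dict["eval"], pipe)
metric = load_metric("accuracy")
print(processing.get_metrics(predictions=predictions, references=labels, metric=metric))
```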