Optimization

🤗 Optimum provides an optimum.onnxruntime package that enables you to apply graph optimization to many models hosted on the 🤗 Hub using the ONNX Runtime model optimization tool.

Creating an ORTOptimizer

The ORTOptimizer class is used to optimize your ONNX model. The class can be initialized using the from_pretrained() method, which supports different checkpoint formats.

  1. Using an already initialized ORTModelForXXX class.
>>> from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification

# Loading ONNX Model from the Hub
>>> model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")

# Create an optimizer from an ORTModelForXXX
>>> optimizer = ORTOptimizer.from_pretrained(model)
  2. Using a local ONNX model from a directory.
>>> from optimum.onnxruntime import ORTOptimizer

# This assumes a model.onnx exists in path/to/model
>>> optimizer = ORTOptimizer.from_pretrained("path/to/model")

Optimization examples

Below is an end-to-end example showing how to optimize distilbert-base-uncased-finetuned-sst-2-english.

>>> from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification
>>> from optimum.onnxruntime.configuration import OptimizationConfig

>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "/tmp/outputs"

# Load a PyTorch model and export it to the ONNX format
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)

# Create the optimizer
>>> optimizer = ORTOptimizer.from_pretrained(model)

# Define the optimization strategy by creating the appropriate configuration
>>> optimization_config = OptimizationConfig(
    optimization_level=2,
    optimize_with_onnxruntime_only=False,
    optimize_for_gpu=False,
)

# Optimize the model
>>> optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
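
Once saved, the optimized model can be loaded like any other ONNX model. A minimal sketch, assuming the default file_suffix of "optimized" was kept so the resulting file is named model_optimized.onnx:

>>> from transformers import AutoTokenizer

# Load the optimized model produced by the step above
>>> optimized_model = ORTModelForSequenceClassification.from_pretrained(
    save_dir,
    file_name="model_optimized.onnx",
)
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> inputs = tokenizer("This movie was great!", return_tensors="pt")
>>> outputs = optimized_model(**inputs)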

Below is an end-to-end example showing how to optimize the Seq2Seq model sshleifer/distilbart-cnn-12-6.

>>> from optimum.onnxruntime import ORTOptimizer, ORTModelForSeq2SeqLM
>>> from optimum.onnxruntime.configuration import OptimizationConfig
>>> from transformers import AutoTokenizer

>>> model_id = "sshleifer/distilbart-cnn-12-6"
>>> save_dir = "/tmp/outputs"

# Load a PyTorch model and export it to the ONNX format
>>> model = ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)

# Create the optimizer
>>> optimizer = ORTOptimizer.from_pretrained(model)

# Define the optimization strategy by creating the appropriate configuration
>>> optimization_config = OptimizationConfig(
    optimization_level=2,
    optimize_with_onnxruntime_only=False,
    optimize_for_gpu=False,
)

# Optimize the model
>>> optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)

# Load the resulting optimized model
>>> optimized_model = ORTModelForSeq2SeqLM.from_pretrained(
    save_dir,
    encoder_file_name="encoder_model_optimized.onnx",
    decoder_file_name="decoder_model_optimized.onnx",
    decoder_file_with_past_name="decoder_with_past_model_optimized.onnx",
)
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> tokens = tokenizer("This is a sample input", return_tensors="pt")
>>> outputs = optimized_model.generate(**tokens)
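# Decode the generated token ids back to text
>>> print(tokenizer.batch_decode(outputs, skip_special_tokens=True))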

ORTOptimizer

class optimum.onnxruntime.ORTOptimizer

( onnx_model_path: typing.List[os.PathLike] config: PretrainedConfig )

Handles the ONNX Runtime optimization process for models shared on huggingface.co/models.

from_pretrained

( model_or_path: typing.Union[str, os.PathLike, optimum.onnxruntime.modeling_ort.ORTModel] file_names: typing.Optional[typing.List[str]] = None )

Parameters

  • model_or_path (Union[str, os.PathLike, ORTModel]) — The model to optimize. Can be either:
    • A path to a local directory containing the model to optimize.
    • An instance of ORTModel.
  • file_names (List[str], optional) — The list of file names of the models to optimize.

get_fused_operators

( onnx_model_path: typing.Union[str, os.PathLike] )

Parameters

  • onnx_model_path (Union[str, os.PathLike]) — Path of the ONNX model.

Compute the dictionary mapping the name of each fused operator to the number of times it appears in the model.

get_nodes_number_difference

( onnx_model_path: typing.Union[str, os.PathLike] onnx_optimized_model_path: typing.Union[str, os.PathLike] )

Parameters

  • onnx_model_path (Union[str, os.PathLike]) — Path of the ONNX model.
  • onnx_optimized_model_path (Union[str, os.PathLike]) — Path of the optimized ONNX model.

Compute the difference in the number of nodes between the original and the optimized model.

get_operators_difference

( onnx_model_path: typing.Union[str, os.PathLike] onnx_optimized_model_path: typing.Union[str, os.PathLike] )

Parameters

  • onnx_model_path (Union[str, os.PathLike]) — Path of the ONNX model.
  • onnx_optimized_model_path (Union[str, os.PathLike]) — Path of the optimized ONNX model.

Compute the dictionary mapping each operator name to the difference in the number of corresponding nodes between the original and the optimized model.
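
Together, these helpers let you inspect what an optimization pass actually changed. A minimal sketch, assuming original and optimized models saved at the illustrative paths below:

>>> from optimum.onnxruntime import ORTOptimizer

# Which fusions were applied, and how often
>>> fused = ORTOptimizer.get_fused_operators("path/to/model_optimized.onnx")

# How many nodes the optimization removed overall
>>> node_diff = ORTOptimizer.get_nodes_number_difference(
    "path/to/model.onnx", "path/to/model_optimized.onnx"
)

# Per-operator change in node counts
>>> op_diff = ORTOptimizer.get_operators_difference(
    "path/to/model.onnx", "path/to/model_optimized.onnx"
)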

optimize

( optimization_config: OptimizationConfig save_dir: typing.Union[str, os.PathLike] file_suffix: typing.Optional[str] = 'optimized' use_external_data_format: bool = False )

Parameters

  • optimization_config (OptimizationConfig) — The configuration containing the parameters related to optimization.
  • save_dir (Union[str, os.PathLike]) — The path used to save the optimized model.
  • file_suffix (str, optional, defaults to "optimized") — The file suffix used to save the optimized model.
  • use_external_data_format (bool, optional, defaults to False) — Whether to use the external data format to store models whose size is >= 2GB.

Optimize a model given the optimization specifications defined in optimization_config.
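
For example, a different file_suffix can encode the optimization level in the output file name. A sketch, with an illustrative suffix:

>>> optimizer.optimize(
    save_dir="/tmp/outputs",
    optimization_config=optimization_config,
    file_suffix="o2",
)

# For a single model.onnx, this should yield /tmp/outputs/model_o2.onnx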