Export functions
You can export models to ONNX from two frameworks in 🤗 Optimum: PyTorch and TensorFlow. Each framework has its own export function, export_pytorch() and export_tensorflow(), but the recommended entry point is the main export function, main_export(), which selects the proper export function for the available framework, checks that the exported model is valid, and provides extended options to run optimizations on the exported model.
Main functions
optimum.exporters.onnx.main_export
( model_name_or_path: str, output: typing.Union[str, pathlib.Path], task: str = 'auto', opset: typing.Optional[int] = None, device: str = 'cpu', dtype: typing.Optional[str] = None, fp16: typing.Optional[bool] = False, optimize: typing.Optional[str] = None, monolith: bool = False, no_post_process: bool = False, framework: typing.Optional[str] = None, atol: typing.Optional[float] = None, cache_dir: str = '/root/.cache/huggingface/hub', trust_remote_code: bool = False, pad_token_id: typing.Optional[int] = None, subfolder: str = '', revision: str = 'main', force_download: bool = False, local_files_only: bool = False, use_auth_token: typing.Union[bool, str, NoneType] = None, token: typing.Union[bool, str, NoneType] = None, for_ort: bool = False, do_validation: bool = True, model_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None, custom_onnx_configs: typing.Optional[typing.Dict[str, ForwardRef('OnnxConfig')]] = None, fn_get_submodels: typing.Optional[typing.Callable] = None, use_subprocess: bool = False, _variant: str = 'default', library_name: typing.Optional[str] = None, legacy: bool = False, no_dynamic_axes: bool = False, do_constant_folding: bool = True, **kwargs_shapes )
Required parameters
- model_name_or_path (str) — Model ID on huggingface.co or path on disk to the model repository to export. Example: model_name_or_path="BAAI/bge-m3" or model_name_or_path="/path/to/model_folder".
- output (Union[str, Path]) — Path indicating the directory where to store the generated ONNX model.
Optional parameters
- task (str, defaults to "auto") — The task to export the model for. If left as "auto", the task will be auto-inferred based on the model. For decoder models, use xxx-with-past to export the model using past key values in the decoder.
- opset (Optional[int], defaults to None) — If specified, ONNX opset version to export the model with. Otherwise, the default opset for the given model architecture will be used.
- device (str, defaults to "cpu") — The device to use to do the export.
- fp16 (Optional[bool], defaults to False) — Use half precision during the export. PyTorch-only, requires device="cuda".
- dtype (Optional[str], defaults to None) — The floating point precision to use for the export. Supported options: "fp32" (float32), "fp16" (float16), "bf16" (bfloat16). If not specified, "fp32" is used.
- optimize (Optional[str], defaults to None) — Runs ONNX Runtime optimizations directly during the export. Some of these optimizations are specific to ONNX Runtime, and the resulting ONNX model will not be usable with other runtimes such as OpenVINO or TensorRT. Available options: "O1", "O2", "O3", "O4". Reference: AutoOptimizationConfig
- monolith (bool, defaults to False) — Forces the model to be exported as a single ONNX file.
- no_post_process (bool, defaults to False) — Disables any post-processing done by default on the exported ONNX models.
- framework (Optional[str], defaults to None) — The framework to use for the ONNX export ("pt" or "tf"). If not provided, will attempt to automatically detect the framework for the checkpoint.
- atol (Optional[float], defaults to None) — If specified, the absolute difference tolerance when validating the model. Otherwise, the default atol for the model will be used.
- cache_dir (Optional[str], defaults to None) — Path indicating where to store the cache. If not specified, the default Hugging Face cache path is used.
- trust_remote_code (bool, defaults to False) — Allows the use of custom modeling code hosted in the model repository. This option should only be set for repositories you trust and in which you have read the code, as it will execute arbitrary code present in the model repository on your local machine.
- pad_token_id (Optional[int], defaults to None) — This is needed by some models, for some tasks. If not provided, will attempt to use the tokenizer to guess it.
- subfolder (str, defaults to "") — In case the relevant files are located inside a subfolder of the model repo, either locally or on huggingface.co, you can specify the folder name here.
- revision (str, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id.
- force_download (bool, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- local_files_only (Optional[bool], defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
- use_auth_token (Optional[Union[bool,str]], defaults to None) — Deprecated. Please use the token argument instead.
- token (Optional[Union[bool,str]], defaults to None) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in huggingface_hub.constants.HF_TOKEN_PATH).
- model_kwargs (Optional[Dict[str, Any]], defaults to None) — Experimental usage: keyword arguments to pass to the model during the export. This argument should be used along with the custom_onnx_configs argument in case, for example, the model inputs/outputs are changed (for example, if model_kwargs={"output_attentions": True} is passed).
- custom_onnx_configs (Optional[Dict[str, OnnxConfig]], defaults to None) — Experimental usage: override the default ONNX config used for the given model. This argument may be useful for advanced users who want finer-grained control over the export. An example is available here.
- fn_get_submodels (Optional[Callable], defaults to None) — Experimental usage: override the default submodels that are used at the export. This is especially useful when exporting a custom architecture that needs to split the ONNX (e.g. encoder-decoder). If unspecified with custom models, optimum will try to use the default submodels used for the given task, with no guarantee of success.
- use_subprocess (bool, defaults to False) — Run the validation of the exported ONNX model in subprocesses. This is especially useful when exporting on a CUDA device, where ORT does not release memory at inference session destruction. When set to True, the main_export call should be guarded in an if __name__ == "__main__": block.
- _variant (str, defaults to "default") — Specify the variant of the ONNX export to use.
- library_name (Optional[str], defaults to None) — The library of the model ("transformers" or "diffusers" or "timm" or "sentence_transformers"). If not provided, will attempt to automatically detect the library name for the checkpoint.
- legacy (bool, defaults to False) — Disables the use of position_ids for text-generation models that require it for batched generation. Also enables exporting decoder-only models in three files (without past, with past, and the merged model). This argument is introduced for backward compatibility and will be removed in a future release of Optimum.
- no_dynamic_axes (bool, defaults to False) — If True, disables the use of dynamic axes during ONNX export.
- do_constant_folding (bool, defaults to True) — PyTorch-specific argument. If True, the PyTorch ONNX export will fold constants into adjacent nodes, if possible.
- **kwargs_shapes (Dict) — Shapes to use during inference. This argument allows overriding the default shapes used during the ONNX export.
Full-suite ONNX export function, exporting from a model ID on Hugging Face Hub or a local model repository.
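As a minimal sketch of calling main_export from Python, the snippet below gathers a few of the documented arguments and checks two constraints stated in the parameter list (fp16 requires a CUDA device, and optimize accepts "O1" to "O4") before delegating. The helper names build_main_export_kwargs and export_from_hub are illustrative assumptions and not part of Optimum. The optimum import is deferred so that the argument checks can run without the library installed.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Union

# Optimization levels listed in the `optimize` parameter description.
VALID_OPTIMIZE_LEVELS = ("O1", "O2", "O3", "O4")


def build_main_export_kwargs(
    model_name_or_path: str,
    output: Union[str, Path],
    task: str = "auto",
    device: str = "cpu",
    fp16: bool = False,
    optimize: Optional[str] = None,
    use_subprocess: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: check a few documented constraints and return
    keyword arguments that can be passed to main_export."""
    if fp16 and not device.startswith("cuda"):
        raise ValueError('fp16=True requires device="cuda".')
    if optimize is not None and optimize not in VALID_OPTIMIZE_LEVELS:
        raise ValueError(
            f"optimize must be one of {VALID_OPTIMIZE_LEVELS}, got {optimize!r}."
        )
    return {
        "model_name_or_path": model_name_or_path,
        "output": Path(output),
        "task": task,
        "device": device,
        "fp16": fp16,
        "optimize": optimize,
        "use_subprocess": use_subprocess,
    }


def export_from_hub(**kwargs: Any) -> None:
    """Hypothetical wrapper: run the export with the checked arguments."""
    # Deferred import: requires optimum with its ONNX export dependencies.
    from optimum.exporters.onnx import main_export

    main_export(**build_main_export_kwargs(**kwargs))
```

A call such as export_from_hub(model_name_or_path="BAAI/bge-m3", output="bge_m3_onnx") would then download the checkpoint and write the ONNX files to the given directory (the directory name is made up for illustration). With use_subprocess=True, that call belongs under an if __name__ == "__main__": guard, as the parameter description requires.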
optimum.exporters.onnx.onnx_export_from_model
( model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel'), ForwardRef('DiffusionPipeline')], output: typing.Union[str, pathlib.Path], opset: typing.Optional[int] = None, optimize: typing.Optional[str] = None, monolith: bool = False, no_post_process: bool = False, atol: typing.Optional[float] = None, do_validation: bool = True, model_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None, custom_onnx_configs: typing.Optional[typing.Dict[str, ForwardRef('OnnxConfig')]] = None, fn_get_submodels: typing.Optional[typing.Callable] = None, _variant: str = 'default', legacy: bool = False, preprocessors: typing.List = None, device: str = 'cpu', no_dynamic_axes: bool = False, task: typing.Optional[str] = None, use_subprocess: bool = False, do_constant_folding: bool = True, **kwargs_shapes )
Required parameters
- model (Union["PreTrainedModel", "TFPreTrainedModel"]) — PyTorch or TensorFlow model to export to ONNX.
- output (Union[str, Path]) — Path indicating the directory where to store the generated ONNX model.
Optional parameters
- task (Optional[str], defaults to None) — The task to export the model for. If not specified, the task will be auto-inferred based on the model.
- opset (Optional[int], defaults to None) — If specified, ONNX opset version to export the model with. Otherwise, the default opset for the given model architecture will be used.
- device (str, defaults to "cpu") — The device to use to do the export.
- optimize (Optional[str], defaults to None) — Runs ONNX Runtime optimizations directly during the export. Some of these optimizations are specific to ONNX Runtime, and the resulting ONNX model will not be usable with other runtimes such as OpenVINO or TensorRT. Available options: "O1", "O2", "O3", "O4". Reference: AutoOptimizationConfig
- monolith (bool, defaults to False) — Forces the model to be exported as a single ONNX file.
- no_post_process (bool, defaults to False) — Disables any post-processing done by default on the exported ONNX models.
- atol (Optional[float], defaults to None) — If specified, the absolute difference tolerance when validating the model. Otherwise, the default atol for the model will be used.
- model_kwargs (Optional[Dict[str, Any]], defaults to None) — Experimental usage: keyword arguments to pass to the model during the export. This argument should be used along with the custom_onnx_configs argument in case, for example, the model inputs/outputs are changed (for example, if model_kwargs={"output_attentions": True} is passed).
- custom_onnx_configs (Optional[Dict[str, OnnxConfig]], defaults to None) — Experimental usage: override the default ONNX config used for the given model. This argument may be useful for advanced users who want finer-grained control over the export. An example is available here.
- fn_get_submodels (Optional[Callable], defaults to None) — Experimental usage: override the default submodels that are used at the export. This is especially useful when exporting a custom architecture that needs to split the ONNX (e.g. encoder-decoder). If unspecified with custom models, optimum will try to use the default submodels used for the given task, with no guarantee of success.
- use_subprocess (bool, defaults to False) — Run the validation of the exported ONNX model in subprocesses. This is especially useful when exporting on a CUDA device, where ORT does not release memory at inference session destruction. When set to True, the main_export call should be guarded in an if __name__ == "__main__": block.
- _variant (str, defaults to "default") — Specify the variant of the ONNX export to use.
- legacy (bool, defaults to False) — Disables the use of position_ids for text-generation models that require it for batched generation. Also enables exporting decoder-only models in three files (without past, with past, and the merged model). This argument is introduced for backward compatibility and will be removed in a future release of Optimum.
- no_dynamic_axes (bool, defaults to False) — If True, disables the use of dynamic axes during ONNX export.
- do_constant_folding (bool, defaults to True) — PyTorch-specific argument. If True, the PyTorch ONNX export will fold constants into adjacent nodes, if possible.
- **kwargs_shapes (Dict) — Shapes to use during inference. This argument allows overriding the default shapes used during the ONNX export.
Full-suite ONNX export function, exporting from a pre-loaded PyTorch or TensorFlow model. This function is especially useful when the model needs to be modified before exporting to ONNX, for example by overriding its forward call.
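The "modify, then export" workflow could look roughly like the sketch below. override_forward and export_patched_model are hypothetical helpers, not Optimum functions, and loading the model through transformers' AutoModel is an assumption made for illustration. Whether a particular modification exports cleanly depends on the model and on the outputs still matching its ONNX config.

```python
import functools
from typing import Any, Callable


def override_forward(model: Any, new_forward: Callable[..., Any]) -> Any:
    """Hypothetical helper: replace model.forward with new_forward, which
    receives the original forward as its first positional argument."""
    original_forward = model.forward

    # functools.wraps keeps the original name and signature metadata visible.
    @functools.wraps(original_forward)
    def patched(*args: Any, **kwargs: Any) -> Any:
        return new_forward(original_forward, *args, **kwargs)

    model.forward = patched
    return model


def export_patched_model(
    model_id: str, output_dir: str, new_forward: Callable[..., Any]
) -> None:
    """Hypothetical wrapper: load a model, patch its forward, export it."""
    # Deferred imports: require transformers and optimum to be installed.
    from transformers import AutoModel
    from optimum.exporters.onnx import onnx_export_from_model

    model = AutoModel.from_pretrained(model_id)
    override_forward(model, new_forward)
    onnx_export_from_model(model, output_dir)
```

Passing the original forward into new_forward keeps the patch small: the replacement only has to express what changes around the existing call.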
optimum.exporters.onnx.export
( model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel'), ForwardRef('ModelMixin')], config: OnnxConfig, output: Path, opset: typing.Optional[int] = None, device: str = 'cpu', input_shapes: typing.Optional[typing.Dict] = None, disable_dynamic_axes_fix: typing.Optional[bool] = False, dtype: typing.Optional[str] = None, no_dynamic_axes: bool = False, do_constant_folding: bool = True, model_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None ) → Tuple[List[str], List[str]]
Parameters
- model (PreTrainedModel or TFPreTrainedModel) — The model to export.
- config (OnnxConfig) — The ONNX configuration associated with the exported model.
- output (Path) — Directory to store the exported ONNX model.
- opset (Optional[int], defaults to None) — The version of the ONNX operator set to use.
- device (Optional[str], defaults to "cpu") — The device on which the ONNX model will be exported. Either cpu or cuda. Only PyTorch is supported for export on CUDA devices.
- input_shapes (Optional[Dict], defaults to None) — If specified, allows using specific shapes for the example input provided to the ONNX exporter.
- disable_dynamic_axes_fix (Optional[bool], defaults to False) — Whether to disable the default dynamic axes fixing.
- dtype (Optional[str], defaults to None) — Data type to remap the model inputs to. PyTorch-only. Only fp16 is supported.
- no_dynamic_axes (bool, defaults to False) — If True, disables the use of dynamic axes during ONNX export.
- do_constant_folding (bool, defaults to True) — PyTorch-specific argument. If True, the PyTorch ONNX export will fold constants into adjacent nodes, if possible.
- model_kwargs (Optional[Dict[str, Any]], defaults to None) — Experimental usage: keyword arguments to pass to the model during the export. This argument should be used along with the custom_onnx_config argument in case, for example, the model inputs/outputs are changed (for example, if model_kwargs={"output_attentions": True} is passed).
Returns
Tuple[List[str], List[str]]
A tuple with an ordered list of the model’s inputs, and the named outputs from the ONNX configuration.
Exports a PyTorch or TensorFlow model to an ONNX Intermediate Representation.
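A lower-level export with an explicit OnnxConfig might be sketched as follows. This goes beyond what this page documents: obtaining the configuration through TasksManager.get_exporter_config_constructor and reading DEFAULT_ONNX_OPSET from it are assumptions whose exact signatures can differ between Optimum versions. export_with_default_config is a hypothetical wrapper, and the output is passed as a file path here, so adjust it if your version expects a directory.

```python
from pathlib import Path
from typing import List, Tuple


def export_with_default_config(
    model_id: str, onnx_path: Path
) -> Tuple[List[str], List[str]]:
    """Hypothetical wrapper: export a model with its default ONNX config."""
    # Deferred imports: require transformers and optimum to be installed.
    from transformers import AutoModel
    from optimum.exporters.onnx import export
    from optimum.exporters.tasks import TasksManager

    model = AutoModel.from_pretrained(model_id)

    # Assumption: TasksManager resolves the default ONNX config class
    # for the architecture of the loaded model.
    config_constructor = TasksManager.get_exporter_config_constructor("onnx", model)
    onnx_config = config_constructor(model.config)

    # export returns the ordered input names and the named outputs.
    onnx_inputs, onnx_outputs = export(
        model,
        onnx_config,
        onnx_path,
        opset=onnx_config.DEFAULT_ONNX_OPSET,
    )
    return onnx_inputs, onnx_outputs
```

The returned output names are what validate_model_outputs expects as its onnx_named_outputs argument, so the two calls are naturally chained.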
optimum.exporters.onnx.convert.export_pytorch
( model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('ModelMixin')], config: OnnxConfig, opset: int, output: Path, device: str = 'cpu', input_shapes: typing.Optional[typing.Dict] = None, no_dynamic_axes: bool = False, do_constant_folding: bool = True, model_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None ) → Tuple[List[str], List[str]]
Parameters
- model (PreTrainedModel) — The model to export.
- config (OnnxConfig) — The ONNX configuration associated with the exported model.
- opset (int) — The version of the ONNX operator set to use.
- output (Path) — Path to save the exported ONNX file to.
- device (str, defaults to "cpu") — The device on which the ONNX model will be exported. Either cpu or cuda. Only PyTorch is supported for export on CUDA devices.
- input_shapes (Optional[Dict], defaults to None) — If specified, allows using specific shapes for the example input provided to the ONNX exporter.
- no_dynamic_axes (bool, defaults to False) — If True, disables the use of dynamic axes during ONNX export.
- do_constant_folding (bool, defaults to True) — PyTorch-specific argument. If True, the PyTorch ONNX export will fold constants into adjacent nodes, if possible.
- model_kwargs (Optional[Dict[str, Any]], defaults to None) — Experimental usage: keyword arguments to pass to the model during the export. This argument should be used along with the custom_onnx_config argument in case, for example, the model inputs/outputs are changed (for example, if model_kwargs={"output_attentions": True} is passed).
Returns
Tuple[List[str], List[str]]
A tuple with an ordered list of the model’s inputs, and the named outputs from the ONNX configuration.
Exports a PyTorch model to an ONNX Intermediate Representation.
optimum.exporters.onnx.convert.export_tensorflow
( model: TFPreTrainedModel, config: OnnxConfig, opset: int, output: Path ) → Tuple[List[str], List[str]]
Parameters
- model (TFPreTrainedModel) — The model to export.
- config (OnnxConfig) — The ONNX configuration associated with the exported model.
- opset (int) — The version of the ONNX operator set to use.
- output (Path) — Directory to store the exported ONNX model.
- device (Optional[str], defaults to "cpu") — The device on which the ONNX model will be exported. Either cpu or cuda. Only PyTorch is supported for export on CUDA devices.
Returns
Tuple[List[str], List[str]]
A tuple with an ordered list of the model’s inputs, and the named outputs from the ONNX configuration.
Exports a TensorFlow model to an ONNX Intermediate Representation.
Utility functions
optimum.exporters.utils.check_dummy_inputs_are_allowed
( model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel'), ForwardRef('ModelMixin')], dummy_input_names: typing.Iterable[str] )
Checks that the dummy inputs from the ONNX config are a subset of the allowed inputs for model.
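The subset check can be pictured with the sketch below. It is an illustration of the idea rather than Optimum's implementation: the function name assert_inputs_allowed is made up, and it assumes that the "allowed inputs" are the parameter names of the model's forward callable.

```python
import inspect
from typing import Any, Callable, Iterable


def assert_inputs_allowed(
    forward: Callable[..., Any], dummy_input_names: Iterable[str]
) -> None:
    """Illustrative check: every dummy input name must be a parameter
    accepted by the given forward callable."""
    allowed = set(inspect.signature(forward).parameters)
    unexpected = set(dummy_input_names) - allowed
    if unexpected:
        raise ValueError(
            f"Inputs {sorted(unexpected)} are not accepted by the model; "
            f"allowed inputs are {sorted(allowed)}."
        )
```

Running such a check before the export turns a mismatch between the ONNX config and the model into an early, readable error instead of a failure deep inside tracing.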
optimum.exporters.onnx.validate_model_outputs
( config: OnnxConfig, reference_model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel'), ForwardRef('ModelMixin')], onnx_model: Path, onnx_named_outputs: typing.List[str], atol: typing.Optional[float] = None, input_shapes: typing.Optional[typing.Dict] = None, device: str = 'cpu', use_subprocess: typing.Optional[bool] = True, model_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None )
Parameters
- config (OnnxConfig) — The configuration used to export the model.
- reference_model (PreTrainedModel or TFPreTrainedModel) — The model used for the export.
- onnx_model (Path) — The path to the exported model.
- onnx_named_outputs (List[str]) — The names of the outputs to check.
- atol (Optional[float], defaults to None) — The absolute tolerance in terms of outputs difference between the reference and the exported model.
- input_shapes (Optional[Dict], defaults to None) — If specified, allows using specific shapes to validate the ONNX model on.
- device (str, defaults to "cpu") — The device on which the ONNX model will be validated. Either cpu or cuda. Validation on a CUDA device is supported only for PyTorch.
- use_subprocess (Optional[bool], defaults to True) — Launch validation of each exported model in a subprocess.
- model_kwargs (Optional[Dict[str, Any]], defaults to None) — Experimental usage: keyword arguments to pass to the model during the export and validation.
Raises
ValueError — If the output shapes or values do not match between the reference and the exported model.
Validates the export by checking that the outputs from both the reference and the exported model match.
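To make the role of atol concrete, here is a small pure-Python sketch of a tolerance check of this kind. It is not Optimum's implementation: check_outputs_match is a hypothetical name, outputs are simplified to flat lists of floats keyed by output name, and the default tolerance is an arbitrary value chosen for illustration.

```python
from typing import Dict, Sequence


def check_outputs_match(
    reference: Dict[str, Sequence[float]],
    exported: Dict[str, Sequence[float]],
    atol: float = 1e-5,
) -> None:
    """Illustrative check: raise ValueError when an output differs in
    length or by more than atol in absolute value."""
    for name, ref_values in reference.items():
        if name not in exported:
            raise ValueError(f"Output {name!r} is missing from the exported model.")
        onnx_values = exported[name]
        if len(ref_values) != len(onnx_values):
            raise ValueError(
                f"Shape mismatch for {name!r}: "
                f"{len(ref_values)} vs {len(onnx_values)}."
            )
        # Largest element-wise absolute difference for this output.
        max_diff = max(
            (abs(r - o) for r, o in zip(ref_values, onnx_values)), default=0.0
        )
        if max_diff > atol:
            raise ValueError(
                f"Values mismatch for {name!r}: max diff {max_diff} exceeds atol {atol}."
            )
```

A looser atol is typically needed for half-precision or optimized exports, where small numerical drift from the reference model is expected.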