ONNX

ONNX is an open standard that defines a common set of operators and a file format to represent deep learning models in different frameworks, including PyTorch and TensorFlow. When a model is exported to ONNX, the operators construct a computational graph (or intermediate representation) that represents the flow of data through the model. Standardized operators and data types make it easy to switch between frameworks.
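
As an illustration of this graph structure, an exported ONNX file can be inspected with the onnx Python package. The snippet below is a minimal sketch, assuming a model has already been exported to model.onnx (for example, with the commands shown later in this guide).

>>> import onnx

>>> onnx_model = onnx.load("model.onnx")
>>> onnx.checker.check_model(onnx_model)  # raises an exception if the graph is malformed
>>> # each node in the graph is a standardized operator, e.g. MatMul or Softmax
>>> print({node.op_type for node in onnx_model.graph.node})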

The Optimum library exports a model to ONNX with configuration objects, which are provided for many architectures and are designed to be easily extended to others. If a model isn't supported, feel free to make a contribution to Optimum.

Exporting to ONNX makes it possible to run a model on any hardware or runtime that supports the ONNX standard, and to apply optimizations such as graph optimization and quantization.

Export a Transformers model to ONNX with the Optimum CLI or the optimum.onnxruntime module.

Optimum CLI

Run the command below to install Optimum and the exporters module.

pip install optimum[exporters]

Refer to the Export a model to ONNX with optimum.exporters.onnx guide for all available arguments, or view them with the command below.

optimum-cli export onnx --help

Set the --model argument to export a PyTorch or TensorFlow model from the Hub.

optimum-cli export onnx --model distilbert/distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/

You should see logs indicating the progress and showing where the resulting model.onnx is saved.

Validating ONNX model distilbert_base_uncased_squad_onnx/model.onnx...
	-[✓] ONNX model output names match reference model (start_logits, end_logits)
	- Validating ONNX Model output "start_logits":
		-[✓] (2, 16) matches (2, 16)
		-[✓] all values close (atol: 0.0001)
	- Validating ONNX Model output "end_logits":
		-[✓] (2, 16) matches (2, 16)
		-[✓] all values close (atol: 0.0001)
The ONNX export succeeded and the exported model was saved at: distilbert_base_uncased_squad_onnx

For local models, make sure the model weights and tokenizer files are saved in the same directory, for example local_path. Pass the directory to the --model argument and use --task to indicate the task a model can perform. If --task isn’t provided, the model architecture without a task-specific head is used.

optimum-cli export onnx --model local_path --task question-answering distilbert_base_uncased_squad_onnx/
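
A local directory in this layout can be produced by saving the model and tokenizer to the same path. The snippet below is a minimal sketch; local_path mirrors the placeholder used in the command above.

>>> from transformers import AutoModelForQuestionAnswering, AutoTokenizer

>>> checkpoint = "distilbert/distilbert-base-uncased-distilled-squad"
>>> model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)

>>> # save the weights and the tokenizer files into the same directory
>>> model.save_pretrained("local_path")
>>> tokenizer.save_pretrained("local_path")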

The model.onnx file can be deployed with any accelerator that supports ONNX. The example below demonstrates loading and running a model with ONNX Runtime.

>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> model = ORTModelForQuestionAnswering.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> inputs = tokenizer("What am I using?", "Using DistilBERT with ONNX Runtime!", return_tensors="pt")
>>> outputs = model(**inputs)
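
The outputs contain start and end logits, so the predicted answer span can be decoded back to text. The continuation below is a minimal sketch that reuses the inputs, outputs, and tokenizer from the snippet above and assumes the predicted start position does not come after the end position.

>>> import torch

>>> start = torch.argmax(outputs.start_logits, dim=-1).item()
>>> end = torch.argmax(outputs.end_logits, dim=-1).item()
>>> answer = tokenizer.decode(inputs["input_ids"][0, start : end + 1])
>>> print(answer)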

optimum.onnxruntime

The optimum.onnxruntime module supports programmatically exporting a Transformers model. Instantiate an ORTModel for a task and set export=True. Use save_pretrained() to save the ONNX model.

>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from transformers import AutoTokenizer

>>> model_checkpoint = "distilbert/distilbert-base-uncased-distilled-squad"
>>> save_directory = "onnx/"

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

>>> ort_model.save_pretrained(save_directory)
>>> tokenizer.save_pretrained(save_directory)
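
Once exported, the ONNX model can be used for inference like any other Transformers model, for example through a pipeline. The snippet below is a minimal sketch that reuses ort_model and tokenizer from above; the text-classification task is chosen here only to match the ORTModelForSequenceClassification class, and it assumes Optimum's ORTModel classes plug into Transformers pipelines.

>>> from transformers import pipeline

>>> classifier = pipeline("text-classification", model=ort_model, tokenizer=tokenizer)
>>> result = classifier("Using DistilBERT with ONNX Runtime!")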