optimum-onnx documentation


Quickstart

Export

You can export your models to ONNX with the Optimum CLI:

optimum-cli export onnx --model meta-llama/Llama-3.2-1B --output_dir meta_llama3_2_1b_onnx
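If you prefer to stay in Python, you can also export from code. The following is a minimal sketch that assumes the optimum.onnxruntime package is installed; it loads the PyTorch checkpoint with export=True to convert it to ONNX, then writes the resulting files with save_pretrained:

from optimum.onnxruntime import ORTModelForCausalLM

# Convert the PyTorch checkpoint to ONNX on the fly (export=True) ...
model = ORTModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", export=True)
# ... and save the exported ONNX model to disk.
model.save_pretrained("meta_llama3_2_1b_onnx")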

Inference

To load a model and run inference, simply replace the AutoModelForCausalLM class with the corresponding ORTModelForCausalLM class. You can also load a PyTorch checkpoint and convert it to ONNX on the fly when loading your model.

- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM
  from transformers import AutoTokenizer

  model_id = "meta-llama/Llama-3.2-1B"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id)
+ model = ORTModelForCausalLM.from_pretrained(model_id)
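Once loaded, the ORT model exposes the same generation API as its transformers counterpart. As a minimal sketch, reusing the model and tokenizer from the snippet above (the prompt text is arbitrary):

# Tokenize a prompt and generate with the ONNX Runtime-backed model.
inputs = tokenizer("What is ONNX Runtime?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])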