Quickstart
Export
You can easily export your models to ONNX with the optimum-cli command:
optimum-cli export onnx --model meta-llama/Llama-3.2-1B --output_dir meta_llama3_2_1b_onnx
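The command writes the exported ONNX model into the output directory, which can then be loaded back directly. As a minimal sketch, assuming the export above completed and the directory name matches the --output_dir argument:

from optimum.onnxruntime import ORTModelForCausalLM

# Load the exported ONNX model from the directory created by the command above
model = ORTModelForCausalLM.from_pretrained("meta_llama3_2_1b_onnx")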
Inference
To load a model and run inference, you can just replace your AutoModelForCausalLM class with the corresponding ORTModelForCausalLM class. You can also load a PyTorch checkpoint and convert it to ONNX on the fly when loading your model.
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM
  from transformers import AutoTokenizer

  model_id = "meta-llama/Llama-3.2-1B"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id)
+ model = ORTModelForCausalLM.from_pretrained(model_id)
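The diff above covers loading; generation then works exactly as it does with a regular Transformers model. Here is a minimal sketch of the full round trip, where the prompt and the max_new_tokens value are illustrative choices rather than part of the original example:

from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id)  # converts the PyTorch checkpoint to ONNX on the fly if needed

# Tokenize an illustrative prompt and generate, with ONNX Runtime running the model
inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))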