Quickstart
Export
You can easily export your models to ONNX with the optimum-cli command:
optimum-cli export onnx --model meta-llama/Llama-3.2-1B --output_dir meta_llama3_2_1b_onnx
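The command writes the exported ONNX model into the output directory, which can then be loaded back directly. As a minimal sketch, assuming the export above completed and the directory name matches the --output_dir argument:

from optimum.onnxruntime import ORTModelForCausalLM

# Load the exported ONNX model from the directory created by the command above
model = ORTModelForCausalLM.from_pretrained("meta_llama3_2_1b_onnx")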
Inference
To load a model and run inference, you can just replace your AutoModelForCausalLM class with the corresponding ORTModelForCausalLM class. You can also load a PyTorch checkpoint and convert it to ONNX on the fly when loading your model.
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM
  from transformers import AutoTokenizer

  model_id = "meta-llama/Llama-3.2-1B"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id)
+ model = ORTModelForCausalLM.from_pretrained(model_id)
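The diff above covers loading; generation then works exactly as it does with a regular Transformers model. Here is a minimal sketch of the full round trip, where the prompt and the max_new_tokens value are illustrative choices rather than part of the original example:

from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id)  # converts the PyTorch checkpoint to ONNX on the fly if needed

# Tokenize an illustrative prompt and generate, with ONNX Runtime running the model
inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))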