Export a model to ExecuTorch with optimum.exporters.executorch
If you need to deploy 🤗 Transformers models for on-device use cases, we recommend exporting them to a serialized format that can be distributed and executed on specialized runtimes and hardware. In this guide, we’ll show you how to export these models to ExecuTorch.
Why ExecuTorch?
ExecuTorch is the ideal solution for deploying PyTorch models on edge devices, offering a streamlined process from export to deployment without leaving the PyTorch ecosystem.
Supporting on-device AI presents unique challenges: diverse hardware, strict power budgets, low or no internet connectivity, and real-time processing needs. These constraints have historically prevented or slowed down the creation of scalable and performant on-device AI solutions. We designed ExecuTorch, backed by our industry partners like Meta, Arm, Apple, Qualcomm, and MediaTek, to be highly portable and to provide superior developer productivity without sacrificing performance.
Summary
Exporting a PyTorch model to ExecuTorch is as simple as:
optimum-cli export executorch \
--model HuggingFaceTB/SmolLM2-135M \
--task text-generation \
--recipe xnnpack \
--output_dir hf_smollm2 \
--use_custom_sdpa
Check out the help for more options:
optimum-cli export executorch --help
Exporting a model to ExecuTorch using the CLI
The Optimum ExecuTorch export can be used through the Optimum command-line interface:
optimum-cli export executorch --help
usage: optimum-cli export executorch [-h] -m MODEL [-o OUTPUT_DIR] [--task TASK] [--recipe RECIPE] [--use_custom_sdpa]
options:
-h, --help show this help message and exit
Required arguments:
-m MODEL, --model MODEL
Model ID on huggingface.co or path on disk to load model from.
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Path indicating the directory where to store the generated ExecuTorch model.
--task TASK The task to export the model for. Available tasks depend on the model, but are among: ['audio-classification', 'feature-extraction', 'image-to-text', 'sentence-similarity', 'depth-estimation', 'image-segmentation', 'audio-frame-classification', 'masked-im', 'semantic-segmentation', 'text-classification', 'audio-xvector', 'mask-generation', 'question-answering', 'text-to-audio', 'automatic-speech-recognition', 'image-to-image', 'multiple-choice', 'image-classification', 'text2text-generation', 'token-classification', 'object-detection', 'zero-shot-object-detection', 'zero-shot-image-classification', 'text-generation', 'fill-mask'].
--recipe RECIPE Pre-defined recipes for export to ExecuTorch. Defaults to "xnnpack".
--use_custom_sdpa For decoder-only models to use custom sdpa with static kv cache to boost performance. Defaults to False.
You should see the model.pte file stored under "./hf_smollm2/":

hf_smollm2/
└── model.pte
This fetches the model from the Hub and exports it with the specified recipe. The resulting model.pte file can then be run on the XNNPACK backend, or on many other ExecuTorch-supported backends if exported with a different recipe, e.g. Apple's Core ML or MPS, Qualcomm's SoCs, Arm's Ethos-U, Xtensa HiFi4 DSP, Vulkan GPU, MediaTek, etc.
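For instance, here is a sketch of re-exporting the same model with a different recipe. Note that coreml below is an assumed recipe name used for illustration; check optimum-cli export executorch --help for the recipes actually available in your installed version:

# NOTE: "coreml" is an assumed recipe name; run
# `optimum-cli export executorch --help` to list available recipes.
optimum-cli export executorch \
    --model HuggingFaceTB/SmolLM2-135M \
    --task text-generation \
    --recipe coreml \
    --output_dir hf_smollm2_coreml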
For example, we can load and run the XNNPACK-exported model with the ExecuTorch runtime using the optimum.executorch package as follows:
from transformers import AutoTokenizer
from optimum.executorch import ExecuTorchModelForCausalLM

# Load the tokenizer from the Hub and the exported model from the local directory
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
model = ExecuTorchModelForCausalLM.from_pretrained("hf_smollm2/")

# Run text generation on the exported model
prompt = "Simply put, the theory of relativity states that"
print(f"\nGenerated texts:\n\t{model.text_generation(tokenizer=tokenizer, prompt=prompt, max_seq_len=45)}")
As you can see, converting a model to ExecuTorch does not mean leaving the Hugging Face ecosystem. You end up with an API similar to that of regular 🤗 Transformers models!
If your model hasn't already been exported to ExecuTorch, it can also be converted on the fly when loading:
from optimum.executorch import ExecuTorchModelForCausalLM
model = ExecuTorchModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M", recipe="xnnpack", attn_implementation="custom_sdpa")
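Generation then works exactly as with a pre-exported model; here is a minimal follow-up that reuses the tokenizer and the text_generation call from the earlier snippet:

from transformers import AutoTokenizer

# Same generation API as with a pre-exported model
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
prompt = "Simply put, the theory of relativity states that"
print(model.text_generation(tokenizer=tokenizer, prompt=prompt, max_seq_len=45))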