# Exporters

Export any [PreTrainedModel](/docs/transformers/v5.13.0/en/main_classes/model#transformers.PreTrainedModel) to ONNX, ExecuTorch, or a standalone PyTorch program: same model,
same few lines of code, any runtime.

```python
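from transformers.exporters import DynamoConfig, DynamoExporter

exporter = DynamoExporter()  # or OnnxExporter, ExecutorchExporter
config = DynamoConfig(dynamic=True)  # or the matching OnnxConfig, ExecutorchConfig
exported = exporter.export(model, inputs, config=config)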
```

Because the exporters live inside Transformers, they are developed alongside the models:
architecture changes, new attention patterns, and custom cache types can be supported at export
time as soon as they land, without waiting for a downstream library to catch up.

The exporters are **experimental**. Many of the patches in this module work around specific
upstream bugs (torch, onnxscript, onnxruntime, executorch) and will be removed as soon as the
fix lands upstream. Until the API stabilises, treat the patches as tied to the versions used in
the test suite — pin those versions in production tooling, and expect both new patches and
removals as we follow upstream.

| Exporter               | Output                     | Runtime                                       |
| ---------------------- | -------------------------- | --------------------------------------------- |
| `DynamoExporter`     | `ExportedProgram`          | Any PyTorch runtime, AOT compilation          |
| `OnnxExporter`       | `ONNXProgram`              | Any ONNX runtime (ORT, TensorRT, OpenVINO, …) |
| `ExecutorchExporter` | `ExecutorchProgramManager` | Mobile and edge devices (ExecuTorch)          |

`AutoHfExporter` picks the right exporter from a config and `AutoExportConfig` picks the right
config class from a dict — the same auto-class idiom the rest of `transformers` uses, useful when
the backend is selected at runtime rather than hard-coded in the call site.

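## Installation

`DynamoExporter` only needs PyTorch:

```bash
pip install transformers "torch==2.12.0"
```

For `OnnxExporter`, also install the ONNX packages and ONNX Runtime:

```bash
pip install transformers "torch==2.12.0" "onnx==1.21.0" "onnxscript==0.7.0" onnxruntime
```

For `ExecutorchExporter`, also install ExecuTorch:

```bash
pip install transformers "torch==2.12.0" "executorch==1.3.1"
```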

The versions above are the ones the exporter test suite is pinned against. Newer or older releases
often work, but the exporter patches target a specific API surface, so pin these versions in
production tooling and expect `HfExporter` to log a warning when the installed versions drift from them.
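For CI or production tooling, you may want to check those pins yourself before exporting. The following is a minimal sketch using only the standard library. `EXPORTER_PINS` and `find_version_drift` are hypothetical names, not part of `transformers`, and the check `HfExporter` runs internally may differ. Trim the dictionary to the backends you actually use.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins from the install commands above (hypothetical helper, not part of transformers).
EXPORTER_PINS = {
    "torch": "2.12.0",
    "onnx": "1.21.0",
    "onnxscript": "0.7.0",
    "executorch": "1.3.1",
}


def find_version_drift(pins):
    """Return {package: (pinned, installed)} for every package that differs from its pin.

    ``installed`` is None when the package is not installed at all.
    """
    drift = {}
    for package, pinned in pins.items():
        try:
            # Drop local build tags such as "+cu126" so CUDA wheels still match their pin.
            installed = version(package).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            drift[package] = (pinned, installed)
    return drift


if __name__ == "__main__":
    for package, (pinned, installed) in find_version_drift(EXPORTER_PINS).items():
        print(f"{package}: pinned {pinned}, installed {installed or 'missing'}")
```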

## Quick start

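All exporters share the same interface: create an exporter and a matching config, then call
`.export(model, inputs, config=config)`. To switch runtimes, swap the exporter and config classes
and add any backend-specific options (such as `backend="xnnpack"` for ExecuTorch); the rest of the
code stays the same. The three examples below use `DynamoExporter`, `OnnxExporter`, and
`ExecutorchExporter`, respectively.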

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.exporters import DynamoExporter, DynamoConfig

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
inputs = tokenizer("Hello, world!", return_tensors="pt")

exporter = DynamoExporter()
config = DynamoConfig(dynamic=True)
exported = exporter.export(model, inputs, config=config)

# run the exported graph directly
outputs = exported.module()(**inputs)
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.exporters import OnnxExporter, OnnxConfig

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
inputs = tokenizer("Hello, world!", return_tensors="pt")

exporter = OnnxExporter()
config = OnnxConfig(dynamic=True)
onnx_program = exporter.export(model, inputs, config=config)

# save and load with ONNX Runtime
onnx_program.save("model.onnx")

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
ort_inputs = {k: v.numpy() for k, v in inputs.items()}
outputs = session.run(None, ort_inputs)
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.exporters import ExecutorchExporter, ExecutorchConfig

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
inputs = tokenizer("Hello, world!", return_tensors="pt")

exporter = ExecutorchExporter()
config = ExecutorchConfig(backend="xnnpack", dynamic=True)
et_program = exporter.export(model, inputs, config=config)

# save for on-device deployment
et_program.save("model.pte")

# load and run via the ExecuTorch Python runtime
from executorch.runtime import Runtime

program = Runtime.get().load_program("model.pte")
method = program.load_method("forward")
outputs = method.execute(list(inputs.values()))
```

## Dynamic shapes

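The quick-start examples above already pass `dynamic=True`, which assigns `Dim.AUTO` to every input
tensor and cache dimension and lets `torch.export` work out which of them can stay dynamic. The
exported graph then accepts inputs of varying sizes along those dimensions without retracing.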

For fine-grained control over which dimensions are dynamic, pass explicit `dynamic_shapes`
instead. This is forwarded directly to `torch.export.export` — see the
[torch.export documentation](https://pytorch.org/docs/stable/export.html) for the expected format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.exporters import DynamoExporter, DynamoConfig

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
inputs = tokenizer(["Hello, world!", "Hi"], padding=True, return_tensors="pt")

batch = torch.export.Dim("batch", min=1, max=32)
seq = torch.export.Dim("seq", min=1, max=2048)

exporter = DynamoExporter()
config = DynamoConfig(
    dynamic_shapes={"input_ids": {0: batch, 1: seq}, "attention_mask": {0: batch, 1: seq}},
    # Emit data-dependent shape guards as runtime asserts instead of failing the export when a
    # guard wouldn't hold across the explicit symbolic range — most LLMs need this under fine-grained
    # ``Dim(min=, max=)`` bounds. Not needed with ``dynamic=True`` / ``Dim.AUTO``, where torch.export
    # infers shape relations instead of verifying them against user-stated bounds.
    prefer_deferred_runtime_asserts_over_guards=True,
)
exported = exporter.export(model, inputs, config=config)
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.exporters import OnnxExporter, OnnxConfig

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
inputs = tokenizer(["Hello, world!", "Hi"], padding=True, return_tensors="pt")

batch = torch.export.Dim("batch", min=1, max=32)
seq = torch.export.Dim("seq", min=1, max=2048)

exporter = OnnxExporter()
config = OnnxConfig(
    dynamic_shapes={"input_ids": {0: batch, 1: seq}, "attention_mask": {0: batch, 1: seq}},
    # Emit data-dependent shape guards as runtime asserts instead of failing the export when a
    # guard wouldn't hold across the explicit symbolic range — most LLMs need this under fine-grained
    # ``Dim(min=, max=)`` bounds. Not needed with ``dynamic=True`` / ``Dim.AUTO``, where torch.export
    # infers shape relations instead of verifying them against user-stated bounds.
    prefer_deferred_runtime_asserts_over_guards=True,
)
onnx_program = exporter.export(model, inputs, config=config)
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.exporters import ExecutorchExporter, ExecutorchConfig

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
inputs = tokenizer(["Hello, world!", "Hi"], padding=True, return_tensors="pt")

batch = torch.export.Dim("batch", min=1, max=32)
seq = torch.export.Dim("seq", min=1, max=2048)

exporter = ExecutorchExporter()
config = ExecutorchConfig(
    backend="xnnpack",
    dynamic_shapes={"input_ids": {0: batch, 1: seq}, "attention_mask": {0: batch, 1: seq}},
    # Emit data-dependent shape guards as runtime asserts instead of failing the export when a
    # guard wouldn't hold across the explicit symbolic range — most LLMs need this under fine-grained
    # ``Dim(min=, max=)`` bounds. Not needed with ``dynamic=True`` / ``Dim.AUTO``, where torch.export
    # infers shape relations instead of verifying them against user-stated bounds.
    prefer_deferred_runtime_asserts_over_guards=True,
)
et_program = exporter.export(model, inputs, config=config)
```

## Generative models

For autoregressive generation, the model's `forward` sees different shapes at the prefill step
(full prompt, no KV cache) and at the decode step (single token, populated KV cache). Exporters
therefore expose `HfExporter.export_for_generation()`, which separates the two stages and exports each.
For multi-modal generative models it additionally splits the prefill into vision/audio encoder,
projector, language model, and `lm_head`. Encoder and language-model discovery uses the canonical
[get_encoder()](/docs/transformers/v5.13.0/en/main_classes/model#transformers.PreTrainedModel.get_encoder) (`modality="image"` / `"audio"`) and
[get_decoder()](/docs/transformers/v5.13.0/en/main_classes/model#transformers.PreTrainedModel.get_decoder) accessors, so any new architecture that wires those up
correctly works out of the box. Projector lookup falls back to a heuristic name list
(`multi_modal_projector`, `connector`, `embed_vision`, `embed_audio`); new architectures
should align their projector attribute to one of these canonical names rather than growing
the list.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.exporters import DynamoExporter, DynamoConfig

model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
messages = [{"role": "user", "content": [{"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"}, {"type": "text", "text": "Describe this image."}]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text, images=messages[0]["content"][0]["url"], return_tensors="pt").to(model.device)

exporter = DynamoExporter()
config = DynamoConfig(dynamic=True)
components = exporter.export_for_generation(model, inputs, config=config)
# components = {"image_encoder": ExportedProgram, "language_model": ExportedProgram, "lm_head": ExportedProgram, "decode": ExportedProgram}
```

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.exporters import OnnxExporter, OnnxConfig

model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
messages = [{"role": "user", "content": [{"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"}, {"type": "text", "text": "Describe this image."}]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text, images=messages[0]["content"][0]["url"], return_tensors="pt").to(model.device)

exporter = OnnxExporter()
config = OnnxConfig(dynamic=True)
components = exporter.export_for_generation(model, inputs, config=config)
# components = {"image_encoder": ONNXProgram, "language_model": ONNXProgram, "lm_head": ONNXProgram, "decode": ONNXProgram}
```

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.exporters import ExecutorchExporter, ExecutorchConfig

model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
messages = [{"role": "user", "content": [{"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"}, {"type": "text", "text": "Describe this image."}]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text, images=messages[0]["content"][0]["url"], return_tensors="pt").to(model.device)

exporter = ExecutorchExporter()
config = ExecutorchConfig(backend="xnnpack", dynamic=True)
components = exporter.export_for_generation(model, inputs, config=config)
# components = {"image_encoder": ExecutorchProgramManager, "language_model": ..., "lm_head": ..., "decode": ...}
```

The exported components are **independent graphs**, not a turnkey inference pipeline.
The caller is responsible for running each encoder, projecting embeddings, and orchestrating
the generation loop. We are actively working to reduce the glue required between components.
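As a rough illustration of that orchestration for a text-only model, the sketch below runs greedy decoding over two separately exported stages. The callable interfaces (`prefill` returning scores and a cache, `decode` taking one token plus the cache) are assumptions made for illustration, and `greedy_generate` and the toy stand-ins are hypothetical. Real exported components take and return whatever tensors were captured for your model, and multi-modal models also need the encoder and projector outputs fed into the prefill.

```python
def _argmax(scores):
    """Index of the largest score (ties resolve to the lowest index)."""
    return max(range(len(scores)), key=scores.__getitem__)


def greedy_generate(prefill, decode, prompt_ids, max_new_tokens, eos_token_id=None):
    """Greedy decoding over separately exported prefill and decode callables.

    Assumed (hypothetical) interfaces:
        prefill(prompt_ids)      -> (next_token_scores, cache)
        decode(token_id, cache)  -> (next_token_scores, cache)
    """
    scores, cache = prefill(prompt_ids)
    generated = []
    for step in range(max_new_tokens):
        next_token = _argmax(scores)
        generated.append(next_token)
        if next_token == eos_token_id or step == max_new_tokens - 1:
            break  # skip a decode call whose scores would never be used
        scores, cache = decode(next_token, cache)
    return generated


# Toy stand-ins: the "model" always favours (last token + 1) mod VOCAB_SIZE.
VOCAB_SIZE = 5


def _toy_scores(last_token):
    return [1.0 if t == (last_token + 1) % VOCAB_SIZE else 0.0 for t in range(VOCAB_SIZE)]


def toy_prefill(prompt_ids):
    cache = list(prompt_ids)  # stands in for the KV cache built from the full prompt
    return _toy_scores(cache[-1]), cache


def toy_decode(token_id, cache):
    cache = cache + [token_id]  # one new position per decode step
    return _toy_scores(token_id), cache


print(greedy_generate(toy_prefill, toy_decode, [0, 1], max_new_tokens=4))  # [2, 3, 4, 0]
```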

### What `export_for_generation` does under the hood

[decompose_for_generation()](/docs/transformers/v5.13.0/en/exporters#transformers.exporters.utils.decompose_for_generation) hooks `model.forward` and runs
`model.generate(**inputs, max_new_tokens=2)` once, capturing the real prefill and decode kwargs (and,
for multi-modal models, the per-submodule kwargs through hooks on each encoder, projector, and
language model). Because it relies on `generate()` rather than per-model glue, it works across
architectures: decoder-only, SSM, encoder-decoder, and multi-modal. `export_for_generation` is a
thin wrapper around it.
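The capture technique can be illustrated without PyTorch. The sketch below shadows a model's `forward` with a recording wrapper, runs a two-token generation, and keeps the first call as the prefill kwargs and the second as the decode kwargs. `ToyLM`, `record_forward_kwargs`, and `capture_prefill_and_decode` are hypothetical; the real `decompose_for_generation` also hooks submodules and deals with tensors and `Cache` objects.

```python
from contextlib import contextmanager


@contextmanager
def record_forward_kwargs(model):
    """Temporarily shadow ``model.forward`` with a wrapper that records each call's kwargs."""
    calls = []
    original = model.forward
    shadowed = vars(model).get("forward")  # an instance-level forward that was already there, if any

    def recording_forward(*args, **kwargs):
        calls.append(dict(kwargs))
        return original(*args, **kwargs)

    model.forward = recording_forward  # instance attribute wins over the class method
    try:
        yield calls
    finally:
        if shadowed is None:
            del model.forward  # expose the class method again
        else:
            model.forward = shadowed


class ToyLM:
    """Hypothetical stand-in: generate() calls forward() once per new token."""

    def forward(self, input_ids, past_length=0):
        return (input_ids[-1] + 1) % 10  # "next token"

    def generate(self, input_ids, max_new_tokens):
        tokens, past_length = list(input_ids), 0
        for _ in range(max_new_tokens):
            # Full prompt on the first call (prefill), only the newest token afterwards (decode).
            next_token = self.forward(input_ids=tokens[past_length:], past_length=past_length)
            past_length = len(tokens)
            tokens.append(next_token)
        return tokens


def capture_prefill_and_decode(model, input_ids):
    with record_forward_kwargs(model) as calls:
        model.generate(input_ids, max_new_tokens=2)  # call 1 = prefill, call 2 = decode
    return {"prefill": calls[0], "decode": calls[1]}


print(capture_prefill_and_decode(ToyLM(), [3, 4, 5]))
# {'prefill': {'input_ids': [3, 4, 5], 'past_length': 0}, 'decode': {'input_ids': [6], 'past_length': 3}}
```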

The capture runs the model eagerly on `inputs`, so pass **small but representative** values —
one short prompt, a single small image, a few audio frames. The exported program isn't tied
to those sizes (dynamic shapes still flow through), but smaller capture inputs make
`decompose_for_generation` cheaper and keep symbolic-shape inference tractable.

Call `decompose_for_generation` directly when you want to do something between decomposing
and exporting — run an eager forward for verification, swap a submodule's inputs, skip a stage:

```python
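from transformers.exporters import DynamoConfig, DynamoExporter
from transformers.exporters.utils import decompose_for_generation

# `model` and `inputs` as in the generative examples above
components = decompose_for_generation(model, inputs)
# {"image_encoder": (submodel, fwd_kwargs), "language_model": (...), ..., "decode": (...)}

exporter = DynamoExporter()
config = DynamoConfig(dynamic=True)

exported = {}
for name, (submodel, subinputs) in components.items():
    eager_outputs = submodel(**subinputs)  # optional eager run, e.g. to compare with the exported graph later
    exported[name] = exporter.export(submodel, subinputs, config=config)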
```

## Limitations and workarounds

`torch.export`, `torch.onnx.export`, and ExecuTorch each have rough edges around specific
PyTorch patterns. The exporters work around these with a small set of reversible patches
and FX-level fixes applied at well-defined points in the export flow. None of this is
visible through the public `export()` API, but a few points are worth knowing:

- Flash-attention and flex-attention are not exportable on any backend. `sdpa` is the preferred
  setting and `eager` also works (more slowly); if the model uses another implementation, select
  one of them (for example with `attn_implementation="sdpa"` in `from_pretrained`) before calling `export()`.
- `grouped_mm` traces fine through `DynamoExporter` and is auto-translated for `OnnxExporter`;
  for `ExecutorchExporter` with the XNNPACK backend, the exporter swaps MoE experts to
  `batched_mm` because XNNPACK has no `_grouped_mm.out` kernel.
- A short list of models (`EXPORT_SKIP_MODEL_CLASSES`) is excluded from the test suite's export
  sweep because the model itself is fundamentally non-exportable; each entry carries a TODO
  describing the model-side change needed.

### Export pipeline internals: per-backend stages and how to extend them

Each exporter's source file labels its stages with `# ── Stage N: … ─────` comment blocks. The
tables below mirror that layout one-to-one, so every row points to a section you can find in the
source.

Two lifecycles are used consistently:

- **Patches** (registered via `@register_patch(backend, *dotted_paths)`, installed via
  `apply_patches(backend)`) reversibly swap an attribute (a `torch` op, an ExecuTorch
  internal, a model class method) for the duration of the export. Pass multiple paths
  to a single decorator to share the same factory across targets — useful when the
  same method shape needs to be patched on several classes (e.g. `_update_mamba_mask`
  on Jamba/Bamba/…). Originals are restored on exit, even if the body raises.
- **Fixes** (registered via `@register_fx_node_fix(backend)` /
  `@register_fx_program_fix(backend)`, applied via `apply_fx_node_fixes(backend, gm)` /
  `apply_fx_program_fixes(backend, ep)`; ONNX-IR fixes still listed in `_IR_FIXES` and
  applied via `apply_onnx_ir_fixes`) mutate the in-progress graph or program in place.
  There's no revert — they're meant to permanently repair the artifact before the next
  pipeline step.

Every patch and FX fix sits in a backend-keyed registry (`_PATCHES`, `_FX_NODE_FIXES`,
`_FX_PROGRAM_FIXES` in [exporters/utils.py](https://github.com/huggingface/transformers/blob/main/src/transformers/exporters/utils.py)),
so adding a new one only requires writing a function and decorating it. ONNX IR fixes and ONNX
translations are the exceptions: they are added by hand to `_IR_FIXES` and `_ONNX_TRANSLATION_TABLE`.

### `DynamoExporter`

The base exporter has one patch stage and four structural helpers. They run in this order
inside `DynamoExporter.export`, against the original `nn.Module`:

| #     | Stage                                                                | Section in [exporter_dynamo.py](https://github.com/huggingface/transformers/blob/main/src/transformers/exporters/exporter_dynamo.py) | What it does                                                                                                                                                                                                                                                                                                | How to extend                                                                                                                                                                                                                                                                |
| ----- | -------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **1** | **Forward-signature patch** (`patch_forward_signature`)              | `# ── Stage 1: Model signature patch ──`                                                                                             | Replaces `model.forward` with an explicit flat-arg signature derived from the inputs dict, so `torch.export` doesn't bundle `**kwargs` into a single tuple. This is the entry contract `torch.export` reads before tracing.                                                                                | Internal — no extension knob.                                                                                                                                                                                                                                                |
| **2** | **Model patches** (`_PATCHES["dynamo"]` via `apply_patches("dynamo")`) | `# ── Stage 2: Model patches ──`                                                                                                     | Reversible class-attribute swaps applied during tracing. Each `_patch_*(original) → replacement` factory targets one or more `Class.method` paths and replaces a non-exportable model pattern (data-dependent loops, in-place ops, mask checks, chunked-attention `split → zip → cat`) with an export-safe equivalent. | Define `_patch_*(original)` and decorate with `@register_patch("dynamo", *dotted_paths)`. Pass multiple paths to share the same factory across classes (e.g. `_update_mamba_mask` on Jamba/Bamba/…). Examples: mamba/linear-attn mask, NLLB classifier cast, chunked-vision attention. |
| **3** | **Pytree registration** (`register_cache_pytrees_for_model`)         | `# ── Stage 3: Pytree registration ──`                                                                                               | Registers flatten/unflatten via `torch.utils._pytree.register_pytree_node` for every captured `Cache` / `ModelOutput`. Reflection-driven, tuned for tensor containers (not a general serialiser).                                                                                                          | Usually automatic. If a type isn't reflectable, add a branch to `_flatten_to_context` / `_unflatten_from_context`.                                                                                                                                                            |
| **4** | **Dynamic shapes** (`get_auto_dynamic_shapes`)                       | `# ── Stage 4: Dynamic shapes ──`                                                                                                    | Auto-assigns `Dim.AUTO` to every tensor and cache leaf when `DynamoConfig.dynamic=True` and the user did not pass `dynamic_shapes` explicitly.                                                                                                                                                              | Override per-export via `DynamoConfig.dynamic_shapes`.                                                                                                                                                                                                                       |
| **5** | **State cleanup** (`reset_model_state` / `_STATEFUL_CACHE_ATTRS`)    | `# ── Stage 5: Model state cleanup ──`                                                                                               | Resets non-`Cache` tensor attributes set inside `forward` (e.g. glm_moe_dsa `_cached_keys`, wav2vec2_bert `cached_rotary_positional_embedding`) that `torch.export` leaves as FakeTensors, so a follow-up eager forward is safe.                                                                            | Append the attribute name to `_STATEFUL_CACHE_ATTRS`.                                                                                                                                                                                                                        |
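Stage 1 matters because a `forward(**kwargs)` exposes no named inputs to signature-based introspection. The pure-Python sketch below shows the idea of deriving an explicit signature from the input names, assuming the tracer reads the signature through `inspect.signature` (which honours `__signature__`). `make_flat_forward` and `kwargs_only_forward` are hypothetical and are not the `patch_forward_signature` implementation.

```python
import inspect


def make_flat_forward(forward, input_names):
    """Wrap ``forward(**kwargs)`` so its signature names each input explicitly (hypothetical helper)."""
    parameters = [inspect.Parameter(name, inspect.Parameter.POSITIONAL_OR_KEYWORD) for name in input_names]
    signature = inspect.Signature(parameters)

    def flat_forward(*args, **kwargs):
        # Map positional/keyword arguments onto the explicit names, then call the original by keyword.
        bound = signature.bind(*args, **kwargs)
        return forward(**bound.arguments)

    # inspect.signature() honours __signature__, so introspection now sees one parameter per input.
    flat_forward.__signature__ = signature
    return flat_forward


def kwargs_only_forward(**kwargs):  # stand-in for a model forward that only accepts **kwargs
    return sorted(kwargs)


flat = make_flat_forward(kwargs_only_forward, ["input_ids", "attention_mask"])
print(inspect.signature(kwargs_only_forward))  # (**kwargs)
print(inspect.signature(flat))  # (input_ids, attention_mask)
print(flat(1, attention_mask=2))  # ['attention_mask', 'input_ids']
```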

### `OnnxExporter`

`OnnxExporter` extends `DynamoExporter` with five numbered stages applied around
`torch.onnx.export`. The labels match the `# ── Stage N: … ──` headers in the source:

| #     | Stage                                                       | Section in [exporter_onnx.py](https://github.com/huggingface/transformers/blob/main/src/transformers/exporters/exporter_onnx.py) | When it runs                                     | Lifecycle                            | What it does                                                                                                                                                                                                                                                                | How to extend                                                                                |
| ----- | ----------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------ | ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| **1** | **Torch patches** (`_PATCHES["onnx"]`)                      | `# ── Stage 1: Torch patches ──`                                                                                                 | During `torch.export` / `torch.onnx.export`      | Reversible (`apply_patches("onnx")`) | Reversible swaps of `torch` ops (`where`, `unsqueeze`, `scaled_dot_product_attention`, `searchsorted`, …) that the ONNX decomposer can't lower as-is. Each `_patch_*(original)` closes over the original.                                                                   | Define `_patch_*(original)` and decorate with `@register_patch("onnx", "dotted.path")`.       |
| **2** | **ONNX patches** (`_PATCHES["onnx"]`)                       | `# ── Stage 2: ONNX patches ──`                                                                                                  | During `torch.onnx.export`                       | Reversible (`apply_patches("onnx")`) | Hooks the private `_prepare_exported_program_for_export` step so the FX node fixes (stage 3) run again right after `run_decompositions` — any new symbolic-guard nodes the ONNX decomposition introduces get repaired before the FX → ONNX lowering picks them up.          | Same registry as stage 1 — define `_patch_*(original)` and decorate with `@register_patch("onnx", "dotted.path")`. |
| **3** | **FX node fixes** (`_FX_NODE_FIXES["onnx"]`)                | `# ── Stage 3: FX node fixes ──`                                                                                                 | After `torch.export`, again after `run_decompositions` | In-place (`apply_fx_node_fixes("onnx", gm)`) | Per-node rewrites on the `GraphModule` to drop or replace nodes the ONNX decomposer can't lower (alias ops, in-place views, `_assert_*`, dead comparisons, in-place `triu_`, `fill_diagonal_`, `sort(stable=True)`). DCE runs automatically at the end of the walk.   | Define `_fix_*(gm, node) → bool` (return `True` to consume) and decorate with `@register_fx_node_fix("onnx")`. |
| **4** | **ONNX translations** (`_ONNX_TRANSLATION_TABLE`)           | `# ── Stage 4: ONNX translations ──`                                                                                             | During FX → ONNX lowering                        | n/a (translation table)              | Overrides `torchlib`'s default lowering for specific aten ops where the default is buggy or missing. Currently `aten.index_put` (bool-mask path), `aten.bincount` (`OneHot + ReduceSum`), and `aten._grouped_mm` / `transformers.grouped_mm_fallback` (MoE grouped-matmul → unrolled `Slice + MatMul + Concat`). | Implement an `_aten_*` onnxscript function and add it to `_ONNX_TRANSLATION_TABLE`.          |
| **5** | **ONNX IR fixes** (`_IR_FIXES` / `apply_onnx_ir_fixes`)     | `# ── Stage 5: ONNX IR fixes ──`                                                                                                 | After `torch.onnx.export` returns                | In-place (`apply_onnx_ir_fixes`)     | Post-export rewrites on the `ONNXProgram` IR to work around ORT validation/runtime bugs (e.g. forcing `TopK(sorted=True)`). Applied to both the top-level graph and every function.                                                                                         | Implement `_fix_ir_*(graph_like)` and append to `_IR_FIXES`.                                 |
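The stage-4 translations replace one op with an equivalent composition of simpler ONNX ops. The NumPy sketch below checks the semantics of two of the decompositions named in the table on small arrays. It is reference math, not the onnxscript functions in `_ONNX_TRANSLATION_TABLE`; the helper names are hypothetical, and the grouped-matmul offsets convention (cumulative end row per group) is an assumption. Note that the unrolled form has one Slice/MatMul pair per group, so the graph grows with the number of experts.

```python
import numpy as np


def bincount_via_onehot(x, depth):
    """bincount(x) as OneHot + ReduceSum: one indicator row per element, summed over rows."""
    one_hot = np.eye(depth, dtype=np.int64)[x]  # shape (len(x), depth)
    return one_hot.sum(axis=0)


def grouped_mm_unrolled(a, b, offsets):
    """Grouped matmul as Slice + MatMul + Concat.

    a: (total_rows, K) rows sorted by group; b: (num_groups, K, N) one weight per group;
    offsets: cumulative end row of each group (assumed convention).
    """
    outputs, start = [], 0
    for group, end in enumerate(offsets):
        outputs.append(a[start:end] @ b[group])  # Slice the group's rows, MatMul with its weight
        start = end
    return np.concatenate(outputs, axis=0)  # Concat back to (total_rows, N)


x = np.array([0, 2, 2, 3, 0, 2])
print(bincount_via_onehot(x, depth=4))  # [2 0 3 1]

rng = np.random.default_rng(0)
a, b, offsets = rng.standard_normal((5, 3)), rng.standard_normal((2, 3, 4)), [2, 5]
print(grouped_mm_unrolled(a, b, offsets).shape)  # (5, 4)
```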

A complete inventory of patches in the file is one grep away:

```bash
grep -nE "^def (_patch_|_fix_|_aten_)" src/transformers/exporters/exporter_onnx.py
```

### `ExecutorchExporter`

`ExecutorchExporter` extends `DynamoExporter` with five numbered stages, running from backend
preparation before `torch.export` to the patches and fixes applied around
`to_edge_transform_and_lower` and `to_executorch`:

| #     | Stage                                                              | Section in [exporter_executorch.py](https://github.com/huggingface/transformers/blob/main/src/transformers/exporters/exporter_executorch.py) | When it runs                                            | Lifecycle                            | What it does                                                                                                                                                                                                                                                                                              | How to extend                                                                                                                |
| ----- | ------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| **1** | **Backend preparation** (`_BACKEND_PREPARE`)                       | `# ── Stage 1: Backend preparation ──`                                                                                                       | Before `torch.export`                                   | n/a (one-shot)                       | `prepare_for_xnnpack` moves the model to CPU/fp32 and selects `XnnpackPartitioner`; `prepare_for_cuda` moves to CUDA/bf16 and selects `CudaPartitioner`. Returns `(model, sample_inputs, partitioner)`.                                                                                                   | Implement `prepare_for_<name>` and register it in `_BACKEND_PREPARE`.                                                        |
| **2** | **Torch patches** (`_PATCHES["executorch"]`)                       | `# ── Stage 2: Torch patches ──`                                                                                                             | During `torch.export` tracing                           | Reversible (`apply_patches("executorch")`) | Replaces `torch` ops the ExecuTorch backends can't accept (`split_copy`, `chunk`, `topk(k>dim)`, non-divisible `avg_pool2d`, `dropout`, in-place `view`, GQA-shaped SDPA) with decomposed equivalents.                                                                                              | Define `_patch_*(original)` and decorate with `@register_patch("executorch", "dotted.path")`.                                |
| **3** | **ExecuTorch patches** (`_PATCHES["executorch"]`)                  | `# ── Stage 3: ExecuTorch patches ──`                                                                                                        | During `to_edge_transform_and_lower` / `to_executorch`  | Reversible (`apply_patches("executorch")`) | Reversibly swaps ExecuTorch internals that crash on legitimate dynamic-shape patterns: `SpecPropPass.update_placeholder_tensor_specs`, `PruneEmptyTensorsPass.remove_empty_tensors_from_cat`, `eval_upper_bound`, `dim_order_from_stride` (rebound on every importer), XNNPACK squeeze/unsqueeze define-node, complex-dtype validator, edge-dialect sym-op allowlist. | Same registry as stage 2 — define `_patch_*(original)` and decorate with `@register_patch("executorch", "dotted.path")`.     |
| **4** | **FX program fixes** (`_FX_PROGRAM_FIXES["executorch"]`)           | `# ── Stage 4: FX program fixes ──`                                                                                                          | After `torch.export`, before `to_edge_transform_and_lower` | In-place (`apply_fx_program_fixes("executorch", ep)`) | Repair the `ExportedProgram` where the fix needs program-level context: widen `int_oo` upper bounds in `range_constraints`, fill missing placeholder `meta["val"]` from `state_dict`.                                                                                                          | Define `_fix_*(exported_program) → None` and decorate with `@register_fx_program_fix("executorch")`. |
| **5** | **FX node fixes** (`_FX_NODE_FIXES["executorch"]`)                 | `# ── Stage 5: FX node fixes ──`                                                                                                             | After stage 4, before `to_edge_transform_and_lower`     | In-place (`apply_fx_node_fixes("executorch", gm)`) | Per-node rewrites: swap Python sym ops for `executorch_prim.*` equivalents, rewrite `pow` as `mul` chain, normalize amax/max negative dim, force contiguous clone. DCE runs automatically at the end of the walk.                                                                                | Define `_fix_*(gm, node) → bool` (return `True` to consume) and decorate with `@register_fx_node_fix("executorch")`. |

### When to patch the exporter vs. fix the model

The split is intentional:

- **Modeling change** if the pattern blocks export across multiple backends — data-dependent
  loops, stateful caches outside `Cache`, hand-written split-loop attention. Fix it once in
  the model and every exporter benefits.
- **Exporter patch** if the issue is a single backend's lowering bug — a missing ONNX
  translation, an ORT validation quirk, an FX decomposition that emits a dead op. Keep the
  workaround in the exporter and the modeling code stays clean.
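
As a concrete case of an exporter-side fix, the following sketch implements a stage-5 style node fix with the documented `_fix_*(gm, node) -> bool` shape. It rewrites a small integer `pow` as a `mul` chain. This is an illustration, not the transformers implementation. The graph here comes from `torch.fx.symbolic_trace`, so `pow` appears as `torch.pow` or `operator.pow`. A graph from `torch.export` would instead carry ATen targets such as `torch.ops.aten.pow.Tensor_Scalar`. The walker `apply_node_fixes` and the exponent range are hypothetical.

```python
import operator

import torch
import torch.fx


def _fix_pow_as_mul(gm: torch.fx.GraphModule, node: torch.fx.Node) -> bool:
    """Rewrite `x ** n` (integer n in [2, 4]) as a chain of multiplications."""
    if node.op != "call_function" or node.target not in (torch.pow, operator.pow):
        return False
    if len(node.args) != 2:
        return False
    base, exponent = node.args
    if not isinstance(exponent, int) or not 2 <= exponent <= 4:
        return False  # leave other exponents untouched
    with gm.graph.inserting_before(node):
        result = base
        for _ in range(exponent - 1):
            result = gm.graph.call_function(torch.mul, (result, base))
    node.replace_all_uses_with(result)
    return True  # consumed: the now-unused pow node is removed by DCE below


def apply_node_fixes(gm: torch.fx.GraphModule, fixes) -> torch.fx.GraphModule:
    """Offer each node to the fixes in order; the first one returning True consumes it."""
    for node in list(gm.graph.nodes):
        for fix in fixes:
            if fix(gm, node):
                break
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm


class Cube(torch.nn.Module):
    def forward(self, x):
        return torch.pow(x, 3) + 1


gm = apply_node_fixes(torch.fx.symbolic_trace(Cube()), [_fix_pow_as_mul])
```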

### Known upstream workarounds

A small number of model classes hit confirmed bugs in `onnxscript`'s graph optimizer
(constant folding crashing on `SplitToSequence`, FPN initialisers being dropped). For those,
ONNX optimisation is selectively disabled via
[`ONNX_DISABLE_OPTIMIZE_MODEL_CLASSES`](https://github.com/huggingface/transformers/blob/main/tests/exporters/test_utils.py)
in the test suite — each entry is annotated with the upstream issue it works around. This
list is **expected to shrink** as upstream fixes land; it is not an extension point for
arbitrary skipping, and new entries should reference a specific upstream bug.

A second list, [`EXPORT_SKIP_MODEL_CLASSES`](https://github.com/huggingface/transformers/blob/main/tests/exporters/test_utils.py),
opts a handful of model classes out of the entire export sweep when the model itself is
fundamentally non-exportable as-is (data-dependent control flow that can't be vectorised,
modules treated as forward arguments, …). Same expectations: every entry carries a TODO
naming the underlying model change needed; the list should shrink, not grow.
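
A test harness might consult the two lists roughly as follows. The list contents, the dict-of-reasons shape, and the helper `onnx_config_for` are hypothetical; the real lists live in `tests/exporters/test_utils.py`. The sketch relies only on two documented facts: `optimize` is an `OnnxConfig` field, and `export()` also accepts a plain `dict` as `config`.

```python
from typing import Any, Optional

# Hypothetical entries: model class name -> reason it is listed.
ONNX_DISABLE_OPTIMIZE_MODEL_CLASSES = {
    "ExampleFpnModel": "upstream onnxscript optimizer bug (link the issue here)",
}
EXPORT_SKIP_MODEL_CLASSES = {
    "ExampleLoopModel": "TODO: vectorise the data-dependent loop in forward()",
}


def onnx_config_for(model_class_name: str, **overrides: Any) -> Optional[dict[str, Any]]:
    """Return an OnnxConfig-style dict, or None if the class is skipped entirely."""
    if model_class_name in EXPORT_SKIP_MODEL_CLASSES:
        return None
    config = {"optimize": model_class_name not in ONNX_DISABLE_OPTIMIZE_MODEL_CLASSES}
    config.update(overrides)
    return config
```

A caller would then pass `config=onnx_config_for(type(model).__name__)` to `OnnxExporter().export(...)` and skip the model when the helper returns `None`.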

## API reference

### Exporter classes[[transformers.exporters.DynamoExporter]]

#### transformers.exporters.DynamoExporter[[transformers.exporters.DynamoExporter]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/exporter_dynamo.py#L68)

Exporter that converts a [PreTrainedModel](/docs/transformers/v5.13.0/en/main_classes/model#transformers.PreTrainedModel) to an `ExportedProgram`.

Example:

```python
>>> from transformers.exporters.exporter_dynamo import DynamoExporter, DynamoConfig

>>> exporter = DynamoExporter()
>>> exported = exporter.export(model, inputs, config=DynamoConfig(dynamic=True))
>>> outputs = exported.module()(**inputs)
```

`export(model: PreTrainedModel, sample_inputs: MutableMapping[str, Any], config: DynamoConfig | dict[str, Any])`

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/exporter_dynamo.py#L85)

#### transformers.exporters.OnnxExporter[[transformers.exporters.OnnxExporter]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/exporter_onnx.py#L87)

Exporter that converts a [PreTrainedModel](/docs/transformers/v5.13.0/en/main_classes/model#transformers.PreTrainedModel) to an ONNX `ONNXProgram`.

Example:

```python
>>> from transformers.exporters.exporter_onnx import OnnxExporter, OnnxConfig

>>> exporter = OnnxExporter()
>>> onnx_program = exporter.export(model, inputs, config=OnnxConfig(dynamic=True))
>>> outputs = onnx_program(**inputs)  # run in-memory
>>> exporter.export(model, inputs, config=OnnxConfig(output_path="model.onnx"))  # save to disk
```

`export(model: PreTrainedModel, sample_inputs: MutableMapping[str, Any], config: OnnxConfig | dict[str, Any])`

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/exporter_onnx.py#L105)

#### transformers.exporters.ExecutorchExporter[[transformers.exporters.ExecutorchExporter]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/exporter_executorch.py#L84)

Exporter that converts a [PreTrainedModel](/docs/transformers/v5.13.0/en/main_classes/model#transformers.PreTrainedModel) to an ExecuTorch `ExecutorchProgramManager`.

Example:

```python
>>> from transformers.exporters.exporter_executorch import ExecutorchExporter, ExecutorchConfig

>>> exporter = ExecutorchExporter()
>>> et_program = exporter.export(model, inputs, config=ExecutorchConfig(backend="xnnpack"))
>>> with open("model.pte", "wb") as f:  # write_to_file expects an open binary file
...     et_program.write_to_file(f)
```

`export(model: PreTrainedModel, sample_inputs: MutableMapping[str, Any], config: ExecutorchConfig | dict[str, Any])`

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/exporter_executorch.py#L101)

Export a model to ExecuTorch, applying backend preparation and torch op patches.

### Configuration[[transformers.exporters.DynamoConfig]]

#### transformers.exporters.DynamoConfig[[transformers.exporters.DynamoConfig]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/configs.py#L76)

Configuration class for exporting models via `torch.export`.

**Parameters:**

dynamic (`bool`, *optional*, defaults to `False`) : Whether to export with dynamic (symbolic) shapes. When `True` and `dynamic_shapes` is not set, all tensor dimensions default to `Dim.AUTO`.

strict (`bool`, *optional*, defaults to `False`) : Whether to enable strict mode in `torch.export`. Strict mode traces the model with TorchDynamo and catches more errors, but is slower and more likely to fail on complex models.

dynamic_shapes (`dict[str, Any]`, *optional*) : Explicit per-input dynamic shape specifications passed to `torch.export`. Takes precedence over `dynamic`.

prefer_deferred_runtime_asserts_over_guards (`bool`, *optional*, defaults to `False`) : When `True`, data-dependent shape guards are emitted as runtime asserts in the exported graph instead of failing the export at trace time when a guard wouldn't hold across the full symbolic shape range. Most transformer LLMs need this set to `True` when using fine-grained `Dim(min=, max=)` bounds. It is not needed with `dynamic=True` / `Dim.AUTO`, where `torch.export` infers shape relations instead of verifying them against user-stated bounds.

#### transformers.exporters.OnnxConfig[[transformers.exporters.OnnxConfig]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/configs.py#L110)

Configuration class for exporting models to ONNX via `torch.onnx.export`.

Inherits all fields from `DynamoConfig` (`dynamic`, `strict`,
`dynamic_shapes`, `prefer_deferred_runtime_asserts_over_guards`).

**Parameters:**

output_path (`str` or `PathLike`, *optional*) : Output path for the `.onnx` file. When `None` (default) the exported model is kept in memory as an `ONNXProgram` and not written to disk.

opset_version (`int`, *optional*) : ONNX opset version to target. Defaults to the latest opset supported by the installed `onnxscript` version.

external_data (`bool`, *optional*, defaults to `True`) : Store large weight tensors in a separate `.onnx_data` sidecar file instead of embedding them in the protobuf. Required for models whose weights exceed the 2 GB protobuf limit.

optimize (`bool`, *optional*, defaults to `True`) : Run `onnxscript` optimisation passes (constant folding, dead-code elimination, …) on the exported graph. Disable for models that hit upstream `onnxscript` optimiser bugs.

export_params (`bool`, *optional*, defaults to `True`) : Embed model weights in the ONNX graph. Set to `False` to export a weight-free graph (weights must be supplied at runtime).

keep_initializers_as_inputs (`bool`, *optional*, defaults to `False`) : Expose weight initializers as explicit graph inputs. Required by some older ONNX runtimes (opset < 9).
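
To judge whether `external_data=False` is viable, you can do a back-of-the-envelope check: weight bytes are roughly the parameter count times bytes per element, compared against the 2 GB protobuf ceiling. The helper below is a hypothetical illustration. It ignores graph and metadata overhead, which is why it keeps some headroom, so treat a result near the limit as "keep external data on".

```python
PROTOBUF_LIMIT_BYTES = 2**31  # protobuf messages are capped at 2 GiB

BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weights_fit_inline(num_params: int, dtype: str = "float32", headroom: float = 0.9) -> bool:
    """Rough check: do the weights fit inside a single .onnx protobuf, with some headroom?"""
    weight_bytes = num_params * BYTES_PER_ELEMENT[dtype]
    return weight_bytes < headroom * PROTOBUF_LIMIT_BYTES
```

For example, a 110M-parameter model in `float32` (about 440 MB) fits inline. A 1B-parameter model in `float16` (about 2 GB) does not once headroom is applied.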

#### transformers.exporters.ExecutorchConfig[[transformers.exporters.ExecutorchConfig]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/configs.py#L153)

Configuration class for exporting models to ExecuTorch format.

Inherits all fields from `DynamoConfig` (`dynamic`, `strict`,
`dynamic_shapes`, `prefer_deferred_runtime_asserts_over_guards`).

**Parameters:**

backend (`str`, *optional*, defaults to `"xnnpack"`) : Target ExecuTorch backend. Supported values:

- `"xnnpack"`: CPU inference via the XNNPACK library (default; needs only a CPU).
- `"cuda"`: GPU inference via the ExecuTorch CUDA backend.

### Utilities[[transformers.exporters.utils.get_leaf_tensors]]

#### transformers.exporters.utils.get_leaf_tensors[[transformers.exporters.utils.get_leaf_tensors]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/utils.py#L276)

Recursively retrieve all leaf tensors from a potentially nested structure.

**Parameters:**

obj (`Any`) : A tensor, dataclass, dict, list, tuple, or any nesting thereof.

**Returns:**

``dict[str, torch.Tensor]``

Flat mapping from dotted path strings to tensors.
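
The flattening can be sketched generically. Several details are assumptions: leaves are detected with an `is_leaf` predicate (the real utility targets `torch.Tensor`), dataclasses are walked through `dataclasses.fields`, and path segments are joined with `.`, with list indices as segments. The exact path format produced by `get_leaf_tensors` may differ.

```python
import dataclasses
from typing import Any, Callable


def get_leaves(obj: Any, is_leaf: Callable[[Any], bool], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists/tuples/dataclasses into {dotted.path: leaf}."""
    if is_leaf(obj):
        return {prefix: obj}
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        items = [(f.name, getattr(obj, f.name)) for f in dataclasses.fields(obj)]
    elif isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, (list, tuple)):
        items = list(enumerate(obj))
    else:
        return {}  # anything else (None, str, config objects, ...) contributes nothing
    leaves: dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        leaves.update(get_leaves(value, is_leaf, path))
    return leaves


@dataclasses.dataclass
class ToyOutput:
    logits: float
    past_key_values: list
    hidden_states: Any = None


flat = get_leaves(
    ToyOutput(logits=1.0, past_key_values=[(2.0, 3.0), (4.0, 5.0)]),
    is_leaf=lambda x: isinstance(x, float),
)
```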

#### transformers.exporters.utils.prepare_for_export[[transformers.exporters.utils.prepare_for_export]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/utils.py#L343)

Configure model and inputs for export. Mutates both `model` and `inputs` in place,
returning `(model, inputs, output_flags)` where `output_flags` holds the values popped
from `inputs` for `use_cache`, `return_dict`, etc. (to be applied reversibly onto
`model.config` by `patch_model_config` during the trace).

- Strips label inputs (`labels`, `future_values`) — loss computation is unsupported.
- Pops output flags (`use_cache`, `return_dict`, …) from `inputs` so they don't appear
  as traced kwargs; the values are returned for the trace block to apply onto
  `model.config`.
- Pre-computes data-dependent vision/audio kwargs registered via
  `@register_export_input_preparer` and writes them into `inputs`.
- Casts input tensors to match the model's `dtype` / `device`.

#### transformers.exporters.utils.decompose_prefill_decode[[transformers.exporters.utils.decompose_prefill_decode]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/utils.py#L607)

Run `model.generate()` for 2 tokens and capture prefill and decode inputs.

Reuses the full generation machinery so every architecture (decoder-only, SSM,
encoder-decoder, multi-modal, …) gets correct inputs without reimplementing the loop.

**Returns:**

``dict[str, tuple[torch.nn.Module, dict]]``

`{"prefill": (model, prefill_inputs), "decode": (model, decode_inputs)}`
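
One way to capture those kwargs without reimplementing the loop is to wrap `forward` while `generate()` runs and record the first two calls. The sketch below shows that idea with a stand-in model whose `generate` is a toy two-step loop. Both the wrapping strategy and `ToyLM` are assumptions; `decompose_prefill_decode` is not necessarily implemented this way.

```python
from typing import Any


class ToyLM:
    """Stand-in for a generative model: generate() calls forward() once per new token."""

    def forward(self, **kwargs: Any) -> int:
        return kwargs["input_ids"][-1] + 1  # the "next token"

    def generate(self, input_ids: list[int], max_new_tokens: int) -> list[int]:
        ids, cache_len = list(input_ids), 0
        for _ in range(max_new_tokens):
            # Prefill sees the whole prompt; each decode step sees only the newest token.
            step_ids = ids[cache_len:]
            ids.append(self.forward(input_ids=step_ids, cache_position=cache_len))
            cache_len = len(ids) - 1
        return ids


def capture_prefill_decode(model: Any, **generate_kwargs: Any) -> dict[str, tuple[Any, dict]]:
    """Record the forward kwargs of the first two calls made by model.generate()."""
    calls: list[dict] = []
    original_forward = model.forward

    def recording_forward(**kwargs: Any):
        calls.append(dict(kwargs))
        return original_forward(**kwargs)

    model.forward = recording_forward  # instance attribute shadows the class method
    try:
        model.generate(max_new_tokens=2, **generate_kwargs)
    finally:
        del model.forward  # fall back to the class-level forward again
    return {"prefill": (model, calls[0]), "decode": (model, calls[1])}


captured = capture_prefill_decode(ToyLM(), input_ids=[5, 6, 7])
```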

#### transformers.exporters.utils.decompose_multimodal[[transformers.exporters.utils.decompose_multimodal]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/utils.py#L692)

Capture inputs to each multi-modal submodule via a single forward pass.

Detects all known multi-modal submodules by attribute name (vision tower, projector,
language model, lm_head, …) and captures their forward kwargs during one
`model(**inputs)` call.

Each submodule is returned as a separate `name: (module, inputs)` entry for
independent export. The token-merge step (e.g. `masked_scatter` for multi-modal models)
is intentionally left outside the exported graphs — it is the caller's responsibility
to assemble `inputs_embeds` from the encoder outputs before running the decoder.

**Returns:**

``dict[str, tuple[torch.nn.Module, dict]]``

One `name: (module, inputs)` entry per detected submodule (image/audio encoder, projector, language model, lm_head).
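
The merge step left to the caller is commonly a `masked_scatter` of encoder features into the placeholder-token positions of `inputs_embeds`. Below is a minimal sketch. It assumes a single placeholder token id, which in practice comes from the model's config (the parameter name `placeholder_token_id` is hypothetical), and assumes the projector output has already been flattened to one row per placeholder token.

```python
import torch


def merge_modality_features(
    inputs_embeds: torch.Tensor,  # (batch, seq_len, hidden) text embeddings
    input_ids: torch.Tensor,  # (batch, seq_len) token ids containing placeholders
    features: torch.Tensor,  # (num_placeholder_tokens, hidden) projector output
    placeholder_token_id: int,
) -> torch.Tensor:
    """Write `features` row by row into the positions holding the placeholder token."""
    mask = (input_ids == placeholder_token_id).unsqueeze(-1).expand_as(inputs_embeds)
    if mask.sum() != features.numel():
        raise ValueError("placeholder token count does not match the number of feature rows")
    return inputs_embeds.masked_scatter(mask, features.to(inputs_embeds.dtype))


input_ids = torch.tensor([[1, 9, 9, 2]])  # 9 stands in for the placeholder id
inputs_embeds = torch.zeros(1, 4, 3)
features = torch.tensor([[5.0, 5.0, 5.0], [6.0, 6.0, 6.0]])
merged = merge_modality_features(inputs_embeds, input_ids, features, placeholder_token_id=9)
```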

#### transformers.exporters.utils.decompose_for_generation[[transformers.exporters.utils.decompose_for_generation]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/utils.py#L736)

Decompose a generative model into independently exportable `(model, forward_inputs)` pairs.

Runs `decompose_prefill_decode` to capture prefill and decode forward kwargs from a real
`model.generate(**inputs, max_new_tokens=2)`. If the prefill is multi-modal (per `is_multimodal`),
further splits it into one entry per submodule (vision/audio encoder, projector, language model,
`lm_head`) via `decompose_multimodal`.

**Parameters:**

model : Generative model. Must support `model.generate(**inputs)`.

inputs : **Generate** kwargs — what you'd pass to `model.generate(**inputs)`.

**Returns:**

`{component_name: (submodel, forward_inputs)}`. Keys are `"prefill"` / `"decode"` for
plain generative models and `"<modality>_encoder"` / `"multi_modal_projector"` /
`"language_model"` / `"lm_head"` / `"decode"` for multi-modal generative models.
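
Every entry is a `(submodel, forward_inputs)` pair, so exporting a decomposed model amounts to a loop over the dict. The helper below is a hypothetical sketch that takes any export callable, which lets it run without a model. With real objects, the callable would wrap one of the exporters above, as shown in the trailing comment. The `<output_dir>/<name>.onnx` layout is an assumption, and `output_dir` is assumed to exist.

```python
from pathlib import Path
from typing import Any, Callable


def export_components(
    components: dict[str, tuple[Any, dict]],
    export_fn: Callable[[Any, dict, str], Any],
    output_dir: str = "exported",
) -> dict[str, Any]:
    """Export each (submodel, forward_inputs) pair to <output_dir>/<name>.onnx."""
    results = {}
    for name, (submodel, forward_inputs) in components.items():
        path = str(Path(output_dir) / f"{name}.onnx")
        results[name] = export_fn(submodel, forward_inputs, path)
    return results


# With real objects this might look like (not executed here):
#   exporter = OnnxExporter()
#   export_components(
#       decompose_for_generation(model, inputs),
#       lambda m, kw, path: exporter.export(m, kw, config=OnnxConfig(output_path=path)),
#   )
```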

#### transformers.exporters.utils.is_multimodal[[transformers.exporters.utils.is_multimodal]]

[Source](https://github.com/huggingface/transformers/blob/v5.13.0/src/transformers/exporters/utils.py#L687)

Returns `True` if the model is multi-modal, i.e. it has modality encoders (for example vision or audio) and a language model.
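
Detection by attribute name can be sketched as follows. The attribute names checked here, and the idea of also looking one level down under `model.model`, are assumptions chosen for illustration. The real `is_multimodal` may consult a different or longer list.

```python
from types import SimpleNamespace
from typing import Any, Iterable

# Hypothetical attribute names; real models may use others.
ENCODER_ATTRS = ("vision_tower", "audio_tower")
LANGUAGE_MODEL_ATTRS = ("language_model",)


def looks_multimodal(model: Any) -> bool:
    """True if the model exposes at least one modality encoder and a language model."""
    # Composite models may nest their submodules under `model.model`, so check both levels.
    candidates = [c for c in (model, getattr(model, "model", None)) if c is not None]

    def has_any(names: Iterable[str]) -> bool:
        return any(getattr(c, n, None) is not None for c in candidates for n in names)

    return has_any(ENCODER_ATTRS) and has_any(LANGUAGE_MODEL_ATTRS)


vlm = SimpleNamespace(
    model=SimpleNamespace(vision_tower=object(), language_model=object()), lm_head=object()
)
text_only = SimpleNamespace(model=SimpleNamespace(layers=[]), lm_head=object())
```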

