ONNX Runtime load-time CPU DoS with nested `If` graphs

This repo contains a small ONNX test case for a load-time CPU denial of service in ONNX Runtime. These files are not useful ML models. Do not put them in an auto-loading or production model pipeline.

Short version

poc_deep_d20.onnx is a 3.6 KB ONNX file that:

passes onnx.checker.check_model(path, full_check=True);
is accepted by onnxruntime.InferenceSession(path);
spends about 56 seconds building the session on onnxruntime==1.26.0 / onnx==1.21.0;
prints no warning or error while it is doing the work.

The matching flat control file loads in about 4 ms on the same machine. That makes the deep file about 14,000 times slower to load, even though it is still only a few kilobytes.

The problem is not graph optimization. Loading with ORT_DISABLE_ALL takes about the same amount of time as ORT_ENABLE_ALL. Increasing intra_op_num_threads and inter_op_num_threads also does not help. The slow path is graph type/shape inference during session creation.

Files in this repo

File	Size	SHA-256 / purpose
`poc_deep_d18.onnx`	3287 B	`f3bd63087ea131ccdec2d9d9051f5232f8a1bbe5f562b61b1cbefd77a308536b`
`poc_deep_d20.onnx`	3613 B	`f351044c5f54c783b749b67b78dddb049fd5ef5eb917bf0b0dbbf588dcdb5d85`
`poc_flat_control_d20.onnx`	1471 B	`905d8b189f5ac3a629dc72dd2f81fb6c1c8b33c45d3ebd8ffd28733e9ac48324`
`build_poc.py`		rebuilds the ONNX files from a depth value
`verify_poc.py`		checks the models and times session creation
`requirements.txt`		versions used for the timings below

poc_deep_d18.onnx is included as a quicker check. It shows the same bug class but loads in about 16 seconds instead of about a minute. poc_flat_control_d20.onnx has no deep nested control flow and is there as the baseline.
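
Before loading anything, it may be worth confirming that the downloaded files match the hashes in the table. The following is a minimal sketch using only the standard library. The `EXPECTED` dict copies the table above; `sha256_of` and `verify_files` are illustrative helper names and are not part of this repo's scripts.

```python
import hashlib
from pathlib import Path

# Hashes copied from the file table above.
EXPECTED = {
    "poc_deep_d18.onnx": "f3bd63087ea131ccdec2d9d9051f5232f8a1bbe5f562b61b1cbefd77a308536b",
    "poc_deep_d20.onnx": "f351044c5f54c783b749b67b78dddb049fd5ef5eb917bf0b0dbbf588dcdb5d85",
    "poc_flat_control_d20.onnx": "905d8b189f5ac3a629dc72dd2f81fb6c1c8b33c45d3ebd8ffd28733e9ac48324",
}


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(directory=".", expected=EXPECTED):
    """Map each file name to True (match), False (mismatch), or None (missing)."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = (sha256_of(path) == want) if path.is_file() else None
    return results


# Example use from the repo directory:
#   for name, ok in verify_files().items():
#       print(name, {True: "OK", False: "MISMATCH", None: "missing"}[ok])
```

Hashing only reads the bytes, so it is safe to run on the deep files; nothing is parsed or loaded.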

Reproduce

Use a throwaway virtual environment:

pip install -r requirements.txt
python verify_poc.py

Expected shape of the output on a current x86_64 machine:

[poc_deep_d20.onnx]
  size                  : 3613 bytes
  onnx.checker          : ACCEPTED
  ORT load (ENABLE_ALL) : 56.12 s
  ORT load (DISABLE_ALL): 59.68 s

[poc_flat_control_d20.onnx]
  size                  : 1471 bytes
  onnx.checker          : ACCEPTED
  ORT load (ENABLE_ALL) : 0.004 s
  ORT load (DISABLE_ALL): 0.004 s

amplification (deep load / flat load, ENABLE_ALL):
       poc_deep_d20.onnx  ~14099x  (56.12s vs 0.004s)

Exact times depend on single-core CPU speed. The important part is the gap between the deep file and the flat control.
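
verify_poc.py is the authoritative script for these numbers. The sketch below only illustrates how such a measurement could be taken: wall-clock time around session construction, with the optimization level and thread counts as knobs. `time_session_load` and `amplification` are hypothetical helper names; `SessionOptions`, `GraphOptimizationLevel`, and `InferenceSession` are the regular onnxruntime Python API. Run it only in a throwaway environment, since the deep file keeps one core busy for about a minute.

```python
import time


def time_session_load(path, disable_optimizations=False, threads=None):
    """Return the wall-clock seconds spent constructing an InferenceSession."""
    import onnxruntime as ort  # lazy import so amplification() works without it

    opts = ort.SessionOptions()
    if disable_optimizations:
        opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    if threads is not None:
        opts.intra_op_num_threads = threads
        opts.inter_op_num_threads = threads

    start = time.perf_counter()
    ort.InferenceSession(path, sess_options=opts, providers=["CPUExecutionProvider"])
    return time.perf_counter() - start


def amplification(deep_seconds, flat_seconds):
    """Ratio of deep-model load time to flat-control load time."""
    if flat_seconds <= 0:
        raise ValueError("flat_seconds must be positive")
    return deep_seconds / flat_seconds


# Example (needs onnxruntime and the files from this repo):
#   deep = time_session_load("poc_deep_d20.onnx")
#   flat = time_session_load("poc_flat_control_d20.onnx")
#   print(f"~{amplification(deep, flat):.0f}x")
```

Because the flat load takes only a few milliseconds, the ratio is sensitive to timer noise in the denominator; the absolute deep-file time is the more stable number.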

To rebuild the files:

python build_poc.py --depth 20 --out poc_deep_d20.onnx
python build_poc.py --depth 20 --flat-control --out poc_flat_control_d20.onnx

Depth 22 is also useful if you want to see the scaling:

python build_poc.py --depth 22 --out poc_deep_d22.onnx

On my test box depth 22 produced a 3.9 KB file and took about 4.4 minutes to load.

What the model does

The model defines one local function, local:DeepF.

The function body contains one If node. Its then_branch contains one more If. That branch contains one more If, and so on for the selected depth. Each else_branch is just a one-node Identity graph.

There is no recursive function call in the file. That matters because the usual function-cycle check accepts it. onnx.checker accepts it as well.

The cost appears when ONNX Runtime resolves the graph and infers types for the nested subgraphs. Each extra level adds only about 163 bytes to the file, but roughly doubles the load time.

Measurements

Measured with onnxruntime==1.26.0, onnx==1.21.0, Python 3.12, CPU execution provider.

depth	file size	load time	flat control	ratio
16	2961 B	~4.0 s	~0.004 s	~1010x
18	3287 B	~16.2 s	~0.004 s	~4061x
20	3613 B	~56.1 s	~0.004 s	~14099x
22	3939 B	~262.3 s	~0.004 s	~65500x
30	5243 B	stopped after 350 s	~0.004 s	lower bound only

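The depth-30 run had not finished when I stopped it. Extrapolating from the depth-22 time at the measured growth rate (about 2.0x per level across depths 16 to 22, about 2.2x per level from depth 20 to 22) puts depth 30 at roughly 19 to 35 hours per load.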
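
The depth-30 estimate can be reproduced from the table. This sketch assumes load time keeps growing geometrically past depth 22, which the measurements do not confirm. It derives a per-level growth factor from two measured points and projects forward; `growth_per_level` and `extrapolate` are illustrative names.

```python
# Measured load times in seconds, keyed by nesting depth (from the table above).
MEASURED = {16: 4.0, 18: 16.2, 20: 56.1, 22: 262.3}


def growth_per_level(measured, d_lo, d_hi):
    """Geometric growth factor per nesting level between two measured depths."""
    return (measured[d_hi] / measured[d_lo]) ** (1.0 / (d_hi - d_lo))


def extrapolate(measured, d_from, d_to, factor):
    """Project the load time at d_to from the measurement at d_from."""
    return measured[d_from] * factor ** (d_to - d_from)


# Two estimates of the growth factor: the whole measured range, and the last step.
for d_lo in (16, 20):
    factor = growth_per_level(MEASURED, d_lo, 22)
    hours = extrapolate(MEASURED, 22, 30, factor) / 3600
    print(f"depth {d_lo}->22: {factor:.2f}x per level, depth 30 ~ {hours:.0f} h")
```

The two factors bracket the projection, and both put a single depth-30 load well past half a day.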

There is a parser ceiling eventually. Very deep models hit protobuf's upb_DecodeOptions_MaxDepth somewhere between depth 30 and depth 50. The bug is not literally unbounded, but the depth that still parses is already enough for minutes to hours of CPU per load.

Threading check

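I also checked whether more ORT threads make session creation faster. They do not. The absolute times in this run are about 10 s higher than the ~56 s measured above; what matters here is that they stay flat as the thread count changes.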

`intra_op_num_threads`	`inter_op_num_threads`	depth-20 load
1	1	66.4 s
4	4	65.5 s
8	8	66.3 s

The thread pools are for compute work during sess.run(). This issue is in session construction, before inference starts, and it runs on the calling thread.

Impact

Any service that accepts an ONNX file and calls onnxruntime.InferenceSession(path_or_bytes) without enforcing a time limit on session creation is exposed.

The load happens before sess.run(), so a service can spend the CPU time even if it never runs inference. Retrying the same file repeats the cost. Multiple worker processes can be pinned at the same time by submitting multiple small files.

This is a denial-of-service issue. I did not see memory corruption, code execution, or a crash. The runtime eventually returns a session; it just takes a long time.
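
Until a fix lands, one way a service could bound the cost is to attempt the load in a child process and kill it after a budget. This is a sketch under assumptions, not a vetted mitigation: the work happens in native code on the calling thread, so an in-process Python timer is unlikely to interrupt it, which is why a separate process is used here. `run_with_timeout` and `model_loads_in_time` are hypothetical names, and the 5-second default is arbitrary.

```python
import subprocess
import sys

# Child program: only tries to construct the session, then exits.
_LOAD_SNIPPET = (
    "import sys, onnxruntime as ort\n"
    "ort.InferenceSession(sys.argv[1], providers=['CPUExecutionProvider'])\n"
)


def run_with_timeout(code, args, timeout_s):
    """Run `python -c code *args` in a child process.

    Returns "ok", "error" (non-zero exit), or "timeout" (child was killed).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code, *args],
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising.
        return "timeout"
    return "ok" if proc.returncode == 0 else "error"


def model_loads_in_time(path, timeout_s=5.0):
    """True only if ONNX Runtime builds a session for `path` within the budget."""
    return run_with_timeout(_LOAD_SNIPPET, [str(path)], timeout_s) == "ok"
```

The trade-off is that accepted models are loaded twice, once in the probe and once for real, and a rejected upload still costs up to `timeout_s` of CPU, so the budget bounds the damage per file rather than removing it.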

Fix idea

The slow cycle is:

Graph::Resolve
  -> Graph::VerifyNodeAndOpMatch
       -> Graph::PerformTypeAndShapeInferencing
            -> Graph::InferAndVerifySubgraphTypes
                 -> GraphInferencerImpl::doInferencing
                      -> onnx::IfInferenceFunction
                           -> nested If subgraph

The direct fix is to cap subgraph nesting depth, or cap total inference work inside one Graph::Resolve call. The cap needs to be below 20 to stop this test case in practice. A limit in the 8 to 16 range keeps the measured load time in seconds and should still be well above normal ONNX control-flow models.

Adding the same depth check to onnx.checker would also help, because then services could reject the file before it reaches ONNX Runtime.
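
A service-side version of that depth check could look like the sketch below. It is not the proposed ONNX Runtime or onnx.checker patch. It walks the parsed model, counts how deeply subgraph attributes are nested in the main graph and in each local function body, and rejects files above a cap. The function names and the default cap of 8 (the low end of the range suggested above) are assumptions, and the count may differ by one from the `--depth` value used by build_poc.py.

```python
# AttributeProto.AttributeType values from the ONNX protobuf schema.
ATTR_GRAPH = 5
ATTR_GRAPHS = 10


def _subgraphs(node):
    """Yield every subgraph stored in a node's attributes."""
    for attr in node.attribute:
        if attr.type == ATTR_GRAPH:
            yield attr.g
        elif attr.type == ATTR_GRAPHS:
            yield from attr.graphs


def max_nesting_depth(nodes):
    """Deepest chain of nested subgraphs below a list of nodes (0 = none)."""
    deepest = 0
    stack = [(node, 0) for node in nodes]  # explicit stack, no Python recursion
    while stack:
        node, depth = stack.pop()
        for sub in _subgraphs(node):
            deepest = max(deepest, depth + 1)
            stack.extend((child, depth + 1) for child in sub.node)
    return deepest


def model_nesting_depth(model):
    """Max nesting over the main graph and every local function body."""
    depths = [max_nesting_depth(model.graph.node)]
    depths += [max_nesting_depth(fn.node) for fn in model.functions]
    return max(depths)


def check_model_depth(path, max_depth=8):
    """Parse with onnx and raise ValueError if nesting exceeds max_depth."""
    import onnx  # lazy import: only needed for real files

    model = onnx.load(path, load_external_data=False)
    depth = model_nesting_depth(model)
    if depth > max_depth:
        raise ValueError(f"subgraph nesting depth {depth} exceeds limit {max_depth}")
    return depth
```

The walk visits each node once, so it stays cheap on the files in this repo. It treats each function body on its own and does not follow function calls, so nesting that only appears after functions are inlined would need a more careful check.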

Versions tested

Component	Version
`onnx`	1.21.0
`onnxruntime`	1.26.0
`numpy`	2.x
Python	3.12
OS / CPU	Windows and Linux x86_64

Older onnxruntime builds reproduce the same bug class with smaller absolute timings. The latest stable wheel was used for the numbers above.
