ONNX Runtime load-time CPU DoS with nested If graphs
This repo contains a small ONNX test case for a load-time CPU denial of service in ONNX Runtime. These files are not useful ML models. Do not put them in an auto-loading or production model pipeline.
Short version
poc_deep_d20.onnx is a 3.6 KB ONNX file that:
- passes onnx.checker.check_model(path, full_check=True);
- is accepted by onnxruntime.InferenceSession(path);
- spends about 56 seconds building the session on onnxruntime==1.26.0 / onnx==1.21.0;
- prints no warning or error while it is doing the work.
The matching flat control file loads in about 4 ms on the same machine. That makes the deep file about 14,000 times slower to load, even though it is still only a few kilobytes.
The problem is not graph optimization. Loading with
ORT_DISABLE_ALL takes about the same amount of time as ORT_ENABLE_ALL.
Increasing intra_op_num_threads and inter_op_num_threads also does not
help. The slow path is graph type/shape inference during session creation.
Files in this repo
| File | Size | SHA-256 |
|---|---|---|
| poc_deep_d18.onnx | 3287 B | f3bd63087ea131ccdec2d9d9051f5232f8a1bbe5f562b61b1cbefd77a308536b |
| poc_deep_d20.onnx | 3613 B | f351044c5f54c783b749b67b78dddb049fd5ef5eb917bf0b0dbbf588dcdb5d85 |
| poc_flat_control_d20.onnx | 1471 B | 905d8b189f5ac3a629dc72dd2f81fb6c1c8b33c45d3ebd8ffd28733e9ac48324 |
| build_poc.py | rebuilds the ONNX files from a depth value | |
| verify_poc.py | checks the models and times session creation | |
| requirements.txt | versions used for the timings below | |
poc_deep_d18.onnx is included as a quicker check. It shows the same bug
class but takes less time to load. poc_flat_control_d20.onnx has no deep
nested control flow and is there as the baseline.
Reproduce
Use a throwaway virtual environment:
```
pip install -r requirements.txt
python verify_poc.py
```
Expected shape of the output on a current x86_64 machine:
```
[poc_deep_d20.onnx]
  size                  : 3613 bytes
  onnx.checker          : ACCEPTED
  ORT load (ENABLE_ALL) : 56.12 s
  ORT load (DISABLE_ALL): 59.68 s
[poc_flat_control_d20.onnx]
  size                  : 1471 bytes
  onnx.checker          : ACCEPTED
  ORT load (ENABLE_ALL) : 0.004 s
  ORT load (DISABLE_ALL): 0.004 s
amplification (deep load / flat load, ENABLE_ALL):
  poc_deep_d20.onnx ~14099x (56.12s vs 0.004s)
```
Exact times depend on single-core CPU speed. The important part is the gap between the deep file and the flat control.
To rebuild the files:
```
python build_poc.py --depth 20 --out poc_deep_d20.onnx
python build_poc.py --depth 20 --flat-control --out poc_flat_control_d20.onnx
```
Depth 22 is also useful if you want to see the scaling:
```
python build_poc.py --depth 22 --out poc_deep_d22.onnx
```
On my test box depth 22 produced a 3.9 KB file and took about 4.4 minutes to load.
What the model does
The model defines one local function, local:DeepF.
The function body contains one If node. Its then_branch contains one
more If. That branch contains one more If, and so on for the selected
depth. Each else_branch is just a one-node Identity graph.
There is no recursive function call in the file, so a function-cycle check
has nothing to flag. onnx.checker accepts the file as well.
The cost appears when ONNX Runtime resolves the graph and infers types for the nested subgraphs. Each extra level of nesting adds only a few hundred bytes to the file but roughly doubles the load time.
Measurements
Measured with onnxruntime==1.26.0, onnx==1.21.0, Python 3.12, CPU
execution provider.
| depth | file size | load time | flat control | ratio |
|---|---|---|---|---|
| 16 | 2961 B | ~4.0 s | ~0.004 s | ~1010x |
| 18 | 3287 B | ~16.2 s | ~0.004 s | ~4061x |
| 20 | 3613 B | ~56.1 s | ~0.004 s | ~14099x |
| 22 | 3939 B | ~262.3 s | ~0.004 s | ~65500x |
| 30 | 5243 B | stopped after 350 s | ~0.004 s | lower bound only |
The depth-30 run had not finished when I stopped it. Extrapolating from the depth 20 to 22 jump (about 4.7x for two levels), a single depth-30 load would take tens of hours.
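The depth-30 estimate can be checked with a small fit over the table above. This is a minimal sketch, assuming load time grows by a constant factor per level; the function names are illustrative and are not part of build_poc.py or verify_poc.py.

```python
import math

# (depth, load seconds) copied from the measurements table above.
MEASURED = [(16, 4.0), (18, 16.2), (20, 56.1), (22, 262.3)]


def growth_per_level(points):
    """Least-squares slope of log(time) against depth, as a per-level factor."""
    n = len(points)
    xs = [depth for depth, _ in points]
    ys = [math.log(seconds) for _, seconds in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return math.exp(num / den)


def extrapolate(points, depth):
    """Project the last measured point forward to `depth` (assumed model)."""
    last_depth, last_seconds = points[-1]
    return last_seconds * growth_per_level(points) ** (depth - last_depth)


if __name__ == "__main__":
    print(f"growth per level : {growth_per_level(MEASURED):.2f}x")
    print(f"depth 30 estimate: {extrapolate(MEASURED, 30) / 3600:.1f} h")
```

With the four measured points the fitted factor comes out close to 2 per level, which puts depth 30 well past ten hours. The result is an extrapolation, not a measurement.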
There is a parser ceiling eventually. Very deep models hit protobuf's
upb_DecodeOptions_MaxDepth somewhere between depth 30 and depth 50. The
bug is not literally unbounded, but the depth that still parses is already
enough for minutes to hours of CPU per load.
Threading check
I also checked whether more ORT threads make session creation faster. They do not.
| intra_op_num_threads | inter_op_num_threads | depth-20 load |
|---|---|---|
| 1 | 1 | 66.4 s |
| 4 | 4 | 65.5 s |
| 8 | 8 | 66.3 s |
The thread pools are for compute work during sess.run(). This issue is in
session construction, before inference starts, and it runs on the calling
thread.
Impact
Any service that accepts an ONNX file and calls
onnxruntime.InferenceSession(path_or_bytes) before applying a time limit is
exposed.
The load happens before sess.run(), so a service can spend the CPU time
even if it never runs inference. Retrying the same file repeats the cost.
Multiple worker processes can be pinned at the same time by submitting
multiple small files.
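One way for a service to bound this cost is to try the load in a child process that can be killed after a deadline. The following is a minimal sketch, assuming onnxruntime is installed in the child's environment; loads_within and ORT_LOAD_SNIPPET are hypothetical names, not part of this repo or of ONNX Runtime.

```python
import subprocess
import sys

# Code run in the child: build the session, then exit.
# Assumes onnxruntime is importable by the same interpreter.
ORT_LOAD_SNIPPET = (
    "import sys, onnxruntime as ort\n"
    "ort.InferenceSession(sys.argv[1], providers=['CPUExecutionProvider'])\n"
)


def loads_within(path, seconds, snippet=ORT_LOAD_SNIPPET):
    """Return True if `snippet` finishes for `path` within `seconds`."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet, path],
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child when the timeout expires.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    ok = loads_within("poc_deep_d20.onnx", seconds=5.0)
    print("accepted" if ok else "rejected: load too slow or failed")
```

The session built in the child is discarded, so this only screens the file. A model that passes is loaded again in the serving process, which costs one extra load but keeps the worst case bounded by the deadline.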
This is a denial-of-service issue. I did not see memory corruption, code execution, or a crash. The runtime eventually returns a session; it just takes a long time.
Fix idea
The slow cycle is:
```
Graph::Resolve
-> Graph::VerifyNodeAndOpMatch
-> Graph::PerformTypeAndShapeInferencing
-> Graph::InferAndVerifySubgraphTypes
-> GraphInferencerImpl::doInferencing
-> onnx::IfInferenceFunction
-> nested If subgraph
```
The direct fix is to cap subgraph nesting depth, or cap total inference work
inside one Graph::Resolve call. The cap needs to be below 20 to stop this
test case in practice. A limit in the 8 to 16 range keeps the measured load
time to a few seconds at most and should still be well above the nesting
depth of normal ONNX control-flow models.
Adding the same depth check to onnx.checker would also help, because then
services could reject the file before it reaches ONNX Runtime.
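Until such a check exists upstream, a service could measure nesting depth itself before creating a session. The following is a minimal sketch of that idea, not ONNX Runtime or onnx.checker code. It walks any object shaped like an ONNX ModelProto (graph.node, functions[].node, node.attribute, attr.g, attr.graphs). The constants 5 and 10 are assumed to match onnx.AttributeProto.GRAPH and onnx.AttributeProto.GRAPHS, and the function names are hypothetical.

```python
# Assumed to mirror onnx.AttributeProto.GRAPH and onnx.AttributeProto.GRAPHS.
GRAPH, GRAPHS = 5, 10


def max_subgraph_depth(nodes):
    """Longest chain of graph-valued attributes reachable from `nodes`."""
    deepest = 0
    stack = [(nodes, 0)]  # explicit stack, so depth is not tied to recursion
    while stack:
        current, depth = stack.pop()
        deepest = max(deepest, depth)
        for node in current:
            for attr in node.attribute:
                if attr.type == GRAPH:
                    stack.append((attr.g.node, depth + 1))
                elif attr.type == GRAPHS:
                    for sub in attr.graphs:
                        stack.append((sub.node, depth + 1))
    return deepest


def model_max_depth(model):
    """Check the main graph and every local function body."""
    bodies = [model.graph.node] + [f.node for f in model.functions]
    return max(max_subgraph_depth(body) for body in bodies)


def is_too_deep(model, limit=16):
    # `limit` is an illustrative default taken from the 8 to 16 range above.
    return model_max_depth(model) > limit
```

With the onnx package installed, the intended use would be is_too_deep(onnx.load(path)). The walk does not follow calls from a node into a local function, so nesting that is split across several functions would need the per-function depths added along the call chain.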
Versions tested
| Component | Version |
|---|---|
| onnx | 1.21.0 |
| onnxruntime | 1.26.0 |
| numpy | 2.x |
| Python | 3.12 |
| OS / CPU | Windows and Linux x86_64 |
Older onnxruntime builds reproduce the same bug class with smaller
absolute timings. The latest stable wheel was used for the numbers above.