Found by AI, Fixed by AI

Most of my findings spend weeks in some kind of holding pattern. Embargo windows, maintainer queues, CVE assignment, the slow back-and-forth of responsible disclosure. This one moved faster. A fuzzer I built found a memory safety bug in ONNX’s Conv shape inference. I reported it, and once it reached the right repository the upstream fix merged the same afternoon, written largely by a coding agent. Found by an AI-driven method, fixed by an AI coding agent. That is a clean enough loop that it is worth writing down what actually happened, and being precise about what it does and does not mean.

The bug

The finding is a heap out-of-bounds read in ONNX’s shape inference for the Conv operator. The function is convPoolShapeInference_opset19 in onnx/defs/nn/old.cc, around line 188, one of three opset-versioned copies of the same Conv/Pool shape-inference logic. It fires during model load, before any inference runs.

To understand it you need two facts about how a Conv node is described in ONNX. A Conv node has an input tensor X and a weight tensor W. The input X carries a batch dimension, a channel dimension, and then some number of spatial dimensions. For a normal 2D convolution, X has rank 4, so it has 2 spatial dimensions. The weight tensor W is supposed to match: rank 4, with its own 2 spatial dimensions describing the kernel.

Shape inference needs to know the kernel size. There are two ways it can get it. If the node has an explicit kernel_shape attribute, it uses that. If not, it derives the kernel shape from the weight tensor’s spatial dimensions. That second path is the one that bites.

Here is the relevant code, lightly condensed. This is all public now, so I can show it directly.

std::vector<int64_t> dilations;
if (use_dilation && getRepeatedAttribute(ctx, "dilations", dilations)) {
    if (dilations.size() != n_input_dims) {
        fail_shape_inference("Attribute dilations has incorrect size");
    }
} else {
    dilations.assign(n_input_dims, 1);   // sized to the INPUT's spatial rank
}

std::vector<int64_t> kernel_shape;
// no explicit kernel_shape attribute, so derive it from the WEIGHT tensor:
auto second_input_shape = ctx.getInputType(input2Idx)->tensor_type().shape();
for (int i = 2; i < second_input_shape.dim_size(); ++i) {
    kernel_shape.push_back(second_input_shape.dim(i).dim_value());
}
// after the loop: kernel_shape.size() == weight.rank - 2

std::vector<int64_t> effective_kernel_shape = kernel_shape;
for (size_t i = 0; i < kernel_shape.size(); i++) {
    effective_kernel_shape[i] =
        (effective_kernel_shape[i] - 1) * dilations[i] + 1;  // line 188
}

Look at the two sizes. dilations is sized to n_input_dims, which comes from the input tensor X. kernel_shape is sized to weight.rank - 2, which comes from the weight tensor W. The code validates that dilations.size() == n_input_dims. It never validates that kernel_shape.size() == n_input_dims.

Then the final loop iterates over kernel_shape.size() while indexing dilations[i]. If the kernel shape is longer than the dilations vector, the loop reads dilations[i] past the end of its backing storage. That is the out-of-bounds read. CWE-125.
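The missing precondition is short enough to state in code. What follows is a minimal, self-contained sketch of the same computation with a size check placed in front of the loop. It is an illustration under stated assumptions, not the upstream patch: the function name effective_kernel_shape_checked is hypothetical, and std::invalid_argument stands in for fail_shape_inference so the snippet compiles without the ONNX headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not an ONNX API: computes the dilated kernel extent
// per spatial axis, and refuses to run if the two vectors differ in length.
std::vector<int64_t> effective_kernel_shape_checked(
    const std::vector<int64_t>& kernel_shape,
    const std::vector<int64_t>& dilations) {
    // The guard the original loop lacked: both vectors must describe the
    // same number of spatial axes before dilations[i] is ever indexed.
    if (kernel_shape.size() != dilations.size()) {
        throw std::invalid_argument("kernel_shape and dilations differ in length");
    }
    std::vector<int64_t> effective = kernel_shape;
    for (std::size_t i = 0; i < kernel_shape.size(); ++i) {
        effective[i] = (effective[i] - 1) * dilations[i] + 1;
    }
    return effective;
}
```

With matching lengths, a 3x3 kernel and dilations of {2, 1} give an effective shape of {5, 3}. With a 3-element kernel shape and 2-element dilations, the call throws instead of indexing past the end.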

The trigger condition follows directly. You craft a Conv node where:

Input X has rank 4, so n_input_dims = 2.
Weight W has rank 5, giving it 3 spatial dimensions, so kernel_shape.size() = 3.
There is no explicit kernel_shape attribute, so the kernel shape is derived from W.
dilations defaults to length 2.
On iteration i = 2, the loop reads dilations[2], the 8 bytes immediately past the end of a 2-element vector.
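The arithmetic behind those conditions can be sketched without touching ONNX at all. The helpers below are hypothetical (first_oob_index and oob_byte_offset are names invented for this illustration) and model only the two vector sizes, assuming no kernel_shape and no dilations attribute on the node.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper: given the ranks of X and W, return the first loop
// index that falls outside the dilations vector, or nullopt if none does.
std::optional<std::size_t> first_oob_index(int input_rank, int weight_rank) {
    if (input_rank < 2 || weight_rank < 2) {
        return std::nullopt;  // no spatial dimensions to compare
    }
    const auto n_input_dims = static_cast<std::size_t>(input_rank - 2);  // dilations.size()
    const auto kernel_dims = static_cast<std::size_t>(weight_rank - 2);  // kernel_shape.size()
    if (kernel_dims > n_input_dims) {
        return n_input_dims;  // first index past the last valid element
    }
    return std::nullopt;
}

// Byte offset of element `index` from the start of an int64_t buffer.
std::size_t oob_byte_offset(std::size_t index) {
    return index * sizeof(std::int64_t);
}
```

For input rank 4 and weight rank 5, first_oob_index returns 2 and oob_byte_offset(2) is 16. A 2-element int64_t buffer ends at offset 16, so the read covers the 8 bytes directly after it. With matching ranks the function returns nullopt.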

A normal model never does this. A real Conv node has a weight tensor whose spatial rank matches the input. But nothing in the parser enforces that before this loop runs, and a crafted file can set the two ranks independently. The PoC is a 255-byte ONNX model. There is no inference, no optimizer pass, no warmup needed. The read happens inside InferenceSession::Load(), as part of the graph resolve and type-and-shape inference that runs while the model is still being parsed.

The over-read is 8 bytes, and whatever heap content sits immediately after the dilations vector gets treated as a kernel dimension value. So in addition to being a classic memory safety violation, it can feed a garbage dimension into downstream shape math, which is its own can of worms if that value is large.
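To make the "large value" concern concrete, here is a sketch of the same (k - 1) * d + 1 formula with overflow detection. It is a hypothetical helper, not code from ONNX. Rejecting non-positive sizes is an assumption made for this illustration.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: computes (k - 1) * d + 1, returning nullopt when the
// result would not fit in int64_t or when either argument is non-positive.
std::optional<int64_t> dilated_extent(int64_t k, int64_t d) {
    if (k < 1 || d < 1) {
        return std::nullopt;  // assumption: sizes and dilations must be positive
    }
    const int64_t a = k - 1;  // a >= 0 from here on
    const int64_t max = std::numeric_limits<int64_t>::max();
    // a * d + 1 <= max  is equivalent to  a == 0  or  d <= (max - 1) / a
    if (a != 0 && d > (max - 1) / a) {
        return std::nullopt;  // the multiplication would overflow
    }
    return a * d + 1;
}
```

A sane dilation gives a sane extent: dilated_extent(3, 2) is 5. A stand-in garbage value such as 0x7f7f7f7f7f7f7f7f, fed in as the dilation for a kernel of size 3, is reported as an overflow rather than silently wrapping.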

How it was found

I run Crucible, a structure-aware fuzzer for ML model parsers, at Halo Forge Labs. The relevant campaign here is a libFuzzer harness over ONNX Runtime’s loader, driving CreateSessionFromArray and InferenceSession::Load() with mutated and structured ONNX inputs.

This particular site only became reachable after I expanded the corpus with structured ONNX seeds and turned on extended graph optimization, which runs shape inference during initialization. Once those were in place, the campaign produced 8 distinct crash inputs, and all 8 fingerprinted to the same line, old.cc:188. That convergence is a good sign you are looking at one real bug and not eight flaky ones.
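The convergence check is simple to picture. The sketch below groups crash inputs by their top in-project stack frame. It illustrates the idea only and is not how Crucible fingerprints crashes: the Crash struct, the bucket_by_frame function, and the "file:line" key format are all assumptions.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical crash record: the input that crashed and the top in-project
// frame from its sanitizer trace, written as "file:line".
struct Crash {
    std::string input_name;
    std::string top_frame;
};

// Group crash inputs by top frame. One bucket holding many inputs suggests
// one bug reached several ways, not several unrelated bugs.
std::map<std::string, std::vector<std::string>> bucket_by_frame(
    const std::vector<Crash>& crashes) {
    std::map<std::string, std::vector<std::string>> buckets;
    for (const Crash& c : crashes) {
        buckets[c.top_frame].push_back(c.input_name);
    }
    return buckets;
}
```

Eight records that all carry "old.cc:188" as their top frame collapse into a single bucket of eight, which is the pattern described above.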

The structure-aware part matters. A purely byte-level mutator is not going to reliably produce a valid-enough ONNX protobuf with a Conv node whose weight rank intentionally disagrees with its input rank. You need the fuzzer to understand the format well enough to build models that are structurally plausible but semantically hostile. That is the whole premise of the tool, and this finding is a tidy example of it paying off.
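As a toy illustration of what "structurally plausible but semantically hostile" means, consider a mutator that works on the fields of a Conv description instead of on raw bytes. This is not Crucible's mutator. ConvSpec, mutate_weight_rank, and ranks_disagree are hypothetical names, and a real tool would serialize the result back into a valid ONNX protobuf.

```cpp
#include <random>

// Hypothetical, heavily reduced view of a Conv node: just the two ranks.
struct ConvSpec {
    int input_rank;
    int weight_rank;
};

// One structure-aware mutation: nudge the weight rank by -1, 0 or +1 while
// leaving the input rank untouched, so the two can drift apart.
ConvSpec mutate_weight_rank(ConvSpec spec, std::mt19937& rng) {
    std::uniform_int_distribution<int> delta(-1, 1);
    spec.weight_rank += delta(rng);
    if (spec.weight_rank < 1) {
        spec.weight_rank = 1;  // keep the weight a tensor of rank at least 1
    }
    return spec;
}

// True when the mutated node has the mismatch that reaches the buggy loop's
// precondition (the over-read itself needs weight_rank > input_rank).
bool ranks_disagree(const ConvSpec& s) {
    return s.input_rank != s.weight_rank;
}
```

Starting from a well-formed {4, 4} node, a single mutation can yield {4, 5}. A byte-level mutator would have to produce that same change by luck while also keeping the surrounding protobuf parseable.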

The disclosure path

I originally reported this against ONNX Runtime on May 31, since that is where my harness found it and where the ASan stack trace was rooted. That became microsoft/onnxruntime#28731.

The ORT maintainer made the right call quickly. The crash shows up in ONNX Runtime frames, but the actual buggy function lives in the onnx op schema library that ORT bundles. ORT is a consumer of that shape inference code, not the owner of it. So the maintainer redirected the report to the upstream onnx project, where the fix belongs. A community contributor also opened a proposed fix against ORT to be ported upstream.

I refiled it as a public onnx issue, onnx/onnx#8036, on June 2, with the PoC embedded. There was no embargo to maintain here. The bug is a load-time over-read in a widely used open source library, so the right move was to get it in front of the maintainers in the open and let the fix land fast.

It landed fast. A fix PR went up two minutes after the issue, almost certainly the contributor’s earlier ORT-side patch carried upstream, and it merged as onnx/onnx#8037 about three hours later the same afternoon, targeting milestone 1.22. So the public-issue-to-merge window was hours, though the full path from my first report to ONNX Runtime ran a couple of days across the redirect, and that original ORT issue is still open. The fix did the complete job, not just the one line I pointed at. It added the missing length check to all three opset-versioned copies of convPoolShapeInference, the function the pooling operators share too, so MaxPool and AveragePool were covered by the same change, not just Conv. I have also filed a VulDB entry and requested a CVE.

Found by AI, fixed by AI

Here is the part I keep turning over. The bug was found by an AI-driven method: a structure-aware fuzzer that builds adversarial models from a learned sense of the format. And the fix was authored by an AI coding agent: GitHub’s Copilot agent opened the upstream pull request (onnx/onnx#8037) and wrote every commit in it, with an ONNX maintainer (titaiwangms) as co-author who reviewed, scoped, and merged it. That is verifiable on the PR itself, where the commit author is copilot-swe-agent[bot] and the commit trail reads like an agent working a plan: “Initial plan”, then the fix, then “Cover opset1_to_11 path and harden test”, then “Clarify opset range comments”. The upstream fix merged within hours of the public issue, the same afternoon.

I want to be measured about what that demonstrates, because it is easy to over-read.

What does it show? For a well-scoped, clearly rooted memory safety bug, the modern tooling on both ends of the loop is genuinely good. The discovery side can generate inputs that exercise paths a human would rarely write a test for. The fix side can take a precise root-cause writeup, find the analogous code paths a human reviewer might miss, and produce a correct, complete patch. When the bug is legible and the fix is local, this works, and it works quickly.

What it does not show: that the humans were optional. The ORT maintainer made the routing decision that sent this to the right repository, and a human maintainer reviewed and merged the agent’s patch. My harness did not understand that the bug belonged to onnx rather than ONNX Runtime. The coding agent did not extend the guard to the other opset copies and the shared pooling path on its own; that happened once the scope was surfaced. The fast, clean outcome came from AI tooling slotted into a process that still had competent people at the decision points. The bug was also legible in a way that flatters automation. It is a single missing length check with a small, contained blast radius and an obvious correct fix. Plenty of real bugs are not like that. They span modules, they have ambiguous correct behavior, or fixing them means changing an interface that other code depends on. Nothing about this finding tells you how the same loop performs on those.

So I would not call this the future arriving. I would call it a clean data point. The pipeline of automated discovery into automated fix into human review can close a real memory safety bug in an extremely widely deployed library fast, when the bug is the kind that fits that pipeline. That is useful to know, and it is just as useful to be honest about where its boundaries are.

The bug is fixed. The fix is complete and public. The loop closed quickly, the upstream fix landing the same afternoon as the public issue. That is the whole story, and it is a good one, as long as you do not tell it as more than it is.

Crucible is a structure-aware fuzzer for ML model parsers built at Halo Forge Labs. Individual findings are released as embargoes lift and fixes land. This post covers CRUCIBLE-2026-086, a heap out-of-bounds read in ONNX’s Conv shape inference (onnx/onnx#8036, fixed in onnx/onnx#8037, milestone 1.22), originally reported via ONNX Runtime as microsoft/onnxruntime#28731.