A third artifact for AV stacks: a certified proof object that shows who would have to do what to make a blocked maneuver work


The maneuver is vetoed. But is it permanently off the table, or just waiting for someone specific to do something specific?

Most rule-aware autonomous-driving stacks answer that question one of two ways. A safety shield or rulebook looks at the proposed action, checks it against a hard rule — stop at the line, do not enter the oncoming lane, do not run a sign — and either passes it or vetoes it. A prediction-based planner tries to model what nearby drivers are likely to do, and then plans against that model. Neither path produces a particular kind of artifact: a runtime proof object that names a bounded multi-agent edit, identifies the agent who would have to make that edit, says whether the request is right-of-way affordable, and pins down the ego-vehicle fallback if the edit does not happen.

CARVE, a single-author arXiv preprint by Yifan Wang submitted 31 May 2026, introduces that artifact. It calls the result an interactive repair certificate and treats it as a third object sitting alongside the veto and the prediction in a rule-aware AV stack.

What a "vetoed" maneuver is actually missing

A hard rule can reject a maneuver for a reason that has nothing to do with physical impossibility. The ego vehicle's own trajectory sits inside a lane, but the rule margin is negative — some other agent is occupying a region the rule protects, and the proposed action would cross into it. From the rule's point of view, this is unsafe. From the maneuver's point of view, the question is narrower: if a specific other agent made a specific, small, lawful accommodation — moved a meter, braked briefly, yielded a lane stripe that belongs to them under right-of-way — would the margin come back positive?

Existing rulebooks, shields, and reachability filters are good at the first half of that problem: deciding that the maneuver, as proposed, violates the rule. They are not built to return a certificate that says: yes, the repair exists, it is bounded, here is the agent that owns it, and here is what the ego will do if the agent does not actually make the edit. Prediction-based planners address the second half, what the other agent might do, but they do it by modeling likely behavior, which is a different epistemic object: a forecast, not a bounded, attributable claim.

CARVE's working hypothesis, in the paper's own framing, is that the field is missing the category in between.

The certificate, as a category

The certificate that CARVE defines has five fields, and the choice of those fields is the actual contribution:

the binding rule the original maneuver violated,

the repair category the edit falls into,

the repair set — the bounded set of actions that would restore feasibility,

a responsibility-weighted cost split between ego and the other agent, and

a fallback the ego commits to if the requested edit is not observed.
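As a rough illustration, the five fields could be carried in a small immutable record. The Python sketch below is not the paper's data structure: every name in it (`RepairCertificate`, `RepairAction`, the field names, the example values) is hypothetical, and the check that the two cost shares sum to 1 is an assumption made here for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RepairAction:
    """One bounded edit an agent could make (hypothetical representation)."""
    owner: str        # "ego" or an identifier for the other agent
    operator: str     # name of a tactical operator, e.g. "brake_briefly"
    magnitude: float  # size of the edit, in the operator's own units


@dataclass(frozen=True)
class RepairCertificate:
    """Illustrative container for the five fields listed above."""
    binding_rule: str                     # rule the original maneuver violated
    repair_category: str                  # category the edit falls into
    repair_set: Tuple[RepairAction, ...]  # bounded actions restoring feasibility
    cost_split: Tuple[float, float]       # (ego share, other-agent share)
    fallback: str                         # ego action if the edit is not observed

    def __post_init__(self) -> None:
        ego_share, agent_share = self.cost_split
        # Assumption for this sketch: shares are non-negative fractions of one total.
        if min(ego_share, agent_share) < 0 or abs(ego_share + agent_share - 1.0) > 1e-9:
            raise ValueError("cost split must be non-negative and sum to 1")
        if not self.repair_set:
            raise ValueError("a certificate needs at least one repair action")


# Hypothetical example values, chosen only to show the shape of the object.
cert = RepairCertificate(
    binding_rule="do_not_enter_oncoming_lane",
    repair_category="agent_yield",
    repair_set=(RepairAction(owner="agent_7", operator="brake_briefly", magnitude=0.5),),
    cost_split=(0.25, 0.75),
    fallback="hold_at_stop_line",
)
```

Freezing the record reflects the idea that the fallback is committed when the certificate is issued, rather than filled in later.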

The non-claim is the part that makes the category honest. CARVE does not predict that the other driver will yield. It does not assume the other driver will comply. It certifies whether a proposed interaction is bounded, attributable, and normatively admissible under declared assumptions — and the discipline of "certify under declared assumptions, do not predict compliance" is what separates a certificate from a forecast.

That framing also explains why the paper is explicit that CARVE sits over a finite lattice of ego-owned and agent-owned tactical operators, rather than replacing the planner beneath it. It is a layer that says, of a candidate maneuver, "here is what would have to be true and who would have to do it." It is not a learned policy.

The cooperation envelope

The mechanism that makes a request checkable is a cooperation envelope the paper writes as B_j(s) = beta(pi_j) · alpha_j^max(s), where j indexes the other agent. The two factors are doing different work:

alpha_j^max(s) is the kinematic ceiling — the maximum the other agent can physically do, given the scene.

beta(pi_j) is a normative-priority scalar tied to that agent's policy role — what right-of-way the agent is allowed to spend.

Their product is the bounded request space. An agent-owned edit has to land inside this envelope to be admissible — possible and right-of-way-affordable at the same time. The separation between kinematic reachability and normative priority is the structural move that lets a planner ask "is this a request I am allowed to make of this agent?" without sliding into a prediction about whether the agent will honor it.
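A minimal sketch of that check, in Python, might look like the following. It assumes things the article does not state: that an edit can be summarized as a single non-negative magnitude in the same units as alpha_j^max(s), and that beta(pi_j) lies in [0, 1]. The function names are hypothetical.

```python
def cooperation_envelope(beta: float, alpha_max: float) -> float:
    """B_j(s) = beta(pi_j) * alpha_j^max(s), with scalar inputs (an assumption)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("this sketch assumes beta lies in [0, 1]")
    if alpha_max < 0.0:
        raise ValueError("the kinematic ceiling cannot be negative")
    return beta * alpha_max


def is_admissible(requested: float, beta: float, alpha_max: float) -> bool:
    """True if the requested edit lies inside the cooperation envelope."""
    return 0.0 <= requested <= cooperation_envelope(beta, alpha_max)


# The other agent could physically give up to 3.0 units, but its role only
# allows half of that to be requested, so the envelope is 1.5.
print(is_admissible(1.0, beta=0.5, alpha_max=3.0))  # True
print(is_admissible(2.0, beta=0.5, alpha_max=3.0))  # False: possible, not affordable
print(is_admissible(1.0, beta=0.0, alpha_max=3.0))  # False: nothing may be requested
```

The second call is the case the separation exists for: the edit is within the kinematic ceiling, yet the request is rejected because it exceeds what the agent's priority permits. Nothing in the check estimates whether the agent would comply.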

The five properties

A heuristic that emits "a request exists" is not yet a certificate. The paper proves five properties that turn that output into a category:

Certificate soundness — if the certificate approves, the bounded repair is feasible under declared assumptions.

Structural right-of-way respect — the request does not demand more right-of-way than the other agent is allocated.

Exact finite-lattice minimality — among admissible repairs, the certificate picks the smallest one, with respect to the declared lattice of tactical operators.

Fallback contingency — the ego fallback is committed in the certificate, not computed after the fact.

Blame-consistency — responsibility is assigned to the agent that owns the edit, with the cost split it implies.

These are the properties the reader should weigh, not the headline number. They are also why the paper's title foregrounds the word certified — the claim is not that CARVE finds clever repairs, but that every emitted certificate is one a stack can ship into a downstream module and reason over.
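To make the minimality and fallback properties concrete, here is a toy Python sketch. It is not CARVE's algorithm: the feasibility model (each decimetre of edit adds one decimetre of rule margin), the cost (total edit size), the tie-break (prefer the smaller request to the other agent), and all names are assumptions chosen so that an exhaustive search over a small finite grid is easy to follow. Blame-consistency is not modeled.

```python
from itertools import product
from typing import Iterable, Optional, Tuple


def minimal_repair(
    margin_dm: int,
    ego_steps: Iterable[int],
    agent_steps: Iterable[int],
    envelope_dm: int,
) -> Optional[Tuple[int, int]]:
    """Search a finite grid of (ego, agent) edits, in decimetres.

    Returns the cheapest pair that restores a non-negative margin while keeping
    the agent's edit inside its envelope, or None if no such pair exists.
    """
    candidates = [
        (ego, agent)
        for ego, agent in product(ego_steps, agent_steps)
        if agent <= envelope_dm           # right-of-way respect
        and margin_dm + ego + agent >= 0  # soundness under the toy model
    ]
    # Exhaustive search over a finite set, so the minimum is exact.
    return min(candidates, key=lambda p: (p[0] + p[1], p[1]), default=None)


def decide(margin_dm: int, envelope_dm: int, fallback: str = "hold_position") -> str:
    """Approve with the minimal repair, or return the pre-committed fallback."""
    repair = minimal_repair(margin_dm, range(0, 6), range(0, 11), envelope_dm)
    if repair is None:
        return fallback
    return f"approve: ego={repair[0]} dm, agent={repair[1]} dm"


print(decide(margin_dm=-8, envelope_dm=5))  # approve: ego=5 dm, agent=3 dm
print(decide(margin_dm=-8, envelope_dm=0))  # hold_position
```

In the second call the envelope is zero, as it would be for an agent that cannot be asked to give anything up, so the search finds no admissible repair and the veto stands.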

The evidence behind the numbers

The evaluation the paper reports is replay-only. CARVE is run against 589 Lanelet2-geometry-grounded episodes from the INTERACTION dataset, which is a cited but separate benchmark; the abstract page does not surface a companion code or data release. The headline figures are:

98.64% acceptance of maneuvers that a hard-rule shield would have vetoed.

370/378 human-resolved false vetoes recovered: vetoes that a human reviewer would have overturned and that the certificate admits.

589/589 right-of-way respect preserved — the safety claim that the envelope is doing its job.

0 priority-agent false positives — the certificate does not ask a higher-priority agent to yield.

400/400 negative-stress vetoes upheld, meaning the certificate still refuses when no bounded repair is admissible.

Two of those are the load-bearing safety results (right-of-way respect intact, zero priority-agent false positives), and a third is the stress test (negative-stress vetoes upheld). The 98.64% accept rate and the 370/378 false-veto recovery are supporting evidence that the category is doing useful work, not that CARVE is a deployed system. The paper is, in its own words, a method-plus-benchmark claim on INTERACTION replays, not a real-world deployment claim.
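For readers who want the fractions as rates, the short Python snippet below converts the reported counts. The last step is an inference rather than a reported figure: it assumes the 98.64% acceptance rate is taken over the same 589 episodes, which the article does not state.

```python
# Counts as reported above: (successes, total).
reported = {
    "false vetoes recovered": (370, 378),
    "right-of-way respect": (589, 589),
    "negative-stress vetoes upheld": (400, 400),
}

rates = {name: round(100 * hit / total, 2) for name, (hit, total) in reported.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
# false vetoes recovered: 97.88%
# right-of-way respect: 100.00%
# negative-stress vetoes upheld: 100.00%

# Assumption (not stated in the source): the acceptance rate uses 589 episodes
# as its denominator. Under that reading it corresponds to about 581 episodes.
implied_accepts = round(0.9864 * 589)
print(implied_accepts)  # 581
```

Under that assumption the acceptance rate would leave 8 episodes unaccepted, the same count as the 8 false vetoes not recovered, but the article gives no basis for equating the two.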

What this is, and what it is not

What CARVE is: a way for an AV stack to expand the artifacts it produces. Alongside "this action is vetoed" and "this agent will probably yield," it adds "this interaction is boundedly repairable, here is the owner, here is the request, here is the fallback." That is a structural change in what a runtime claim about a maneuver can look like.

What it is not: a competitor to ML-based interactive motion planners, a more permissive AV stack, a deployment, or a claim that the other driver will cooperate. It is a prediction-free, compliance-free certificate over a finite operator lattice, evaluated on replays from a single cited dataset, in a single-author arXiv preprint with no peer review, no venue acceptance, and no institutional affiliation, lab, or funding statement visible on the abstract page. Those caveats travel with the result.

The interesting question the preprint leaves open is not whether the certificate is useful; the safety results reported in the abstract are what make the question worth asking. It is whether downstream stacks can actually consume a five-field object that pins a fallback and a cost split, and whether declaring a cooperation envelope turns out to be the right discipline for everyday urban driving or only for the rare cases where a small, lawful accommodation is the difference between a feasible maneuver and a permanently rejected one.