The hard part of designing a chip for a car is not writing the logic. It is demonstrating, with evidence rigorous enough to satisfy auditors, that the logic will not fail in a way that hurts someone. That discipline is called functional safety. It governs the safety-relevant electronic control units in a modern vehicle, the safety sensors on a factory floor, and much of the silicon in medical devices. A new paper from Arizona State University and Texas Instruments India argues that large language models can shoulder more of that burden than they have been asked to. In the authors' framing, the models would draft the safety checks themselves and rank how dangerous each potential hardware fault would be.
The framework, called SafeGen, treats functional-safety work as a three-part problem:

1. Turn a written specification into formal safety assertions.
2. Map those assertions onto the gate-level logic of the chip.
3. Grade every possible fault by how severely it would violate the safety goal.

Each of those steps has historically been either slow manual work or dependent on running millions of simulation cycles against a fault-injected netlist. The paper's premise is that an LLM can do better at both ends of that chain if it is properly scaffolded with a Hyper Knowledge Graph of safety standards and the chip's register-transfer-level (RTL) description. With that scaffolding, the authors argue, it can produce better assertions than prior AI tools and more interpretable criticality grades than simulation alone.
The authors benchmark their claims on a field-oriented control (FOC) system. It is the motor-control design featured in the trade-press coverage and a frequent subject of ISO 26262 case studies. They report two results:

- SafeGen-generated assertions are more semantically complete than those produced by earlier LLM-based assertion-generation frameworks.
- Its fault-criticality grading explains its reasoning more clearly than gate-level simulation output does.

Both results sit on the paper side of the ledger. They are author-reported benchmarks on a single case study, posted to arXiv, not independently validated industry consensus.
That distinction matters because the audience for functional-safety work is unusually unforgiving. Automakers and regulators want traceability from specification to assertion to failure mode to diagnostic coverage. ISO 26262 is the governing standard for automotive functional safety. It does not prescribe whether an assertion is written by a human, a script, or a language model. It does require that any software tool whose output feeds the safety case earn a documented level of confidence, and that the whole chain be documented, repeatable, and conservative. A tool that automates part of that chain is useful only if it leaves a clean audit trail. A benchmark against other LLM-based tools does not answer that question.
The wider context is that the formal-verification bottleneck has been one of the most persistent complaints from chip-design teams for years. Formal property checking is mathematically rigorous but expensive to set up, because a verification engineer has to translate the specification into temporal-logic properties by hand. LLM-based assertion generation has been pitched as a way to skip that hand translation, and SafeGen is the latest in a small but steady stream of attempts. Its twist is that it folds the assertion generator into a fault-criticality assessment that mirrors the industry's existing FMEDA (Failure Modes, Effects, and Diagnostic Analysis) flow. The paper's arXiv record places it as a June 2026 preprint in the cs.AR (Hardware Architecture) category, consistent with the trade publication's June 29 coverage.
What to watch:

- whether the paper's comparison survives contact with a second case study outside motor control;
- whether Texas Instruments or any other chip vendor picks up the framework for an internal pilot;
- whether ISO 26262 working groups eventually write guidance on accepting LLM-generated artifacts in the safety case.

The paper itself settles none of those questions. Still, the framework is a clean test case for how aggressively generative AI can be pressed into one of the most regulated corners of chip design.