A multi-agent LLM workflow from Jason Cong's group rewrites C/C++ into HLS-friendly form before optimization, beating AutoDSE (the UCLA-VAST design-space exploration tool that won the 2022 ACM TODAES Best Paper Award) by a geometric mean of 6.51x.
High-level synthesis, the chip-design step that turns C/C++ into hardware, has spent decades asking the same question: how do you optimize a given piece of software into faster, smaller silicon? A new preprint from Jason Cong's group at Carnegie Mellon and UCLA argues the field has been looking in the wrong place. The real bottleneck, the paper says, is whether the code is in a form HLS can optimize at all.
The paper is AgRefactor: Self-Evolving Agentic Workflow for HLS Compatibility and Performance, posted to arXiv in June 2026 by Zou, Yang, Zijian Ding, Yizhou Sun, and Cong. Its mechanism is not a smarter pragma tuner. It is a multi-agent LLM loop that first rewrites the input C/C++ into a form HLS can work with, and only then runs the design-space exploration that prior work ran on whatever code the user handed over. The team calls the workflow "self-evolving" because the agent graph updates itself across rounds rather than following a fixed script.
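To make "a form HLS can work with" concrete, here is a minimal hand-written sketch of the kind of rewrite that refactoring-before-tuning implies. It is not taken from the paper: the function names, the `kMaxLen` capacity, and the specific transformation (replacing a dynamically sized container with a fixed-capacity array and a compile-time loop bound) are illustrative assumptions based on constructs HLS tools commonly struggle with.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "before": ordinary software C++. The dynamically sized
// std::vector and the data-dependent loop bound are the sort of constructs
// HLS tools typically reject or handle poorly.
int sum_positive_software(const std::vector<int>& values) {
    int total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] > 0) total += values[i];
    }
    return total;
}

// Hypothetical "after": same result, but with a fixed-capacity array and a
// compile-time trip count. kMaxLen is an assumed capacity chosen for this
// sketch; the caller must guarantee 0 <= len <= kMaxLen.
constexpr int kMaxLen = 1024;

int sum_positive_hls(const int values[kMaxLen], int len) {
    int total = 0;
    for (int i = 0; i < kMaxLen; ++i) {
        // A pragma-level explorer would search over directives
        // (pipelining, unrolling, and so on) for this loop.
        if (i < len && values[i] > 0) total += values[i];
    }
    return total;
}
```

Both functions return the same sum for the same data; only the second gives a synthesis tool static sizes to reason about. In the workflow the article describes, a rewrite of this kind comes first, and the design-space exploration over directives runs on the result.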
The headline result is a 6.51x geometric mean speedup over AutoDSE, a design-space exploration tool from the same UCLA-VAST lab and the winner of the ACM TODAES Best Paper Award in 2022. AutoDSE's premise was that you start with the code you have and search the pragma space hard. AgRefactor's premise is that the code itself is the wrong starting point. The comparison is concrete and falsifiable: a head-to-head against a known reference point, not a generic accelerator claim, and the trade-press writeup from SemiEngineering restates the same benchmark numbers.
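For readers who want to check a figure like this against their own runs, the sketch below shows how a geometric mean speedup is usually computed: take each benchmark's ratio of baseline latency to new latency, then exponentiate the average of the logs. The function name and the latency numbers in the usage note are made up for illustration; they are not data from the paper.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of per-benchmark speedups, computed as
// exp(mean(log(baseline_i / new_i))).
// Assumes both vectors are non-empty, the same length, and strictly positive.
double geomean_speedup(const std::vector<double>& baseline_latency,
                       const std::vector<double>& new_latency) {
    double log_sum = 0.0;
    for (std::size_t i = 0; i < baseline_latency.size(); ++i) {
        log_sum += std::log(baseline_latency[i] / new_latency[i]);
    }
    return std::exp(log_sum / static_cast<double>(baseline_latency.size()));
}
```

With hypothetical latencies of 80 and 20 cycles for the baseline and 10 and 10 cycles for the new flow, the per-benchmark speedups are 8x and 2x, and `geomean_speedup({80.0, 20.0}, {10.0, 10.0})` returns 4 (up to floating-point rounding), where an arithmetic mean would report 5. The geometric mean is the conventional choice for ratios because a single outlier benchmark cannot dominate it as easily.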
The mechanism shift matters because it changes the entry point for AI in the chip-design toolchain. Until now, the EDA literature has largely treated AI as a smarter search over fixed inputs: better pragmas, better synthesis directives, better placement. Reframing the problem as AI-native code transformation moves the agent one level upstream. The model now owns the question of what the program should look like before any RTL is generated. For chip-design teams that have spent years writing C/C++ that runs correctly but cannot be synthesized efficiently, that is a different problem statement from AutoDSE-style tuning, which assumed the input was fixed and the optimizer was the bottleneck.
Caveats belong in the framing, not the conclusion. The 6.51x figure is self-reported on the paper's own benchmark suite against a single prior tool; independent replication on production-scale RTL codebases and broader benchmarks has not yet appeared. The arXiv posting carries no accompanying tape-out or industrial adoption signal, and the mechanism distinction would narrow if a future integrated toolchain ran AutoDSE's design-space exploration downstream of an AgRefactor-style refactor. Treat this as a research advance on a known baseline, not a shipping toolchain.
What to watch next: whether the team open-sources the AgRefactor artifacts alongside the existing AutoDSE repository so outside groups can rerun the geometric mean on independent suites, and whether downstream pragma-tuning work absorbs refactoring as a pre-stage or treats it as a separate step.