The GPU compute wave that has reshaped data centers over the past three years is now arriving in the chip fabrication back office, in computational lithography, the mask-preparation stage that decides whether an advanced-node design can actually be manufactured. Siemens EDA has published a whitepaper describing a massively parallel GPU rasterizer for that workload, covering high-resolution mask synthesis, lithography simulation, and optical proximity correction (OPC), benchmarked on NVIDIA H100 hardware against a tuned CPU baseline.
The Siemens team reports speedups over the CPU baseline of 290x on Manhattan-style shapes and 45x on curvilinear shapes, with sub-1% absolute error on synthetic test geometries. Those are vendor numbers, not independent benchmarks, and synthetic shapes are not the same workload as a full-chip OPC run on a real tapeout. The whitepaper also makes a broader structural argument: as nodes shrink to 3nm and below, the number of masks per design, the cost of OPC iteration, and the compute required for EUV (extreme ultraviolet) multi-patterning have all been rising, which is why a GPU-native rasterizer is now an engineering target rather than a curiosity.
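One way to see why a kernel benchmark and a full-chip run can diverge is Amdahl's law: the end-to-end gain depends on how much of the total runtime the accelerated step accounts for. The sketch below is illustrative only. The function name `end_to_end_speedup` and the runtime shares in the loop are hypothetical, since the whitepaper does not report what fraction of an OPC flow is spent in rasterization.

```python
def end_to_end_speedup(kernel_fraction, kernel_speedup):
    """Amdahl's law: overall speedup when only part of the runtime is accelerated.

    kernel_fraction: share of total runtime spent in the accelerated step (0..1).
    kernel_speedup:  speedup of that step alone (e.g. 290 for the Manhattan case).
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    # Unaccelerated share stays as-is; accelerated share shrinks by the speedup.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical runtime shares, NOT figures from the whitepaper.
    for share in (0.5, 0.9, 0.99):
        overall = end_to_end_speedup(share, 290.0)
        print(f"rasterization share {share:.0%}: overall speedup {overall:.1f}x")
```

If rasterization were half of the runtime, for example, even a 290x kernel speedup would cap the end-to-end gain below 2x, which is why a production wall-clock comparison matters more than the headline figure.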
Rasterization is the step that turns the geometry a chip designer drew into the per-pixel intensity map that determines whether each feature prints correctly on the wafer. At advanced nodes that map must be computed at high resolution, with fractional pixel coverage preserved so sub-resolution assist features (SRAFs), the small auxiliary shapes added to improve process margin, survive discretization. A traditional CPU rasterizer handles this sequentially, processing one pixel at a time, while the workload is intrinsically data-parallel and a natural fit for the thousands of cores a GPU provides. The accompanying Calibre engineering blog frames the problem as both a performance and a precision story at scale, and the technical paper itself walks through the connectivity-preservation requirements that drive the design choices.
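To make the fractional-coverage point concrete, here is a minimal sketch in plain Python, not the Siemens algorithm. It assumes axis-aligned (Manhattan) rectangles that do not overlap, a grid of unit pixels, and hypothetical function names (`coverage_raster`, `center_sample_raster`). It compares area-coverage rasterization with naive binary sampling at pixel centers.

```python
import math


def coverage_raster(rects, width, height):
    """Area-coverage rasterization of axis-aligned rectangles on a unit-pixel grid.

    rects: iterable of (x0, y0, x1, y1) in pixel units, assumed non-overlapping.
    Returns a height x width list of lists with coverage values in [0, 1].
    """
    grid = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:
        # Visit only the pixels this rectangle can touch.
        col_lo, col_hi = max(0, math.floor(x0)), min(width, math.ceil(x1))
        row_lo, row_hi = max(0, math.floor(y0)), min(height, math.ceil(y1))
        for row in range(row_lo, row_hi):
            dy = min(y1, row + 1) - max(y0, row)  # vertical overlap with this pixel row
            for col in range(col_lo, col_hi):
                dx = min(x1, col + 1) - max(x0, col)  # horizontal overlap
                grid[row][col] += dx * dy  # covered area of a unit pixel
    return grid


def center_sample_raster(rects, width, height):
    """Binary rasterization: a pixel is 1 only if its center lies inside a shape."""
    grid = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5
            if any(x0 <= cx < x1 and y0 <= cy < y1 for x0, y0, x1, y1 in rects):
                grid[row][col] = 1.0
    return grid


if __name__ == "__main__":
    main_feature = (1.0, 1.0, 3.0, 4.0)     # 2 x 3 pixels, aligned to the grid
    assist_bar = (4.625, 1.0, 4.875, 4.0)   # 0.25 px wide, stands in for an SRAF
    shapes = [main_feature, assist_bar]

    cov = coverage_raster(shapes, width=6, height=5)
    binary = center_sample_raster(shapes, width=6, height=5)

    print("coverage row 2:", cov[2])
    print("binary   row 2:", binary[2])
```

In this toy layout the quarter-pixel assist bar misses every pixel center, so binary sampling drops it entirely, while area coverage records it as a 0.25 value in its column. Each pixel's value depends only on the shapes that overlap it, which is the property that lets the work be spread across many GPU threads; the nested loops here simply stand in for that parallel dispatch.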
The honest caveat is that all three references are Siemens-authored: a whitepaper, an engineering blog, and a trade press republication, with the speedup numbers vendor-supplied. There is no independent third-party reproduction cited, no foundry or integrated device manufacturer (IDM) customer reference, and no production tapeout wall-clock comparison. Head-to-head numbers against Cadence or Synopsys GPU roadmaps, or against in-house foundry flows, would be needed before any of this is treated as settled engineering fact. The structural framing also comes from the vendor, so it is worth separating the broader argument from the Siemens product story: the forcing function (mask-count explosion and EUV multi-patterning cost) is broadly understood across the industry, while the claim that this specific rasterizer is the right answer to it remains open.
What to watch: independent benchmark numbers from a foundry or IDM, and whether NVIDIA's broader push into EDA workloads turns the GPU-as-EDA-accelerator story into a real product category. Either signal would indicate whether the compute profile of the mask-synthesis pipeline is genuinely converging on that of an AI training cluster.