Most Linux users never think about xor_gen(). It is the routine that produces and validates the parity blocks that turn three ordinary disks into RAID5, or six into RAID6, and it runs every time a chunk is written, scrubbed, or rebuilt. For years it has been a tight scalar C loop. Eric Biggers, the Google engineer who maintains much of the Linux kernel's crypto subsystem, has now replaced that loop with an AVX-512 implementation that uses 512-bit ZMM registers and vpternlogq, a three-input bitwise logic instruction that can XOR three operands in a single operation. On a Ryzen 9 9950X (Zen 5) desktop, Phoronix's benchmarking of the patch series measured a 19% to 41% improvement over the existing path.
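To make the parity idea concrete, here is a minimal scalar sketch in portable C. It is not the kernel's code and not Biggers' patch: ternlog64(), parity3(), and parity_demo() are hypothetical names invented for illustration, and the loop works on 64-bit words rather than 512-bit ZMM registers. It models a three-input truth-table operation of the kind vpternlogq performs, uses the truth table 0x96 (which encodes a ^ b ^ c), and shows that a lost block can be rebuilt by XORing the survivors with the parity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a three-input bitwise logic operation. For every bit
 * position, the three input bits form a 3-bit index (a is the high bit
 * in this model) into the 8-bit truth table imm8. */
uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t out = 0;
    for (int bit = 0; bit < 64; bit++) {
        unsigned idx = (unsigned)((((a >> bit) & 1) << 2) |
                                  (((b >> bit) & 1) << 1) |
                                  ((c >> bit) & 1));
        out |= (uint64_t)((imm8 >> idx) & 1) << bit;
    }
    return out;
}

/* Truth table whose output bit is 1 when an odd number of inputs are 1,
 * which is exactly a ^ b ^ c. */
#define TERNLOG_XOR3 0x96

/* Hypothetical helper: out = x ^ y ^ z over nwords 64-bit words. */
void parity3(uint64_t *out, const uint64_t *x, const uint64_t *y,
             const uint64_t *z, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        out[i] = ternlog64(x[i], y[i], z[i], TERNLOG_XOR3);
}

/* Builds parity for three data blocks, pretends the middle block is lost,
 * rebuilds it, and returns the number of mismatching words (0 = success). */
int parity_demo(void)
{
    const uint64_t d0[4] = { 0x0123456789abcdefULL, 0x1111111111111111ULL,
                             0xdeadbeefcafef00dULL, 0x0000000000000000ULL };
    const uint64_t d1[4] = { 0xfedcba9876543210ULL, 0x2222222222222222ULL,
                             0x0f0f0f0f0f0f0f0fULL, 0xffffffffffffffffULL };
    const uint64_t d2[4] = { 0x5555555555555555ULL, 0x4444444444444444ULL,
                             0x123456789abcdef0ULL, 0x8000000000000001ULL };
    uint64_t parity[4], rebuilt[4];
    int mismatches = 0;

    parity3(parity, d0, d1, d2, 4);       /* parity = d0 ^ d1 ^ d2 */
    parity3(rebuilt, d0, d2, parity, 4);  /* d1 = d0 ^ d2 ^ parity */

    for (size_t i = 0; i < 4; i++) {
        assert(parity[i] == (d0[i] ^ d1[i] ^ d2[i]));
        if (rebuilt[i] != d1[i])
            mismatches++;
    }
    return mismatches;
}
```

Calling parity_demo() from a small main() should return 0. The point of the three-input form is that two XORs collapse into one operation per word, which is where a wide vector unit saves work when many data blocks feed one parity block.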
The headline number deserves the context around it. Software RAID is CPU-bound on the parity path, which means the gains show up on workloads that actually move parity: rebuilds after a disk replacement, scrub passes that read every block to catch bit rot, and sustained writes to a RAID5 or RAID6 array. The patch is not magic, and it is not a replacement for a hardware RAID controller. It is a faster xor_gen() inside the same md/mdraid framework that Linux has shipped for two decades.
What is more interesting is the hardware enablement map Biggers drew. The new path is enabled on AMD Zen 4 and newer (Zen 5, Zen 6), Intel Sapphire Rapids and newer server parts, Rocket Lake on the client side, and upcoming Nova Lake. It is explicitly disabled on Skylake Server and Ice Lake. The reason is recorded in the patch itself: those older Intel parts downclock aggressively when 512-bit operations are in flight, and the resulting frequency hit wipes out most of the throughput gain. The same !PREFER_YMM policy already gates the kernel's AVX-512 crypto and CRC paths, and Biggers is reusing it for RAID rather than reinventing the rule. For a sysadmin, that means a Skylake Server or Cascade Lake machine purchased in 2018 or 2019 (Cascade Lake is a refresh of Skylake Server), or an Ice Lake server bought a few years later, will not see this speedup, even after the patch lands.
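The gating rule described above can be sketched as a small userspace model. This is an illustration only, not kernel code: struct cpu_caps, use_zmm_xor(), example_cpus, and count_zmm_enabled() are hypothetical names, and the table entries simply restate the enablement map given in the text, plus one made-up entry for a CPU with no AVX-512 at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability record; the real kernel uses CPU feature flags. */
struct cpu_caps {
    const char *name;
    bool has_avx512;   /* CPU implements the needed AVX-512 instructions */
    bool prefer_ymm;   /* 512-bit ops cause enough downclocking to avoid them */
};

/* The policy in one line: use ZMM only if it exists and is not penalized. */
bool use_zmm_xor(const struct cpu_caps *c)
{
    return c->has_avx512 && !c->prefer_ymm;
}

/* Entries mirror the article's enablement map; the last one is invented. */
static const struct cpu_caps example_cpus[] = {
    { "AMD Zen 4",                        true,  false },
    { "Intel Sapphire Rapids",            true,  false },
    { "Intel Skylake Server",             true,  true  },
    { "Intel Ice Lake",                   true,  true  },
    { "hypothetical CPU without AVX-512", false, false },
};

/* Counts how many table entries would take the 512-bit path. */
int count_zmm_enabled(void)
{
    int enabled = 0;
    size_t n = sizeof(example_cpus) / sizeof(example_cpus[0]);

    for (size_t i = 0; i < n; i++)
        if (use_zmm_xor(&example_cpus[i]))
            enabled++;
    return enabled;
}
```

Keeping the downclocking knowledge in a single flag, rather than in each consumer, is what lets a new user such as the RAID parity path inherit the same decision that the crypto and CRC code already relies on.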
The work fits a recognizable pattern. Biggers has spent the last several years threading AVX-512 through Linux subsystems that most people never look at: accelerated AES-GCM, SHA-256, and CRC32C in the crypto API, and now the parity generator in md. Each one is the same shape: identify a hot inner loop, vectorize it with ZMM and vpternlogd or vpternlogq, gate it on a CPU feature flag, and refuse to enable it on silicon that would rather throttle than run wide. The contribution is not that RAID suddenly got 41% faster in the abstract. It is that the same engineering playbook, applied once more, is making the boring middle of the Linux storage stack measurably cheaper on the CPUs people are actually buying.
There are two reasons to hold off on calling this a win for end users. First, the patch is on the mailing list, not in mainline, and it still has to survive review from md maintainers and a wider test matrix than one Zen 5 desktop. Second, the 41% figure is the upper end of a range, measured on a single CPU family and a single benchmark run by Michael Larabel on Phoronix; AMD's own Zen 4 and Zen 5 parts are known to thermal-throttle under sustained AVX-512, so the steady-state improvement on a real rebuild of a multi-terabyte array may sit closer to the lower bound. The honest read is that md parity on a Ryzen 9 9950X is now meaningfully faster in code, and that the same code path will, if it merges, also run on Sapphire Rapids, Granite Rapids, and Nova Lake. Nobody should plan a hardware purchase around a number from a single Phoronix run of a patch that is not even in the kernel yet.