To understand why a computer chip built in 1980 needed a special trick to add numbers, start with the long addition problem. When a student adds 999999 and 1, the work runs right to left: each column overflows into the next, and the carries ripple up the page. Now imagine doing that across 69 columns, or 69 bits, and finishing before a single clock cycle ends. That is the constraint Intel's 8087 floating-point coprocessor had to meet, and it shaped the most important design decision on the die.
The 8087 was a separate chip that paired with the 8086 and 8088 CPUs. Released in 1980, it handled floating-point arithmetic, square roots, and transcendental functions like tangent, exponentiation, and logarithms, offloading work the host CPU could not do quickly on its own. Ken Shirriff's reverse-engineering walkthrough of the die describes the 8087 as performing math up to roughly 100 times faster than the host CPU. That headline figure is the 8087's reason to exist, and the adder at the heart of the chip is the reason the figure is real.
The 8087's die measures 5mm by 6mm with 40 external pins, and its functional blocks are visible under a microscope: a Bus Interface Unit for talking to the host, a large microcode ROM in the middle, and the adder, shifters, and registers that actually crunch numbers. The arithmetic heart of the floating-point execution unit is a 69-bit adder, framed in Intel's own patent language as the chip's central engine for arithmetic, square roots, and transcendentals. Shirriff's die-photo walkthrough and author-drawn schematics show the adder's transistor logic in detail, with cross-references to the original Intel patent.
A naive way to build a 69-bit adder is the ripple-carry design: each bit waits for the carry from the bit to its right, and the carry propagates leftward one stage at a time. The worst case is the binary equivalent of adding 999999 and 1: a word of all ones plus 1, where the carry has to walk all 69 bits. The propagation delay grows linearly with the word length, and at the 8087's clock speeds that delay would not fit inside a single cycle. A faster alternative is a full carry-lookahead adder, which computes every carry in parallel using dedicated logic. The cost is a dense web of routing that consumes die area and complicates layout, a serious problem on a 5mm by 6mm chip designed in 1980.
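The linear delay is easy to see in a small software model. The sketch below is an illustration only, not a description of the 8087's circuitry: the function name `ripple_carry_add` and the "longest carry chain" counter are assumptions introduced here as a rough stand-in for propagation delay.

```python
WIDTH = 69  # adder width described in the text


def ripple_carry_add(a: int, b: int, width: int = WIDTH):
    """Add two unsigned integers one bit at a time, ripple-carry style.

    Returns (sum modulo 2**width, carry_out, longest_chain), where
    longest_chain is the longest run of bit positions a single carry
    travelled through (hypothetical proxy for propagation delay).
    """
    result = 0
    carry = 0
    chain = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y   # this bit creates a carry by itself
        propagate = x ^ y  # this bit passes an incoming carry along
        result |= (propagate ^ carry) << i
        if generate:
            chain = 1      # a fresh carry starts here
        elif propagate and carry:
            chain += 1     # the incoming carry travels one more stage
        else:
            chain = 0      # no carry leaves this bit
        carry = generate | (propagate & carry)
        longest = max(longest, chain)
    return result, carry, longest
```

Calling `ripple_carry_add(2**69 - 1, 1)` models the all-ones case: the sum wraps to zero with a carry out, and the carry chain is 69 stages long. Adding all ones to all ones, by contrast, gives a chain of length 1, because every bit generates its own carry.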
The 8087's designers chose a middle path. They split the 69-bit adder into blocks of 4 bits. Within each 4-bit block, a small carry-lookahead circuit computes the block's carries quickly. Between blocks, a slower ripple-carry chain stitches the blocks together. The result is an adder that is much faster than a pure ripple across 69 bits, because a worst-case carry now ripples across roughly 17 block boundaries instead of 69 individual bit positions. It is also far more practical to lay out than a full carry-lookahead adder, because the per-block logic is small and the inter-block wiring is simple. This 4-bit block compromise, documented in Shirriff's die analysis of the 8087's adder, is the trade-off that made the 8087's speed claims work in real silicon.
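The block structure can be sketched in software as well. The model below is a minimal illustration under stated assumptions, not a transcription of the 8087's gates: the names `lookahead_carries` and `block_lookahead_add` are hypothetical, and because 69 is not a multiple of 4, the sketch simply treats the leftover top bit as a short final block, which the source does not confirm.

```python
WIDTH = 69  # adder width described in the text
BLOCK = 4   # block size described in the text


def lookahead_carries(g, p, cin):
    """Compute every carry in a block directly from g, p and the carry-in.

    carries[i] is the carry into bit i; carries[len(g)] is the block's
    carry-out. No carry is derived from a neighbouring carry, which is
    the defining property of carry-lookahead.
    """
    n = len(g)
    carries = [cin]
    for i in range(1, n + 1):
        # The block carry-in reaches bit i only if bits 0..i-1 all propagate.
        c = 1 if (cin and all(p[:i])) else 0
        # A carry generated at bit j reaches bit i if bits j+1..i-1 propagate.
        for j in range(i):
            if g[j] and all(p[j + 1:i]):
                c = 1
        carries.append(c)
    return carries


def block_lookahead_add(a: int, b: int, width: int = WIDTH, block: int = BLOCK):
    """Add with lookahead inside each block and ripple between blocks.

    Returns (sum modulo 2**width, carry_out, number_of_blocks).
    """
    result = 0
    carry = 0
    blocks = 0
    for base in range(0, width, block):
        n = min(block, width - base)  # assumption: last block may be short
        xs = [(a >> (base + i)) & 1 for i in range(n)]
        ys = [(b >> (base + i)) & 1 for i in range(n)]
        g = [x & y for x, y in zip(xs, ys)]  # generate signals
        p = [x ^ y for x, y in zip(xs, ys)]  # propagate signals
        carries = lookahead_carries(g, p, carry)
        for i in range(n):
            result |= (p[i] ^ carries[i]) << (base + i)
        carry = carries[n]  # ripples into the next block
        blocks += 1
    return result, carry, blocks
```

Under these assumptions a 69-bit add passes through 18 blocks (17 full blocks plus the one-bit remainder), so the slow ripple path has 18 stages rather than 69. The trade is visible in `lookahead_carries`: each extra bit in a block adds more terms to every carry expression, which is why keeping blocks small keeps the logic and wiring manageable.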
The same block-boundary reasoning shows up throughout modern arithmetic hardware. Adders in the ALUs of general-purpose CPUs, the execution units in Apple Silicon, and the shader cores in GPUs commonly speed up addition the same way: they group bits into blocks, compute carries within each block in parallel, and propagate a reduced carry signal between blocks. The 8087's 4-bit block is a conceptual ancestor of these designs, a working example of how architects think when speed, area, and routing all have to fit on one piece of silicon.
The 8087 was not a perfect solution. Pairing a separate floating-point chip with the host CPU was a stopgap, not a destination, and the industry moved past it: Intel integrated the FPU on-die with the 80486DX in 1989, ending the coprocessor era. The 8087 and its siblings also had accuracy edge cases in transcendental functions, and FPU correctness problems resurfaced in 1994, some 14 years after the 8087, with the Pentium FDIV bug. That is a legitimate line of criticism, and any honest accounting of the FPU lineage has to preserve it. Shirriff's reverse-engineering work is the clearest available look at the silicon that started that lineage, and at the carry-chain trade-off that made its speed claims possible.