For decades, medicinal chemists have treated one bond-forming reaction as a stubborn holdout. Chan-Lam coupling, a method for stitching carbon to nitrogen to form the kind of linkage that runs through a long list of drug molecules, has resisted the usual optimization tricks. Yields on the harder substrates have hovered low enough that chemists route around the reaction rather than tune it.
A team from OpenAI and the chemistry-AI startup Molecule.one reports that their system closed the loop on that problem. According to OpenAI's write-up of the work, they connected OpenAI's GPT-5.4 model to Molecule.one's lab agent, Maria, which had hands-on access to a high-throughput chemistry lab. The agent picked the reaction target on its own, suggested a mild oxidant called TEMPO as an additive, designed the experiments, and ran two cycles of physical lab work. The most promising proposal, labeled OAI-M1-03, focused on primary sulfonamides as substrates, a class the system independently flagged as both hard and high-value.
The numbers are real but modest. Under the agent's optimized conditions, measured yields improved for 88% of the boronic acids and 83% of the sulfonamides the team tested. Mean yield rose from 16.6% to a value the source describes as above 25%, with the exact upper bound truncated in the publication. That puts the lift at a minimum of roughly eight percentage points on a reaction that medicinal chemists have been working around for years. The gain is meaningful for the field, but it still leaves absolute yields in territory a process chemist would call workable, not solved.
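The arithmetic behind that lift can be checked in a few lines. This is a minimal sketch, not anything from the published work: it assumes 25.0% as a floor for the optimized mean, because the source gives only "above 25%", so both results are lower bounds. The function and variable names are illustrative.

```python
# Figures quoted in the article. The optimized mean is reported only as
# "above 25%", so 25.0 is an assumed floor, not the published value.
baseline_mean_yield = 16.6   # percent
optimized_mean_floor = 25.0  # percent, lower bound only


def yield_lift(baseline: float, optimized: float) -> tuple[float, float]:
    """Return (absolute lift in percentage points, relative lift in percent)."""
    absolute = optimized - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


abs_lift, rel_lift = yield_lift(baseline_mean_yield, optimized_mean_floor)
print(f"absolute lift: at least {abs_lift:.1f} percentage points")
print(f"relative lift: at least {rel_lift:.0f}% over baseline")
```

The two figures answer different questions. The absolute lift says how much more product comes out of each run, while the relative lift shows that, measured against a low baseline, the same gain amounts to about half again as much material.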
In the announcement's "near-autonomous" label, "near" is the operative word. Humans wrote the steering and grading prompts that shaped the agent's behavior, selected which proposals were worth running, made limited corrections to experimental plans, assisted with basic lab operations, and independently validated the final yield numbers. The work is also single-source: it comes from the team that built the system, and the result has not yet been reproduced outside the OpenAI and Molecule.one collaboration. The improvement is on one reaction class against one substrate set, so generalizing to other stubborn couplings is premature.
What is new is the loop closing, not the yield. An AI system proposed the experiment, ordered the work, watched the result, and proposed a better version, all in a real wet lab rather than a simulation. That is a different category of milestone from a model that scores well on a chemistry reasoning benchmark. Whether the same approach can be ported to the next stubborn reaction, or scaled beyond a small library of substrates, is the question worth watching next.