OpenAI wants a fully automated researcher by 2028. Its current AI mostly writes code.
The gap between OpenAI's researcher and what it has actually built.

OpenAI has a new north star. Chief scientist Jakub Pachocki laid it out this week in an exclusive interview with MIT Technology Review: an autonomous AI research intern by September 2026, evolving by 2028 into a fully automated researcher capable of tackling problems too large or complex for humans. "I think we are getting close to a point where we'll have models capable of working indefinitely in a coherent way just like people do," he said. "You kind of have a whole research lab in a data center."
That is the pitch. Here is what exists today.
The closest thing OpenAI has shipped to an automated researcher is Codex — an agent-based coding tool that can spin up code on the fly, analyze documents, generate charts, and manage an inbox. Most of OpenAI's own technical staff now use it, Pachocki said. It is being presented as a preview of what the research intern might eventually do. "I expect Codex to get fundamentally better," he said.
What OpenAI does not yet have is a system that can take a research problem, run experiments, and produce publishable results with limited human guidance — which is what "automated researcher" implies.
The credibility problem
The gap between the ambition and the reality matters because OpenAI has a track record of overclaiming in this exact territory. Last October, senior OpenAI figures including VP of science Kevin Weil posted on social media that GPT-5 had found solutions to ten previously unsolved Erdős problems in mathematics. The reaction was swift. Mathematicians pointed out that GPT-5 had found existing papers that solved those problems — solutions that were already in the literature, not novel proofs. DeepMind CEO Demis Hassabis called it "embarrassing." The posts were deleted. As MIT Technology Review reported, OpenAI reframed the claim as useful literature retrieval rather than new discovery.
This matters because Pachocki is now asking scientists and investors to trust a timeline — September 2026 for an intern, 2028 for a full researcher — that runs through territory where OpenAI has already tripped. The October episode does not prove the 2028 goal is wrong, but it does mean the company's self-reported progress deserves more scrutiny than usual.
What the skeptics say
Independent researchers who have tested these systems are not convinced the timeline is achievable. Doug Downey, a research scientist at the Allen Institute for AI who is not connected to OpenAI, told MIT Technology Review that chaining multiple tasks together — the core requirement for an automated researcher — is where things fall apart. "If you have to chain tasks together then the odds that you get several of them right in succession tend to go down," he said. His team tested several top-tier LLMs on scientific tasks last summer. GPT-5 came out on top but still made frequent errors. (Those results may already be stale, he noted — the team has not tested GPT-5.4.)
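Downey's point is simple compound probability. A minimal sketch of the arithmetic, using illustrative numbers that are assumptions of mine, not figures from his study:

```python
# Illustrative only: if each step in a workflow succeeds independently
# with probability p, a k-step chain succeeds end-to-end with p**k.
p = 0.90  # assumed per-task success rate, not a measured figure

for k in (1, 5, 10, 20):
    print(f"{k:>2} chained tasks -> {p ** k:.0%} end-to-end success")
```

Assuming independence (which real workflows violate, often for the worse), a model that gets 90 percent of individual tasks right completes a 20-step project correctly about 12 percent of the time. That is the shape of the problem Downey is describing.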
Andy Cooper, professor of chemistry at the University of Liverpool who is building his own automated AI scientist system, is more blunt: "We have not found, yet, that LLMs are fundamentally changing the way that science is done." He is using them in robotic workflows but not as research directors. "I'm not sure that people are ready to be told what to do by an LLM," he said. "I'm certainly not."
There is a subtler version of this critique that comes through in the scientists who are actually using GPT-5 in their work. Robert Scherrer, a physicist at Vanderbilt, told MIT Technology Review that GPT-5 solved a problem he and a graduate student had been stuck on for months — but he also noted the model "still makes dumb mistakes." Derya Unutmaz, an immunologist at the Jackson Laboratory, uses GPT-5 to analyze old data sets and find patterns. But when pressed on whether GPT-5 has produced genuinely novel findings, he frames it as acceleration of existing workflows rather than scientific autonomy.
Statistician Nikita Zhivotovskiy at Berkeley put it most directly: "I have seen very few genuinely fresh ideas or arguments that would be worth a publication on their own. So far, they seem to mainly combine existing results, sometimes incorrectly, rather than produce genuinely new approaches."
The benchmark is real, the researcher is not
To be fair: the underlying capability jump is real. GPT-5 scores 92 percent on GPQA, a benchmark of more than 400 PhD-level questions in biology, physics, and chemistry. GPT-4 scored 39 percent. That is a genuine step change. And there are documented cases — with actual scientists, actual problems — where GPT-5 pointed researchers toward solutions they would not have found alone.
But solving a multiple-choice PhD exam and running an autonomous multi-month research project are different tasks entirely. The former is a benchmark. The latter is what OpenAI is promising. The benchmark is impressive evidence that the underlying models are improving. It is not proof that the timeline is realistic.
Pachocki himself acknowledges the risks. He points to the scenario other researchers have raised: what if the system goes off the rails, gets hacked, or simply misinterprets its instructions? OpenAI's current mitigation strategy is chain-of-thought monitoring — training models to write scratchpad notes as they work, then using other LLMs to check those notes for unwanted behavior. It is a monitoring solution, not a control solution. "Until you can really trust the systems, you definitely want to have restrictions in place," Pachocki said.
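A minimal sketch of what that monitoring loop can look like. OpenAI has not published its implementation; the `worker` and `monitor` objects and their methods here are hypothetical stand-ins:

```python
# Hypothetical sketch of chain-of-thought monitoring, not OpenAI's code.
def run_with_monitor(task: str, worker, monitor) -> str:
    # The worker model returns an answer plus its scratchpad notes.
    answer, scratchpad = worker.solve(task)   # assumed interface

    # A separate model reviews only the scratchpad for unwanted
    # behavior and returns a verdict like "ok" or "flag: <reason>".
    verdict = monitor.review(scratchpad)      # assumed interface

    if verdict.startswith("flag"):
        raise RuntimeError(f"monitor tripped: {verdict}")
    return answer
```

The structural weakness is the one implied by Pachocki's caveat: the monitor can only catch behavior that surfaces in the notes, which is why restrictions are still needed on top of it.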
The competitive context
OpenAI is not alone in chasing this. Google DeepMind got here first — AlphaFold solved the protein folding problem that biologists had worked on for decades, and AlphaEvolve has been automating the discovery of mathematical identities. DeepMind CEO Demis Hassabis told MIT Technology Review back in 2022 that science was the reason he started the company. "This is why I've worked my whole career in AI," he said.
OpenAI is, in effect, the challenger in this space, even if its own framing does not say so. The automated researcher is not a new idea. What is new is OpenAI committing to it as a "north star" and putting specific dates on it.
Whether those dates hold will depend on whether the gap between the benchmark and the research workflow is engineering or fundamental. Pachocki thinks it is engineering. The scientists using these tools in the field think it might be both.

