The Grant Proposal Generator Is Already Here. The NIH Ban Has a Loophole.
In July 2025, the National Institutes of Health drew a line. Applications "substantially developed by AI" would no longer be considered the original ideas of the applicant. They would not be funded (NIH Federal Register Notice).
The policy was a response to something already underway: a 57 percent surge in grant applications across 12 major research funders between 2022 and 2025, a wave that researchers at University College London and the Research on Research Institute traced directly to the release of ChatGPT (Nature Comment). At the EU's Marie Skłodowska-Curie Actions fellowships, applications jumped 142 percent. At Wellcome, they doubled. At the British Academy's postdoctoral program, they rose 14 percent (Research Professional News). The common thread was not a shift in science. It was a shift in tooling.
Agents can now be trained on a researcher's entire published body of work, the specific criteria of a funding panel, and the texts of previously funded grants from that panel. They can generate dozens of candidate ideas, select the strongest, and produce a fully formatted application in minutes. A researcher needs only to review and submit. Elsevier's survey data gives a sense of the scale: 41 percent of researchers who use AI tools are now using them to draft grant proposals.
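A minimal sketch of that pipeline, assuming a generic chat-completion client: the `llm.complete` call, the prompts, and the numeric scoring step are all illustrative stand-ins for whatever agent framework a researcher actually uses, not any specific vendor's API.

```python
from dataclasses import dataclass

@dataclass
class Corpus:
    researcher_papers: list[str]   # the applicant's published work
    panel_criteria: str            # the funder's published review criteria
    funded_examples: list[str]     # texts of grants this panel funded before

def draft_proposal(corpus: Corpus, llm, n_candidates: int = 20) -> str:
    """Generate candidates, pick the best fit, expand it into an application."""
    background = "\n\n".join(corpus.researcher_papers)
    exemplars = "\n\n".join(corpus.funded_examples)

    # 1. Generate many candidate ideas grounded in the applicant's own work.
    ideas = [
        llm.complete(f"Given this body of work:\n{background}\n\n"
                     f"Propose one fundable research idea (variant {i}).")
        for i in range(n_candidates)
    ]

    # 2. Score each idea against the panel's criteria; assume the model
    #    is prompted to reply with a bare number.
    def score(idea: str) -> float:
        return float(llm.complete(
            f"Criteria:\n{corpus.panel_criteria}\n\nIdea:\n{idea}\n\n"
            f"Reply with a single 0-10 fit score."))

    best = max(ideas, key=score)

    # 3. Expand the winning idea into a fully formatted application,
    #    styled on what this panel has funded before.
    return llm.complete(f"Write a complete grant application for:\n{best}\n\n"
                        f"Match the structure and register of:\n{exemplars}")
```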
The NIH's ban was meant to be the deterrent. But nobody has tested whether it works.
The enforcement mechanism is a certification checkbox. When researchers submit to the NIH, they attest that the application reflects their original ideas. The policy relies on that attestation — and on the assumption that "substantially developed by AI" means something detectable. It does not.
The NIH has published no technical definition of what triggers the disqualification. There is no mandatory disclosure of which AI tools were used, or how. There is no verification procedure. A proposal written by an AI agent trained on a researcher's papers and every previously funded grant in the relevant program looks, to the reviewer's eye, like a well-crafted application from an experienced investigator. That is precisely what it is designed to be.
"Agents can be trained on a researcher's entire published body of work, on the grant criteria of the most relevant funding panel and on the texts of the most recently funded grants from that panel," Rees and Wilsdon wrote in Nature last week. The resulting proposal, they noted, is "not the researcher's argument shaped by AI. It is a fully AI-generated proposal optimized to the funders' brief."
The checkbox does not catch this. The people reviewing such a proposal may not know what they are reading.
This is the gap at the center of the NIH's policy: the enforcement mechanism was never stress-tested against the thing it was designed to stop. The agency declared a rule. It did not build a way to enforce it.
What the numbers actually show
The pressure is real. Research funders working with RoRI — including the Australian Research Council, the European Research Council, and Wellcome — have seen application volumes climb steadily since 2022. Accompanying the rise is a compression in measurable quality signals. At the EU's MSCA fellowships, the share of applications falling below the quality threshold for further consideration dropped from 20 percent in 2018 to 5 percent in 2025. Funders have pointed to this as evidence that application quality is improving.
It is not obviously that. The most parsimonious reading is that AI-optimized prose has become better at clearing the first review gate — the automated threshold check — than prose written without AI assistance. The filter is not measuring merit. It is measuring whether a proposal sounds like what a funded grant sounds like. An AI agent trained on successful grants is, by design, optimized to sound exactly like that.
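To make the mechanism concrete: if the first gate is, in effect, a similarity check against past awards, an agent trained on those awards passes it by construction. A toy version of such a gate, assuming an `embed` function that maps text to a vector and an invented 0.8 threshold; real screening pipelines are more elaborate, but the objective is the same.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def clears_first_gate(proposal: str, funded_corpus: list[str],
                      embed, threshold: float = 0.8) -> bool:
    """Pass if the proposal's nearest neighbor among funded grants is close enough."""
    p = embed(proposal)
    # The signal being measured is proximity to past awards -- which is
    # exactly the objective an agent trained on those awards maximizes.
    best_match = max(cosine(p, embed(grant)) for grant in funded_corpus)
    return best_match >= threshold
```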
A separate analysis of US agency grants found that proposals written or edited with chatbot assistance tended to be more similar to previously funded projects than those written without AI tools. That is the mechanism working as designed: convergence on past success, not deviation from it.
James Wilsdon, executive director of RoRI, has said the system is roughly 18 months from collapse under the weight of application volume if funders do not change their approach. The European Research Council has already moved to exclude previously unsuccessful researchers from certain calls. The UK Medical Research Council has reinstated interviews for all shortlisted applicants — a manual verification step intended to catch what the written proposal cannot reveal.
These are triage measures. None of them address the underlying problem: the written grant proposal was already an imperfect signal of research quality. AI has made it a fully unreliable one.
The arms race that is not happening
Detection tools exist. Some funders have deployed automated systems to flag textual patterns associated with LLM use. But these systems operate on the same technology that generates the proposals — and the generation models have had years to learn how to avoid the patterns detectors are trained on. The cat is not ahead of the mouse. They are the same animal.
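A caricature makes the structural point concrete. Suppose a detector flags text containing phrases overrepresented in LLM output; the phrase list below is purely illustrative, and production detectors use learned classifiers, but the weakness is the same.

```python
# Phrases anecdotally overrepresented in LLM prose; purely illustrative.
LLM_TELLTALES = ("delve into", "it is worth noting", "a testament to",
                 "in the ever-evolving landscape", "furthermore,")

def looks_ai_written(text: str, max_hits: int = 3) -> bool:
    lowered = text.lower()
    hits = sum(lowered.count(phrase) for phrase in LLM_TELLTALES)
    return hits >= max_hits

def launder(draft: str, rephrase) -> str:
    """The trivial counter-move: rewrite until the detector passes.
    `rephrase` is assumed to be the same class of model that wrote the draft."""
    while looks_ai_written(draft):
        draft = rephrase(draft)
    return draft
```

Any published detector becomes, in effect, a training signal for the generator.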
More fundamentally, the arms race framing misses the structural problem. Even a perfect detector would not solve the issue, because the problem is not that AI-generated proposals are identifiable. It is that they are good. An agent trained on a researcher's actual work, writing for a specific program with full knowledge of what that program has funded before, will produce something that is technically competent, strategically aligned, and genuinely indistinguishable from a strong human application. The reviewer's problem is not detecting AI. It is that the AI has already solved the reviewer's job.
Peer reviewers' use of AI tools to assess proposals compounds the problem. More than half of researchers already use AI to assist with peer review, often against funder guidelines. When both sides of the review process are mediated by agents trained on the same body of previously funded work, the system stops evaluating the quality of ideas. It evaluates how well agents have learned to simulate the ideas funders have previously rewarded.
What the policy actually says
The NIH's NOT-OD-25-132 notice, published July 31, 2025, states that applications "substantially developed by AI" or containing "sections substantially developed by AI" will not be considered original ideas of the applicant and will not be reviewed. If such use is detected after an award is made, the case may be referred to the Office of Research Integrity.
The notice does not define "substantially developed." It does not specify which AI tools must be disclosed. It does not require researchers to submit their prompts or the outputs of intermediate AI steps. It creates a standard with no measurement procedure, a prohibition with no enforcement mechanism, and a certification that requires nothing the applicant would not already provide voluntarily.
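For contrast, here is a hypothetical sketch of what an enforceable disclosure standard would minimally have to capture. None of these fields exists in NOT-OD-25-132; every name below is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AIUseDisclosure:
    """Hypothetical machine-checkable record; no such requirement exists."""
    tools: list[str]                # model names and versions used
    prompts: list[str]              # prompts issued during drafting
    ai_drafted_sections: list[str]  # sections containing verbatim AI output
    human_authored_share: float     # fraction of final text written by hand
```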
Rees's assessment is blunt. "Funding panels have always faced hard choices, but they could at least claim to be distinguishing excellent ideas from merely good ones. Agentic AI is making that claim increasingly hollow."
The NIH's response to a genuine crisis is a checkbox. The researcher who wants to comply with the spirit of the rule — and the researcher who wants to exploit its letter — face identical friction: none.
The market in the meantime
While regulators write policy, the commercial market has moved. AI grant writing tools have proliferated, marketed directly to researchers seeking competitive advantage. They are not hidden. They are not experimental. They are available, documented, and in active use. The Elsevier data — 41 percent of AI-using researchers drafting proposals with AI — is a floor, not a ceiling, because the survey predates the current generation of agentic tools.
The researchers using these tools are not necessarily acting in bad faith. Many are using AI the way they use statistical software: to execute on an idea they had. The problem is structural. The funding system's evaluative criteria — clarity of writing, coherence of argument, alignment with previously funded work — are exactly the criteria that AI optimization is designed to satisfy. The system is not measuring what it thinks it is measuring.
Some funders are moving toward alternatives. The MRC's reintroduction of interviews is the most direct response: verify the researcher, not just the proposal. Lotteries for grants within a quality band have been proposed as a volume control. Rees and Wilsdon have argued for shifting evaluation toward track records, portfolios, and sustained performance over time — measuring the scientist rather than the document.
These approaches share a common logic: the written proposal was always a proxy for something harder to measure. AI has exposed the proxy as broken. The question is not whether to replace it, but what to put in its place.
The NIH's checkbox does not answer that question. It does not even begin to.