The Ringelmann Effect Comes for AI Agents
In 1913, a French agricultural engineer named Maximilian Ringelmann published a study documenting something counter-intuitive: when men pulled on a rope together, adding more people to the team made each individual less productive. Collective output fell well short of the theoretical maximum. The cause was not exhaustion or poor equipment. It was coordination loss: the more hands on the rope, the more effort went into managing each other instead of pulling.
A hundred and thirteen years later, AI researchers are reporting the same finding about a different kind of labor.
A paper published on May 30, 2026, by researchers at Fudan University introduces the Sequential Iterative Multi-Agent System (SIMAS) framework, a minimalist architecture designed to isolate one variable: what happens to performance as you add more LLM-powered agents to a task? The result is a curve that closely resembles Ringelmann's. Performance does not scale monotonically with agent count. It shows diminishing returns, governed by a trade-off between collaborative synergy and coordination overhead. Past a certain point, each additional agent makes the whole system slower, not smarter.
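To make the synergy-versus-overhead trade-off concrete, here is a toy model. It is a sketch under stated assumptions, not the SIMAS method: it assumes the benefit of adding agents saturates exponentially, and that coordination cost grows with the number of agent pairs. The functions `synergy`, `overhead`, `performance`, and `best_agent_count`, and every constant in them, are hypothetical.

```python
import math

# Toy model of the synergy/overhead trade-off.
# ASSUMPTION: the functional forms and constants are illustrative only;
# they are not taken from the SIMAS paper.


def synergy(n: int, gain: float = 1.0, rate: float = 0.5) -> float:
    """Saturating benefit: each extra agent adds less than the one before."""
    return gain * (1.0 - math.exp(-rate * n))


def overhead(n: int, cost_per_link: float = 0.01) -> float:
    """Coordination cost proportional to the number of agent pairs, n(n-1)/2."""
    return cost_per_link * n * (n - 1) / 2


def performance(n: int) -> float:
    """Net score: collaborative benefit minus coordination cost."""
    return synergy(n) - overhead(n)


def best_agent_count(max_agents: int = 12) -> int:
    """Scan team sizes 1..max_agents and return the one with the highest net score."""
    return max(range(1, max_agents + 1), key=performance)
```

Under these made-up constants the net score peaks at five agents and declines after that. Raising `cost_per_link` moves the peak lower, which is the underlying point: the optimum depends on how expensive coordination is, not on a fixed agent count.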
The finding is not isolated. In January 2026, Google Research published an independent study testing 180 agent configurations across four benchmarks: financial reasoning, web navigation, planning, and tool use. Their conclusion: multi-agent coordination dramatically improves performance on parallelizable tasks but degrades it on sequential ones. The root cause, they found, is not what the industry assumed.
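One way to see why parallelizable and sequential tasks diverge is a variant of Amdahl's law with a coordination penalty added. This is a swapped-in illustrative model, not the one used in the Google study; the `speedup` function, the linear per-agent penalty, and the default `coord_cost` are all assumptions.

```python
# Amdahl's-law-style speedup with a per-agent coordination penalty.
# ASSUMPTION: this model and its default constant are illustrative, not from
# the Google Research study.


def speedup(n_agents: int, parallel_fraction: float, coord_cost: float = 0.05) -> float:
    """Speedup relative to a single agent.

    parallel_fraction: share of the work that can be split across agents (0..1).
    coord_cost: extra time, as a fraction of single-agent runtime, that each
                additional agent adds for handoffs and message passing.
    """
    serial = 1.0 - parallel_fraction           # work only one agent can do
    parallel = parallel_fraction / n_agents    # work shared evenly
    coordination = coord_cost * (n_agents - 1) # cost of every extra agent
    return 1.0 / (serial + parallel + coordination)


parallel_task = speedup(4, parallel_fraction=0.9)    # mostly parallelizable
sequential_task = speedup(4, parallel_fraction=0.1)  # mostly sequential
```

With 90 percent of the work parallelizable, four agents roughly double throughput; with only 10 percent parallelizable, the same four agents end up slower than one, because the handoff cost outweighs the little work that can be split.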
We have been solving for the wrong bottleneck.
The prevailing assumption in enterprise AI right now is that context length is the constraint: that multi-agent systems fail at scale because agents run out of memory or lose track of shared context. Billions of investment dollars are flowing into extended context windows. The Fudan and Google Research studies point to a different culprit: agents talking past each other. The performance degradation stems from coordination overhead, not long-context failure, and that overhead grows with agent count no matter how long the context window is.
The implication is uncomfortable for anyone who has built a product on the "more agents is better" assumption.
Collective intelligence, the researchers note, is an emergent property — contingent on strategic interaction design, not a guaranteed outcome of agent plurality. Without deliberate architectural choices about how agents communicate, a system risks achieving only the illusion of collaboration while failing to surpass the capabilities of a well-prompted individual.
Google Research took this further: they built a predictive model that identifies the optimal agent architecture for a given task with 87 percent accuracy. They also found that the optimal agent count is not a monotonic function of task complexity. A task that benefits from three agents may see no improvement, or outright degradation, at seven. The deciding variable is not the number of agents but the structure of their interaction.
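A team can apply the same lesson with a far simpler rule than Google's predictive model: measure the task at several agent counts, then take the smallest count that performs within a tolerance of the best. The `pick_agent_count` helper and the example scores below are hypothetical, shown only to illustrate the selection rule.

```python
# Hypothetical selection rule: prefer the smallest team that is "good enough".
# This is NOT Google Research's predictive model; it only needs scores you
# have measured yourself on your own task.


def pick_agent_count(scores: dict[int, float], tolerance: float = 0.01) -> int:
    """Return the smallest agent count scoring within `tolerance` of the best."""
    if not scores:
        raise ValueError("need at least one measured configuration")
    best = max(scores.values())
    # Fewer agents means less coordination, so break near-ties toward smaller teams.
    return min(n for n, s in scores.items() if s >= best - tolerance)


example_scores = {1: 0.62, 3: 0.71, 5: 0.715, 7: 0.66}  # made-up measurements
chosen = pick_agent_count(example_scores)
```

Here the three-agent configuration is chosen: five agents score marginally higher, but not by enough to justify the extra coordination, and seven agents score worse than three.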
The commercial platforms building multi-agent tooling have begun acknowledging the problem. LangChain's documentation on agentSupervision explicitly warns that "adding too many agents can introduce coordination complexity that outweighs the benefits of parallelization," an admission, in nearly the researchers' own terms, that the pattern shows up in production systems and not just in academic benchmarks. CrewAI's best practices guide recommends limiting agent roles to avoid "communication overhead degrading task quality." These are vendor prescriptions, not independent research, but they confirm that practitioners building real systems have hit the same wall the academic papers describe.
The cost stakes are concrete. At current API pricing of roughly $3–$15 per million tokens, depending on model tier, an over-provisioned multi-agent system running unnecessary coordination cycles can multiply compute costs without improving output. A three-agent system that finishes a task in six minutes is cheaper to run than a seven-agent system that takes seven minutes, because the larger system pays for four more agents' tokens plus the coordination traffic needed to keep them in step. Licensing models that charge per agent seat amplify this dynamic: paying for agents that are net-negative on performance is a direct budget leak. The repricing pressure the industry is starting to feel may come less from external negotiation than from internal audits that reveal how much is being spent on agents that make the system slower.
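That arithmetic can be sketched as a back-of-the-envelope cost function. Every number here is a placeholder assumption: the $10 per million tokens price is simply a point inside the $3–$15 range, and the per-agent and per-pair token counts are invented. The model also assumes a fully connected topology, in which every pair of agents exchanges coordination messages. `run_cost` is a hypothetical name.

```python
# Back-of-the-envelope cost of one task run.
# ASSUMPTIONS: price and token counts are placeholders; coordination traffic
# is modeled as a fixed number of tokens per pair of agents.

PRICE_PER_MILLION_TOKENS = 10.0  # USD, assumed; within the $3-$15 range


def run_cost(n_agents: int, work_tokens_per_agent: int, coord_tokens_per_pair: int) -> float:
    """Dollar cost of one run: each agent's own work plus pairwise coordination."""
    pairs = n_agents * (n_agents - 1) // 2
    total_tokens = n_agents * work_tokens_per_agent + pairs * coord_tokens_per_pair
    return total_tokens * PRICE_PER_MILLION_TOKENS / 1_000_000


three_agents = run_cost(3, work_tokens_per_agent=200_000, coord_tokens_per_pair=50_000)
seven_agents = run_cost(7, work_tokens_per_agent=200_000, coord_tokens_per_pair=50_000)
```

With these placeholder figures, three agents cost $7.50 per run and seven cost $24.50. Coordination is 20 percent of the three-agent bill but roughly 43 percent of the seven-agent bill, because pairs grow quadratically while agents grow linearly.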
This reframes the engineering question. The bottleneck is no longer raw model capability or context window size. It is orchestration design: how agents are connected, what they communicate, and when. The companies that win the next phase of agent infrastructure may not be the ones with the largest models but the ones with the best coordination protocols.
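How agents are connected determines how much they have to say to each other. The sketch below counts messages per round for three common topologies, assuming each message goes from one agent to exactly one other and that the supervisor counts as one of the agents. The topology names and the `messages_per_round` function are illustrative, not taken from any particular framework.

```python
# Messages per coordination round for three illustrative topologies.
# ASSUMPTION: one message = one agent sending to one other agent; real
# frameworks may batch, broadcast, or route messages differently.


def messages_per_round(topology: str, n_agents: int) -> int:
    if n_agents < 1:
        raise ValueError("need at least one agent")
    if topology == "chain":       # A -> B -> C ...: each agent hands off once
        return n_agents - 1
    if topology == "supervisor":  # one coordinator sends to each worker; each replies
        return 2 * (n_agents - 1)
    if topology == "mesh":        # every agent messages every other agent
        return n_agents * (n_agents - 1)
    raise ValueError(f"unknown topology: {topology}")


at_seven = {t: messages_per_round(t, 7) for t in ("chain", "supervisor", "mesh")}
```

At seven agents, a chain needs 6 messages per round, a supervisor layout 12, and a fully connected mesh 42. Only the mesh grows quadratically, which is one reason connection structure can matter as much as headcount.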
One question the research does not yet answer is whether this finding holds in production environments at commercial scale, where agent workloads are messier and more heterogeneous than benchmarks. SIMAS tests sequential interaction; whether the same diminishing-returns pattern holds in debate or DAG topologies is still unknown. The evidence base is academic. But two independent research groups, Fudan and Google, arriving at the same conclusion through different methods is a signal worth noting before your next decision about how many agents to deploy.