When you put multiple AI agents in a room together and let them debate, the protocol governing how they talk to each other matters more than the model running them. That's the core finding of a new preprint posted to arXiv on March 28, 2026 by Ramtin Zargari Marandi, a researcher studying multi-agent systems. His study compares three debate protocols and a no-interaction baseline across twenty macroeconomic events and finds that the design choice at the coordination layer is where the real trade-offs live.
The four setups tested are straightforward in structure. Within-Round (WR) limits agents to seeing only what other agents wrote in the current round. Cross-Round (CR) gives them full context from all prior rounds. A novel Rank-Adaptive Cross-Round protocol (RA-CR) dynamically reorders agents each round and uses an external judge model to silence the lowest-ranked participant. The No-Interaction (NI) baseline is exactly what it sounds like: agents answer independently, never seeing each other.
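The difference between the four setups comes down to what context each agent is allowed to read before speaking. A minimal sketch of those visibility rules, under assumed names (`Transcript`, `visible_context` are illustrative, not from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class Transcript:
    # rounds[r][agent] -> that agent's message in round r
    rounds: list = field(default_factory=list)

def visible_context(protocol: str, transcript: Transcript,
                    current_round: int, agent: str) -> list:
    """Messages an agent may read before writing in `current_round`."""
    if protocol == "NI":      # No-Interaction: agents never see peers
        return []
    if protocol == "WR":      # Within-Round: only peers' messages from this round
        if current_round < len(transcript.rounds):
            return [m for a, m in transcript.rounds[current_round].items()
                    if a != agent]
        return []
    if protocol in ("CR", "RA-CR"):  # Cross-Round: full history of prior rounds
        return [m for r in transcript.rounds[:current_round]
                for m in r.values()]
    raise ValueError(f"unknown protocol: {protocol}")
```

The sketch makes the baseline's role visible in code: NI is the `return []` branch, and everything the study measures downstream follows from which branch a deployment picks.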
The results break into a clear three-way split. RA-CR reaches consensus fastest, which makes it the right choice when convergence is the goal. Within-Round agents cross-reference each other's arguments more frequently but take longer to settle. The No-Interaction baseline, freed from the pressure of peer visibility, produces the widest argument diversity and is the only setup where that diversity stays stable across repeated runs.
The RA-CR result is the one worth sitting with. Silencing the lowest-ranked agent each round sounds harsh, but it is functionally an optimization for agreement rather than exploration. The external judge model is the design choice that makes this possible: it is not one of the debating agents, it is a separate evaluation step that decides who speaks next. That separation is the mechanism. What it produces is faster consensus. What it costs is the range of positions that survive the process.
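The rank-adaptive step can be sketched in a few lines. This is a hypothetical reconstruction, not the paper's implementation: `judge_score` stands in for the external judge model, and the ranking and silencing logic follows the description above.

```python
def ra_cr_round(agents, messages, judge_score):
    """One RA-CR round: rank speakers via the external judge, mute the lowest.

    agents      -- list of agent identifiers still active this round
    messages    -- {agent: latest message} for this round
    judge_score -- callable(message) -> float, the external judge (assumed API)
    Returns (speaking order for the next round, the silenced agent).
    """
    # Reorder agents by judge score, best first.
    ranked = sorted(agents, key=lambda a: judge_score(messages[a]), reverse=True)
    silenced = ranked[-1]     # lowest-ranked agent loses its voice
    next_order = ranked[:-1]  # survivors speak next round, in rank order
    return next_order, silenced
```

Because `judge_score` sits outside the `agents` list, the judge never argues a position of its own; each call to `ra_cr_round` shrinks the debate by one voice, which is exactly the consensus-over-exploration trade described above.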
For builders of multi-agent systems, this is an architectural decision disguised as a parameter. The protocol you choose determines what kind of outcome you get, and the choice is not reversible by scaling the number of agents. Adding more agents to RA-CR does not recover argument diversity; the larger pool simply reaches agreement faster. Adding more agents to the No-Interaction baseline does not accelerate convergence; it produces more independently generated positions at the same pace. The protocol is not a tuning knob. It is a structural commitment.
The macroeconomic case study gives the results texture. Twenty diverse events, five random seeds, matched prompts and decoding parameters across all conditions. The methodology is rigorous enough to take seriously, and the trade-off it identifies is a real architectural decision, not a dial you can turn both ways. What the study measures is consensus formation and argument diversity. Those two things are genuinely in tension: the more agents interact, the more they converge, but the less diversity of argument they produce. You cannot optimize for both simultaneously with the same protocol.
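The matched-conditions design is worth spelling out, because it is what lets the study attribute outcome differences to the coordination layer alone. A hypothetical enumeration of that grid, with placeholder event IDs and assumed decoding values (the paper's actual parameters are not reproduced here):

```python
from itertools import product

PROTOCOLS = ["NI", "WR", "CR", "RA-CR"]
EVENTS = [f"event_{i:02d}" for i in range(20)]  # placeholder macroeconomic events
SEEDS = [0, 1, 2, 3, 4]                          # five random seeds
DECODING = {"temperature": 0.7, "top_p": 0.95}   # assumed, shared across conditions

# Every protocol runs on every (event, seed) pair with identical decoding,
# so only the coordination layer varies between conditions.
runs = [
    {"protocol": p, "event": e, "seed": s, **DECODING}
    for p, e, s in product(PROTOCOLS, EVENTS, SEEDS)
]
# 4 protocols x 20 events x 5 seeds = 400 matched runs
```

Holding prompts, events, seeds, and decoding fixed is the standard way to isolate one factor; the consensus-versus-diversity tension then falls out of the protocol column alone.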
The broader implication is about what multi-agent systems actually are. The field has been moving from single LLMs toward multi-agent setups to overcome cognitive bottlenecks, but the coordination framework matters as much as the individual model. Parallel exploration and coordinated deliberation are not the same thing. RA-CR is the design for the latter. If you want agents to explore many positions before committing, having them not see each other at all might produce better results for that specific goal.
The practical warning for anyone deploying a multi-agent system: do not assume that adding agents automatically makes the output better. The protocol determines whether additional agents contribute diversity of perspective or collapse into a narrower consensus faster. Before you scale the number of agents, know which trade-off you are buying.
The full preprint is on arXiv.