Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
When Multi-Agent Systems Fail, Aegis Knows Who to Blame
Multi-agent AI systems are fragile in ways that are hard to debug. When an agent in a coordinated system makes a bad decision, figuring out which agent failed and why requires understanding the full execution trace — and there is almost no labeled data to train error-detection systems on.
A new paper introduces a framework called Aegis that automates the process of generating error datasets for multi-agent systems. Instead of paying human annotators to label failures — slow, expensive, and unscalable — Aegis uses an LLM-based manipulator to inject context-aware errors into successful execution trajectories. The result is a dataset of 9,533 annotated trajectories covering diverse multi-agent architectures and task domains.
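The injection step amounts to a small data transformation. The sketch below is an illustration under stated assumptions, not the paper's code. `Step`, `InjectedSample`, `inject_error`, the three error-type names, and `stub_manipulator` are all hypothetical. Where Aegis would call an LLM to rewrite a step plausibly, the stub only tags the step text. The point it shows is that an error injected on purpose comes with free labels: the faulty agent, the step, and the error type are known exactly.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable

# Hypothetical error categories for illustration; the paper defines its own taxonomy.
ERROR_TYPES = ["wrong_tool_call", "hallucinated_fact", "ignored_instruction"]

@dataclass(frozen=True)
class Step:
    agent: str    # which agent produced this step
    content: str  # the message or action the agent emitted

@dataclass(frozen=True)
class InjectedSample:
    trajectory: list[Step]  # the corrupted trajectory
    faulty_agent: str       # ground-truth label: who failed
    step_index: int         # ground-truth label: where the error was injected
    error_type: str         # ground-truth label: what kind of error

def inject_error(
    trajectory: list[Step],
    manipulator: Callable[[list[Step], int, str], str],
    rng: random.Random,
) -> InjectedSample:
    """Corrupt one step of a successful trajectory and record where and how."""
    idx = rng.randrange(len(trajectory))
    error_type = rng.choice(ERROR_TYPES)
    # The manipulator sees the whole trajectory, so the rewrite can be context-aware.
    new_content = manipulator(trajectory, idx, error_type)
    corrupted = list(trajectory)  # copy so the successful original stays intact
    corrupted[idx] = replace(trajectory[idx], content=new_content)
    return InjectedSample(corrupted, trajectory[idx].agent, idx, error_type)

def stub_manipulator(traj: list[Step], idx: int, error_type: str) -> str:
    # Hypothetical stand-in for an LLM call; it only prefixes the error type.
    return f"[{error_type}] {traj[idx].content}"

if __name__ == "__main__":
    success = [
        Step("planner", "Split the task into search and summarize."),
        Step("searcher", "Found three relevant sources."),
        Step("writer", "Drafted a summary citing all three sources."),
    ]
    sample = inject_error(success, stub_manipulator, random.Random(0))
    print(sample.faulty_agent, sample.step_index, sample.error_type)
```

Since the original trajectory is copied rather than mutated, each successful run can be paired with one or more corrupted variants of itself.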
The framework supports three learning paradigms: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. The fine-grained labels and structured positive-negative sample pairs allow models to learn not just that something went wrong, but which agent failed and what type of error it was.
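To make the three paradigms concrete, here is one way a single labeled sample could feed each of them. Everything in this sketch is an assumption for illustration and not a detail taken from the paper: the `agent=...; error=...` output format, the 0.5/0.5 partial-credit reward, and the use of a margin ranking loss as the contrastive objective.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Label:
    faulty_agent: str  # ground truth: which agent failed
    error_type: str    # ground truth: what kind of error it was

# --- Supervised fine-tuning: the model learns to emit the label as text. ---
def sft_target(label: Label) -> str:
    # Hypothetical output format; any consistent, parseable format would work.
    return f"agent={label.faulty_agent}; error={label.error_type}"

def parse_prediction(text: str) -> Optional[Label]:
    """Parse output in the sft_target format; return None if it is malformed."""
    try:
        fields = dict(part.strip().split("=", 1) for part in text.split(";"))
        return Label(fields["agent"], fields["error"])
    except (ValueError, KeyError):
        return None

# --- Reinforcement learning: a scalar reward with partial credit. ---
def rl_reward(prediction: str, gold: Label) -> float:
    pred = parse_prediction(prediction)
    if pred is None:
        return 0.0  # unparseable output earns no reward
    reward = 0.0
    if pred.faulty_agent == gold.faulty_agent:
        reward += 0.5  # assumed weight for naming the right agent
    if pred.error_type == gold.error_type:
        reward += 0.5  # assumed weight for naming the right error type
    return reward

# --- Contrastive learning: clean trajectory as positive, corrupted twin as negative. ---
def margin_loss(score_clean: float, score_corrupted: float, margin: float = 1.0) -> float:
    # Margin ranking loss: zero once the model scores the clean trajectory
    # at least `margin` higher than its corrupted counterpart.
    return max(0.0, margin - (score_clean - score_corrupted))
```

Splitting the reward into agent and error-type components means a model that finds the right culprit but misnames the failure still receives some learning signal.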
The empirical results are notable: several of the fine-tuned smaller models performed competitively with proprietary models an order of magnitude larger. That is a meaningful result for anyone deploying multi-agent systems in production — it suggests you do not need the largest frontier model to do error attribution well, if you have the right training data.
The core contribution is solving the data bottleneck. Multi-agent error attribution has been hard to improve because no large-scale, diverse dataset of labeled failures existed. Aegis generates that data synthetically. Whether the synthetic errors transfer to real-world failure modes remains an open question, but the paper's methodology for evaluating error attribution performance against ground truth is itself a contribution.
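Because every injected error carries its ground-truth label, scoring an attribution model reduces to comparing predictions against those labels. The following is a minimal sketch that assumes three hypothetical metrics: agent accuracy, error-type accuracy, and joint accuracy (both must be right). The paper's own evaluation protocol may define its metrics differently.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class Attribution:
    agent: str       # which agent is blamed
    error_type: str  # what kind of error is claimed

def attribution_scores(
    preds: Sequence[Attribution], gold: Sequence[Attribution]
) -> dict[str, float]:
    """Compare predicted attributions with ground-truth labels, one per trajectory."""
    if len(preds) != len(gold):
        raise ValueError("predictions and ground truth must align one-to-one")
    if not gold:
        raise ValueError("need at least one example")
    n = len(gold)
    pairs = list(zip(preds, gold))
    agent_hits = sum(p.agent == g.agent for p, g in pairs)
    type_hits = sum(p.error_type == g.error_type for p, g in pairs)
    joint_hits = sum(p == g for p, g in pairs)  # both fields must match
    return {
        "agent_accuracy": agent_hits / n,
        "error_type_accuracy": type_hits / n,
        "joint_accuracy": joint_hits / n,
    }
```

Joint accuracy is the strictest of the three, since a prediction counts only when it names both the right agent and the right error type.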
The paper is available at arXiv:2509.14295.