Aegis: Automated Error Generation and Attribution for Multi-Agent Systems

Abstract

As Multi-Agent Systems (MAS) become increasingly autonomous and complex, understanding their error modes is critical for ensuring their reliability and safety. However, research in this area has been severely hampered by the lack of large-scale, diverse datasets with precise, ground-truth error labels. To address this bottleneck, we introduce Aegis, a novel framework for Automated Error Generation and Attribution for Multi-Agent Systems.

By systematically injecting controllable and traceable errors into initially successful trajectories, we create a rich dataset of realistic failures. This is achieved using a context-aware, LLM-based adaptive manipulator that performs sophisticated attacks like prompt injection and response corruption to induce specific, predefined error modes. We demonstrate the value of our dataset by exploring three distinct learning paradigms for the error attribution task: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning.

Method Overview

Aegis follows a principled three-stage pipeline to generate high-quality error data and supports multiple learning paradigms for robust error attribution.

The Aegis framework automatically generates labeled failures by taking successful multi-agent trajectories and applying controlled, context-aware error injections, enabling three distinct learning methods for error attribution.
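The injection idea can be illustrated with a minimal sketch. It is not the Aegis implementation: the `Step`, `ErrorLabel`, and `LabeledTrajectory` types, the `inject_error` helper, and the error-mode name `calculation_error` are all hypothetical, and a plain Python function stands in for the LLM-based adaptive manipulator. The point is only that each injection records which agent was corrupted and which error mode was targeted, so the ground-truth label comes for free.

```python
from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass(frozen=True)
class Step:
    agent: str      # name of the agent that produced this step
    content: str    # the agent's message or response


@dataclass(frozen=True)
class ErrorLabel:
    agent: str       # agent whose step was corrupted
    error_mode: str  # hypothetical error-mode name
    step_index: int  # position of the corrupted step


@dataclass
class LabeledTrajectory:
    steps: list[Step]
    labels: list[ErrorLabel] = field(default_factory=list)


def inject_error(
    traj: LabeledTrajectory,
    step_index: int,
    error_mode: str,
    corrupt: Callable[[str], str],
) -> LabeledTrajectory:
    """Return a copy of traj with one step corrupted and a matching label added.

    `corrupt` is a stand-in for the manipulator; here it is any str -> str function.
    The input trajectory is left unchanged.
    """
    target = traj.steps[step_index]
    corrupted = replace(target, content=corrupt(target.content))
    new_steps = traj.steps[:step_index] + [corrupted] + traj.steps[step_index + 1:]
    label = ErrorLabel(target.agent, error_mode, step_index)
    return LabeledTrajectory(new_steps, traj.labels + [label])


# Toy example: a two-agent trajectory that originally succeeds.
clean = LabeledTrajectory([
    Step("planner", "Compute 2 + 3"),
    Step("solver", "2 + 3 = 5"),
])
faulty = inject_error(
    clean, 1, "calculation_error", lambda text: text.replace("= 5", "= 6")
)
```

A real pipeline would presumably also re-run the downstream agents after the corruption so that the failure propagates; this sketch only shows how the label is tied to the injection.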

Results

Performance Highlights

Aegis-SFT achieves an average score of 26.51
About 2× improvement over its base model, Qwen2.5-14B-Instruct (13.99 → 26.51)
9,533 trajectories with 24,843 error instances
6 MAS frameworks × 6 task domains

Performance by Domain

Aegis-SFT (orange) consistently outperforms all baseline models across different task domains and MAS frameworks.

Complete Results on Aegis-Bench

Model	Pair μF1	Pair MF1	Agent μF1	Agent MF1	Error μF1	Error MF1	Avg.
Random Baseline	0.33	0.21	4.54	3.56	11.23	11.15	4.08
Small-Scale Models
DCL (Ours)	8.33	5.30	22.93	20.23	24.73	27.70	12.61
Medium-Scale Models
Qwen2.5-7B-Instruct	5.02	2.52	27.55	14.49	14.96	11.36	12.43
+ SFT	5.05	2.80	60.03	22.70	19.61	16.90	17.99
+ GRPO	7.11	2.77	35.43	14.86	17.21	10.54	14.87
Qwen2.5-14B-Instruct	5.47	2.20	35.78	12.71	20.24	5.91	13.99
+ SFT (Aegis-SFT)	16.62	9.99	76.53	47.97	27.53	27.66	26.51
+ GRPO (Aegis-GRPO)	6.84	2.55	49.74	18.38	21.19	16.10	18.41
Qwen3-8B-Non-Thinking	3.96	1.40	21.34	8.16	15.81	13.89	10.12
+ SFT	9.68	5.73	64.79	38.96	20.37	20.36	21.41
+ GRPO	6.94	2.82	45.91	17.39	20.89	15.15	17.15
Qwen3-8B-Thinking	4.42	1.52	34.63	9.01	17.48	14.31	13.06
+ GRPO	4.41	1.66	36.11	15.73	17.94	12.03	17.58
Large-Scale Models
Qwen2.5-72B-Instruct	5.60	2.20	37.46	14.51	17.72	16.58	15.01
gpt-oss-120b	6.53	1.71	38.58	5.53	20.38	12.05	17.07
GPT-4.1	7.44	2.27	37.48	11.12	20.65	15.75	15.27
GPT-4o-mini	5.76	1.63	38.54	14.72	19.95	16.02	15.83
o3	7.86	2.27	40.31	23.27	22.37	16.76	20.24
Gemini-2.5-Flash	6.99	2.76	42.02	16.45	23.47	19.85	19.55
Gemini-2.5-Pro	6.96	2.88	41.32	16.15	19.93	16.29	18.35
Claude-Sonnet-4	7.68	2.34	40.73	15.51	21.21	16.55	18.16
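The page does not define the column headers, so the following sketch assumes that μF1 denotes micro-averaged F1 and MF1 denotes macro-averaged F1, computed over the set of labels predicted for each trajectory. The function name `micro_macro_f1` and the toy agent names are hypothetical. Under that assumption, the sketch shows why the two scores can diverge: micro-F1 pools counts over all labels, so frequent labels dominate, while macro-F1 averages per-label F1, so a missed rare label counts as much as a frequent one.

```python
from collections import Counter


def micro_macro_f1(gold: list[set[str]], pred: list[set[str]]) -> tuple[float, float]:
    """Compute (micro-F1, macro-F1) for multi-label predictions.

    gold[i] and pred[i] are the true and predicted label sets for trajectory i.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for label in p & g:
            tp[label] += 1   # predicted and correct
        for label in p - g:
            fp[label] += 1   # predicted but wrong
        for label in g - p:
            fn[label] += 1   # missed

    def f1(t: int, f_p: int, f_n: int) -> float:
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    labels = set(tp) | set(fp) | set(fn)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels) if labels else 0.0
    return micro, macro


# Toy agent-level attribution: "planner" is faulty in three trajectories,
# "solver" in one, and the predictor always blames "planner".
gold = [{"planner"}, {"planner"}, {"planner"}, {"solver"}]
pred = [{"planner"}, {"planner"}, {"planner"}, {"planner"}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro-F1 = {micro:.4f}, macro-F1 = {macro:.4f}")
# micro-F1 = 0.7500, macro-F1 = 0.4286
```

In this toy case the always-"planner" predictor keeps a high micro-F1 but loses half of its macro-F1 because it never identifies "solver".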

Citation

If you find Aegis useful for your research, please cite our paper:

@article{kong2025aegis,
    title={Aegis: Automated Error Generation and Attribution for Multi-Agent Systems},
    author={Kong, Fanqi and Zhang, Ruijie and Yin, Huaxiao and Zhang, Guibin and Zhang, Xiaofei and Chen, Ziang and Zhang, Zhaowei and Zhang, Xiaoyuan and Zhu, Song-Chun and Feng, Xue},
    journal={arXiv preprint arXiv:2509.14295},
    year={2025}
}
