Surveys decision-making in multi-agent systems, prioritizing MARL and LLM-based methods over traditional rule-based, game-theoretic, and evolutionary approaches.

A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives
Brave API

Multi-agent cooperative decision-making involves multiple agents collaborating to achieve specific objectives, with applications in autonomous driving, drone navigation, disaster rescue, and military simulations. This survey examines the leading simulation environments for testing multi-agent systems (MAS), including the Multi-Agent Particle Environment (MPE) for 2D cooperative and competitive tasks, the StarCraft Multi-Agent Challenge (SMAC) for decentralized micromanagement under partial observability, and Google Research Football for complex 3D strategic interactions.

The decision-making approaches in MAS are categorized into five types: rule-based (primarily fuzzy logic), game theory-based, evolutionary algorithms-based, deep multi-agent reinforcement learning (MARL)-based, and large language model (LLM) reasoning-based. Because MARL and LLM-based methods perform better in dynamic and uncertain environments, the survey emphasizes them over traditional approaches. MARL techniques enable agents to learn optimal policies through environmental interaction, structured under paradigms such as Centralized Training with Centralized Execution (CTCE), Decentralized Training with Decentralized Execution (DTDE), and Centralized Training with Decentralized Execution (CTDE), the last of which is particularly effective at balancing learning efficiency and execution autonomy.
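To make the CTDE split concrete, the following toy numpy sketch (all names and weights are illustrative, not from the survey) shows the key asymmetry: a centralized critic conditions on the joint observation and joint action during training, while each actor acts from its local observation only at execution time.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, OBS_DIM, N_ACTIONS = 2, 4, 3

# Decentralized actors: each agent maps its LOCAL observation to action scores.
actor_weights = [rng.normal(size=(OBS_DIM, N_ACTIONS)) for _ in range(N_AGENTS)]

def act(agent_id, local_obs):
    """Execution is decentralized: only the agent's own observation is used."""
    scores = local_obs @ actor_weights[agent_id]
    return int(np.argmax(scores))

# Centralized critic: scores the JOINT observation and joint action.
critic_weights = rng.normal(size=(N_AGENTS * OBS_DIM + N_AGENTS,))

def centralized_value(joint_obs, joint_action):
    """Training is centralized: the critic conditions on all agents' data."""
    features = np.concatenate([joint_obs.ravel(), np.asarray(joint_action, float)])
    return float(features @ critic_weights)

joint_obs = rng.normal(size=(N_AGENTS, OBS_DIM))
joint_action = [act(i, joint_obs[i]) for i in range(N_AGENTS)]
value = centralized_value(joint_obs, joint_action)
```

At deployment only `act` is needed, which is why CTDE preserves execution autonomy while still exploiting global information during learning.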

In MARL frameworks, methodological components include value function decomposition, actor-critic architectures, and policy optimization algorithms such as Proximal Policy Optimization (PPO). Communication mechanisms, including broadcasting, targeted, and networked communication, are also analyzed for their role in enhancing coordination. Meanwhile, LLMs contribute to MAS by enabling natural language-based reasoning, task decomposition, and resource allocation, with innovations such as feedback loops, shared belief states, and role specialization improving collaborative outcomes.
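The value-decomposition idea can be illustrated with a minimal VDN-style sketch (toy Q-tables, not the survey's implementation): the team value is the sum of per-agent utilities, so each agent's greedy action also maximizes the joint value.

```python
import numpy as np

# Toy per-agent utilities: per_agent_q[i, a] is agent i's value for action a.
per_agent_q = np.array([
    [1.0, 3.0, 0.5],   # agent 0
    [2.0, 0.1, 2.5],   # agent 1
])

# VDN-style decomposition: Q_tot(a_1, ..., a_n) = sum_i Q_i(a_i).
# Because the sum is monotone in each Q_i, per-agent greedy argmax
# also maximizes the team value (the individual-global-max property).
greedy_joint_action = per_agent_q.argmax(axis=1)   # action 1 for agent 0, action 2 for agent 1
q_tot = per_agent_q.max(axis=1).sum()              # 3.0 + 2.5 = 5.5
```

QMIX generalizes this additive mixer to any monotone mixing network, keeping the same greedy-decentralization property.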

Key challenges identified include non-stationarity due to concurrent learning, credit assignment, scalable communication, and task decentralization. Future research directions emphasize robustness, adaptability, theoretical foundations, and sim-to-real transfer, particularly in safety-critical domains such as autonomous systems and robotics. The integration of human knowledge via standardized operating procedures, as seen in MetaGPT, highlights the potential of structured, role-based collaboration in LLM-driven MAS.

Generated Mar 7, 2026
Cerebras Thinking

This paper provides a systematic overview of Multi-Agent Cooperative Decision-Making (MACDM), explicitly shifting the focus away from traditional methodologies—such as rule-based systems, game theory, and evolutionary algorithms—to prioritize modern data-driven paradigms. It categorizes the field across diverse application scenarios, ranging from collaborative robotics and autonomous driving to complex resource allocation, providing a structural taxonomy of how agents interact within shared environments. By surveying the state-of-the-art, the authors map the trajectory from rigid, hand-crafted coordination policies to adaptive, learning-based frameworks that can handle dynamic and uncertain real-world conditions.

The key contribution of this work lies in its comparative analysis of the two dominant contemporary approaches: Multi-Agent Reinforcement Learning (MARL) and Large Language Model (LLM)-based agents. The survey dissects the strengths of MARL in mastering low-level control and sequential decision-making through trial-and-error, while simultaneously highlighting the emerging potential of LLMs to provide high-level semantic reasoning, negotiation, and generalization capabilities. It offers insights into how these distinct technologies address the "credit assignment" problem and emergent communication, suggesting a trend toward hybrid architectures where the semantic reasoning of LLMs complements the optimization power of RL.
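One classical way to attack the credit assignment problem mentioned above is a difference (counterfactual) reward: compare the team return with and without agent i's contribution. A minimal sketch under an assumed toy team objective (the function `team_return` is illustrative, not from the survey):

```python
# Difference rewards: D_i = G(z) - G(z with agent i's action replaced
# by a default). Agents whose removal changes nothing get zero credit.

def team_return(contributions):
    # Toy global objective with diminishing returns on total effort.
    total = sum(contributions)
    return total - 0.05 * total ** 2

def difference_reward(contributions, i, default=0.0):
    counterfactual = list(contributions)
    counterfactual[i] = default          # remove agent i's effect
    return team_return(contributions) - team_return(counterfactual)

efforts = [3.0, 1.0, 0.0]
credits = [difference_reward(efforts, i) for i in range(len(efforts))]
```

Here the idle third agent receives exactly zero credit, while the hardest-working agent receives the most, which is the signal a shared team reward alone cannot provide.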

This research matters significantly as it serves as a vital roadmap for navigating the rapidly converging fields of reinforcement learning and generative AI within multi-agent systems. It outlines critical contemporary challenges, including the scalability of agent populations, the safety of emergent behaviors, and the computational overhead of deploying large foundation models in latency-sensitive environments. By identifying these bottlenecks and proposing future perspectives, the survey equips researchers and engineers with the necessary context to develop more robust, explainable, and efficient cooperative systems capable of operating in complex, open-world scenarios.

Generated 29d ago
Open-Weights Reasoning

Summary of A Comprehensive Survey on Multi-Agent Cooperative Decision-Making

This survey provides a structured overview of multi-agent cooperative decision-making (MACDM), with a strong focus on multi-agent reinforcement learning (MARL) and large language model (LLM)-based approaches, while distinguishing them from traditional methods like rule-based systems, game-theoretic models, and evolutionary algorithms. The paper categorizes cooperative scenarios into fully cooperative, partially cooperative, and competitive-cooperative settings, highlighting the unique challenges in each (e.g., credit assignment, non-stationarity, and partial observability). It then reviews key MARL algorithms—such as value-decomposition methods (VDN, QMIX), actor-critic architectures (MADDPG, MAPPO), and centralized training with decentralized execution (CTDE)—alongside emerging LLM-enhanced approaches that leverage natural language for planning, communication, and reasoning in dynamic environments. The survey also discusses hybrid architectures, where LLMs act as high-level planners while MARL handles low-level control, and outlines open challenges like scalability, robustness to adversarial behavior, and ethical alignment.
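The hybrid architecture described above, with an LLM as high-level planner and learned policies as low-level controllers, can be sketched as a two-layer loop. In this sketch the `plan` function is a hard-coded stand-in for an actual LLM call, and the `Controller` class stands in for a trained MARL actor; all names are illustrative.

```python
# Hybrid sketch: a high-level planner decomposes a goal into subtasks,
# and per-agent low-level controllers execute them.

def plan(goal, n_agents):
    """Stand-in for LLM task decomposition: one subtask per agent.
    A real system would prompt a language model here."""
    return [f"{goal}:part-{i}" for i in range(n_agents)]

class Controller:
    """Stand-in for a learned low-level policy (e.g. a MARL actor)."""
    def __init__(self, agent_id):
        self.agent_id = agent_id

    def execute(self, subtask):
        # A real controller would run a closed-loop policy on raw actions.
        return f"agent-{self.agent_id} completed {subtask}"

controllers = [Controller(i) for i in range(3)]
subtasks = plan("map-building", len(controllers))
results = [c.execute(t) for c, t in zip(controllers, subtasks)]
```

The design point is the interface: the planner emits semantic subtasks, so it never needs to reason about low-level control, and the controllers never need language.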

The paper’s key contributions lie in its critical synthesis of recent advances, particularly the convergence of MARL and LLMs, which represents a paradigm shift in MACDM. By emphasizing data-driven and self-improving systems, the survey underscores how these methods address limitations of classical approaches (e.g., hand-crafted rules or equilibrium-based solutions) in complex, real-world domains like robotics swarms, autonomous vehicle coordination, and multi-agent AI systems. Its perspectives section identifies future directions, including multi-modal reasoning, human-agent collaboration, and explainability, making it a valuable resource for researchers and practitioners working at the intersection of reinforcement learning, multi-agent systems, and AI alignment. The work is particularly relevant in an era where autonomous agents must operate in open-ended, uncertain environments, necessitating adaptive, language-augmented decision-making frameworks.

Generated 29d ago