AI Reasoning & Multi-Agent Systems - Gaius Collections

Brave API

This curated collection on AI Reasoning & Multi-Agent Systems aggregates 15 frontier papers and surveys from arXiv and sources like NBER, spanning reinforcement learning (RL), graph neural networks (GNNs), large language model (LLM) optimization, and agentic AI in economic contexts. It emphasizes scalable multi-agent coordination (e.g., process rewards, monotonic improvement guarantees via MonoScale, and principal-agent framings requiring mechanism design), autonomous decision-making (e.g., end-to-end belief-policy optimization in shared human-AI autonomy and IRL-DAL for safe trajectory planning with diffusion models), and reasoning advancements (e.g., GNNs executing graph algorithms exactly or disentangling multispecific antibody functions). Additional works explore efficient architectures like separable neural nets for agile RL, tensorized orthonormalization (TEON) for LLM pre-training, and small LMs generating dynamic game content.

Key themes interconnect through a focus on emergent behaviors and scalability in agentic systems. Multi-agent papers link coordination challenges to economic principles, such as principal-agent dynamics and market-like interactions in virtual economies, echoed in NBER and web surveys on AI agents transforming economic research, financial services (with model risk management), and Industry 4.0 manufacturing. Reasoning components (GNNs, RL architectures) provide foundational tools for exactness and adaptability, preventing issues like catastrophic forgetting, while tying into broader agent economies where AI interactions yield unintended dynamics. These threads converge on designing robust, verifiable systems that scale without performance cliffs.

These topics matter profoundly for deploying production-grade AI in high-stakes domains like autonomous driving, finance, and distributed manufacturing. By addressing coordination failures, safety via energy-guided planning, and economic incentives, the collection advances toward reliable multi-agent RL and agentic workflows—critical as AI agents proliferate in simulated economies and real-world applications. This work not only mitigates risks like non-monotonic scaling but also unlocks novel insights into emergent intelligence, positioning AI as a transformative tool for scientific discovery and industrial automation.

Generated Feb 21, 2026

Cerebras Thinking

This collection explores the frontier of AI reasoning and multi-agent systems (MAS), shifting the research focus from isolated model performance to complex, interactive, and autonomous ecosystems. A significant portion of the work investigates agentic architectures through an economic and strategic lens, arguing that multi-agent systems exhibit principal-agent dynamics that require careful mechanism design to ensure alignment and prevent issues like "lazy" agents or catastrophic performance drops. This includes comprehensive surveys and working papers on virtual agent economies, financial service applications, and industrial automation (Industry 4.0), highlighting how AI agents collaborate, compete, and evolve in market-like environments. The collection also bridges the gap between theoretical reasoning and practical execution, covering advancements in Reinforcement Learning (RL)—such as monotonic scaling guarantees (MonoScale), agile adaptation, and shared autonomy—alongside specific applications in mathematical problem-solving, legal reasoning, and dynamic game content generation using small language models.

A recurring technical theme is the pursuit of robustness, efficiency, and verification within these sophisticated systems. The research connects high-level reasoning with low-level safety mechanisms, distinguishing between weak and strong verification for trustworthiness and utilizing Graph Neural Networks (GNNs) for tasks ranging from exact algorithm execution to antibody characterization. Efficiency is addressed through innovations like TEON for optimized LLM pre-training and sink-aware pruning for diffusion language models, while safety and alignment are tackled via methods like MARS (margin-aware reward modeling) and energy-guided diffusion for safe trajectory planning. Furthermore, the collection emphasizes human-AI collaboration, defining protocols for counterfactual harm and user-specified requirements in high-stakes decision-making, alongside automated tools like FAMOSE for feature discovery. These topics are critical as they represent the necessary evolution from static large language models to dynamic, reliable, and economically viable autonomous agents capable of operating safely in the real world.

Generated Feb 22, 2026

Open-Weights Reasoning

AI Reasoning & Multi-Agent Systems: This curated research collection focuses on frontier work in AI reasoning, multi-agent systems, reinforcement learning, and autonomous decision-making. The collection includes 15 research cards from sources such as arXiv and NBER, covering various aspects of these topics.

Key Themes: One key theme in the collection is scaling multi-agent systems and improving coordination and performance. Several papers investigate the use of process-based rewards, monotonic improvement guarantees, and even treating multi-agent systems as principal-agent problems to address these challenges. Another theme is the application of advanced machine learning techniques, such as reinforcement learning, inverse reinforcement learning, and graph neural networks, to develop autonomous agents that can perform complex tasks, execute graph algorithms exactly, characterize functional properties of multispecific antibodies, and generate high-quality dynamic game content. Lastly, the collection explores the application of these technologies in industries such as finance, and the emergent economic behaviors in virtual environments populated by AI agents.

Why it Matters: AI reasoning and multi-agent systems play a crucial role in creating advanced autonomous agents and systems that can collaborate and make informed decisions. This collection highlights the importance of developing more efficient and effective methods for scaling multi-agent systems and advancing machine learning techniques. These advancements can lead to improvements in various fields, including finance, manufacturing, and gaming, and can open new avenues for research in Artificial General Intelligence (AGI) and beyond. By studying the latest research in this area, we can gain insights into emerging trends and innovations in AI and continue to push the boundaries of what's possible.

Generated Feb 21, 2026

Research Materials (64)

COMIC: Agentic Sketch Comedy Generation

Proposes a fully automated AI system using agent populations mimicking studio roles to generate SNL-style comedic videos via iterative competition, evaluation, and improvement. Key contribution: LLM critics aligned with real viewer preferences through preference analysis.

LLM-Based Multi-Agent Systems for Mathematical Problem Solving: A Comprehensive Literature Review [v1] | Preprints.org

Describes a hierarchical multi-agent system with RL fine-tuning and VRP CoT prompting, evaluated on benchmarks like MATH500, GSM8K, AIME using high-level meta-thinking and low-level reasoning agents.

AgentAsk: Multi-Agent Systems Need to Ask

Introduces AgentAsk, a three-stage pipeline (distillation, supervision, E-GRPO optimization) for agent querying that improves accuracy, latency, and cost on math, reasoning, and coding benchmarks.

Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation

Provides theoretical analysis of lazy agent emergence in multi-agent reasoning frameworks and advocates online reinforcement learning to balance contributions.

Instruction set for the representation of graphs

Presents IsalGraph, a method encoding any finite simple graph as a compact string over a 9-character alphabet using a virtual machine with a CDLL of nodes and traversal pointers, where every string decodes to a valid graph.

V2M-Zero: Zero-Pair Time-Aligned Video-to-Music Generation

Introduces V2M-Zero, a zero-pair video-to-music generator that aligns music temporally with video events by matching shared change timing and magnitude, ignoring semantic differences.

Neural Field Thermal Tomography: A Differentiable Physics Framework for Non-Destructive Evaluation

Presents Neural Field Thermal Tomography (NeFTY), a differentiable physics framework parameterizing 3D diffusivity as a continuous neural field for quantitative reconstruction of material properties from transient surface temperatures.

LiTo: Surface Light Field Tokenization

Introduces a 3D latent representation that jointly models object geometry and view-dependent appearance by encoding random subsamples of surface light fields from RGB-depth images into compact latent vectors.

OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents

Demonstrates that OrchMAS, a multi-agent system with reinforcement learning, achieves consistently strong performance across diverse reasoning and scientific benchmarks; the code is publicly available.

Benchmarking Multi-Agent AI: Insights & Practical Use | Galileo

Presents a flexible benchmark that supports diverse agent designs, intended for comparative analysis of multi-agent system architectures.

Benchmarking Multi-Agent AI: Insights & Practical Use | Galileo

Presents a flexible benchmark for multi-agent systems to compare architectural approaches.

MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games

Emphasizes developing LLMs for effective cooperation and competition in multi-agent systems toward advanced intelligence.

From single-agent to multi-agent: a comprehensive review of LLM-based legal agents

Reviews enhancements in legal AI like syllogism prompts, logic benchmarks, retrieve-read frameworks, and emotional interaction.

Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

Introduces PURE, a select-then-generate framework for preference-consistent explanations in LLM recommenders, addressing inconsistencies missed by standard metrics.

Safe and Robust Domains of Attraction for Discrete-Time Systems: A Set-Based Characterization and Certifiable Neural Network Estimation

Develops a framework for estimating safe, robust domains of attraction in uncertain, constrained nonlinear discrete-time systems.

On the Expressive Power of Transformers for Maxout Networks and Continuous Piecewise Linear Functions

Demonstrates that Transformers can approximate maxout networks, inheriting ReLU-like universal approximation with comparable complexity.

A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives

Surveys decision-making in multi-agent systems, prioritizing MARL and LLM-based approaches over traditional rule-based, game-theoretic, and evolutionary methods.

Proactive Guiding Strategy for Item-side Fairness in Interactive Recommendation

Proposes proactive fairness in recommenders by guiding user preferences toward long-tail items, avoiding preference misalignment from direct insertion.

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Introduces MATTRL, injecting structured textual experience into multi-agent deliberation at inference time.

Compact Prompting in Instruction-tuned LLMs for Joint Argumentative Component Detection

Discusses argumentative component detection (ACD) in argument mining as a challenging task, with existing methods simplifying to labeling or pipelines.

Odin: Multi-Signal Graph Intelligence for Autonomous Discovery in Knowledge Graphs

Presents Odin, a production graph engine using COMPASS score (PageRank + NPLL) for autonomous pattern discovery in knowledge graphs.
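The summary above names PageRank as one ingredient of the COMPASS score. As a rough illustration of that ingredient only, here is a minimal power-iteration PageRank in plain Python. The toy graph, the damping factor of 0.85, and the function name are assumptions made for illustration; how Odin combines PageRank with NPLL is not described here and is not shown.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {node: [out-neighbors]}.

    Every neighbor must also appear as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in graph[u]:
                # Each node splits its rank evenly among its out-neighbors.
                new_rank[v] += damping * rank[u] / len(graph[u])
        rank = new_rank
    return rank


# Hypothetical four-node graph: "c" collects the most in-links, "d" has none.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_graph)
```

In this toy graph the scores sum to 1, node "c" ranks highest, and node "d", which nothing links to, ranks lowest.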

Evaluation and Benchmarking of LLM Agents: A Survey

Argues that evaluating multi-agent LLM systems requires methods distinct from those used in RL, because predefined rewards are lacking.

Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

Introduces Procedure-Aware Evaluation (PAE) for LLM-based agents, assessing procedures via structured observations across Utility, Efficiency, Interaction Quality, and Procedural Integrity beyond mere task completion.

Multi-Scale Adaptive Neighborhood Awareness Transformer For Graph Fraud Detection

Highlights GNN limitations in graph fraud detection, namely homogeneity assumptions and poor global modeling, and proposes a multi-scale adaptive neighborhood awareness Transformer to address them.

Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails

Proves Adam's superiority over SGD via second-moment normalization under bounded variance using martingale analysis, explaining empirical convergence gaps.
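To make "second-moment normalization" concrete, the sketch below places one plain SGD step next to one Adam step, using the standard textbook update rules. It is not the paper's martingale analysis; the hyperparameter values and function names are illustrative assumptions.

```python
import math


def sgd_step(param, grad, lr=0.1):
    # SGD moves in proportion to the raw gradient.
    return param - lr * grad


def adam_step(param, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized averages (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Dividing by sqrt(v_hat) is the second-moment normalization.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# A single, unusually large gradient (a heavy-tailed sample).
big_grad = 1000.0
sgd_param = sgd_step(0.0, big_grad)
adam_param, m, v = adam_step(0.0, big_grad, m=0.0, v=0.0, t=1)
```

With these assumed values, the SGD step moves the parameter by 100, while the normalized Adam step moves it by roughly the learning rate (about 0.1), which shows how normalization damps the effect of an outlier gradient.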

Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation

Analyzes emergence of lazy agents in multi-agent LLM reasoning frameworks, even with strong system performance, calling for online RL solutions.

LLM-Based Multi-Agent Systems for Mathematical Problem Solving: A Comprehensive Literature Review [v1] | Preprints.org

Details math benchmarks (e.g., MATH500, GSM8K), CoT prompting, RL fine-tuning, and hierarchical multi-agent architecture for reasoning.

Agentic AI: The age of reasoning—A review - ScienceDirect

Offers a chronological overview of agentic AI milestones, key papers, and breakthroughs.

From Complex Dynamics to DynFormer: Rethinking Transformers for PDEs

Critiques Transformer-based neural operators for uniformly treating spatial points in PDE solving, ignoring scale separation and incurring high costs.

Frontiers | Multi-agent systems powered by large language models: applications in swarm intelligence

Shows 70B LLMs outperform 7B models in resilience to noise or irrelevant data.

A.R.I.S.: Automated Recycling Identification System for E-Waste Classification Using Deep Learning

Presents A.R.I.S., a YOLOx-based portable sorter for real-time e-waste material classification to boost recycling efficiency.

Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting

Critiques scaling in time series foundation models for inefficiency despite performance gains, advocating alternatives.

FAMOSE: A ReAct Approach to Automated Feature Discovery

FAMOSE uses ReAct agents for autonomous feature augmentation and selection in tabular ML, reducing domain expertise needs.

[2504.19678] From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

Comprehensive review bridging LLM reasoning to autonomous AI agents.

Benchmarking Multi-Agent AI: Insights & Practical Use | Galileo

Benchmark supports diverse agent architectures for multi-agent system comparisons.

[2601.12538] Agentic Reasoning for Large Language Models

Differentiates in-context vs. post-training reasoning in agentic frameworks across domains like science and robotics.

When to Trust the Cheap Check: Weak and Strong Verification for Reasoning

Distinguishes weak (cheap, internal) vs. strong (reliable, external) verification in LLM reasoning loops for trustworthiness.
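One way to picture the weak/strong distinction is a two-tier check in which the cheap verifier decides only when it is confident and otherwise escalates to the expensive one. The sketch below is a hypothetical illustration, not the paper's procedure: the thresholds, function names, and the toy multiplication task are all assumptions.

```python
def verify(candidate, weak_check, strong_check, accept_at=0.9, reject_at=0.1):
    """Return (verdict, used_strong) for a candidate answer.

    weak_check   -> cheap score in [0, 1]
    strong_check -> reliable but costly boolean
    """
    score = weak_check(candidate)
    if score >= accept_at:
        return True, False      # cheap check is confident the answer is right
    if score <= reject_at:
        return False, False     # cheap check is confident the answer is wrong
    return strong_check(candidate), True  # uncertain: pay for the strong check


# Toy task (assumed): a candidate (a, b, c) claims that a * b == c.
def last_digit_check(candidate):
    a, b, c = candidate
    # A last-digit mismatch rules the claim out; a match proves nothing.
    return 0.5 if (a * b) % 10 == c % 10 else 0.0


def full_recompute(candidate):
    a, b, c = candidate
    return a * b == c


correct = verify((12, 34, 408), last_digit_check, full_recompute)
obvious_error = verify((12, 34, 409), last_digit_check, full_recompute)
subtle_error = verify((12, 34, 418), last_digit_check, full_recompute)
```

Here the obviously wrong claim is rejected by the cheap check alone, while the correct claim and the subtly wrong one both pass the last-digit test and are resolved only by the full recomputation.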

Towards a Science of Scaling Agent Systems

Defines multi-agent scaling via agents, coordination, models, and tasks, evaluated on benchmarks like Finance-Agent.

A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives

Categorizes multi-agent decision-making approaches, prioritizing MARL and LLM-based approaches over traditional methods.

Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

Addresses gradient instability in black-box attacks on LVLMs due to ViT sensitivity, improving transfer-based methods like M-Attack.

Mine and Refine: Optimizing Graded Relevance in E-commerce Search Retrieval

Introduces 'Mine and Refine' contrastive training for semantic embeddings handling graded relevance in e-commerce search with long-tail queries.

LLM-Based Multi-Agent Systems for Mathematical Problem Solving: A Comprehensive Literature Review [v1] | Preprints.org

Details math benchmarks (MATH500 etc.) with hierarchical multi-agent setups using CoT prompting and RL fine-tuning.

MARS: Margin-Aware Reward-Modeling with Self-Refinement

Proposes difficulty-aware data augmentation for reward models in RLHF/RLAIF to improve alignment without costly human labels.
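For context on what a "margin" means in reward modeling, the sketch below computes the standard Bradley-Terry pairwise loss that is commonly used to train reward models, and ranks preference pairs by margin. Whether MARS uses exactly this loss or this notion of difficulty is an assumption; the reward values and function names are hypothetical.

```python
import math


def pairwise_loss(reward_chosen, reward_rejected):
    """Bradley-Terry loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def hardest_first(pairs):
    """Sort (chosen, rejected) reward pairs by margin, smallest margin first."""
    return sorted(pairs, key=lambda pair: pair[0] - pair[1])


# Hypothetical reward-model scores for three preference pairs.
pairs = [(3.0, 0.0), (0.5, 0.0), (0.0, 3.0)]
losses = [pairwise_loss(chosen, rejected) for chosen, rejected in pairs]
ranked = hardest_first(pairs)
```

Pairs with a small or negative margin incur the largest loss, so under this assumed reading they are the "difficult" examples on which augmentation effort would be concentrated.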

From single-agent to multi-agent: a comprehensive review of LLM-based legal agents

Reviews legal AI enhancements like syllogism prompts, logic benchmarks, retrieve-then-read, and emotional interaction.

Evaluation and Benchmarking of LLM Agents: A Survey

Highlights unique evaluation needs for LLM-based multi-agent collaboration vs. traditional RL due to absent predefined rewards.

Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation

Analyzes lazy agents in multi-agent LLM frameworks and promotes online RL for balanced contributions.

Multi-Round Human-AI Collaboration with User-Specified Requirements

Defines counterfactual harm and complementarity principles for conversational AI to reliably aid high-stakes human decisions via user-defined rules.

AgentAI: A comprehensive survey on autonomous agents in distributed AI for industry 4.0 - ScienceDirect

Traces AgentAI in gaming from rule-based to adaptive multi-agent systems using RL for emergent behavior.

CLEF HIPE-2026: Evaluating Accurate and Efficient Person-Place Relation Extraction from Multilingual Historical Texts

HIPE-2026 evaluates person-place relation extraction (at/isAt) from noisy multilingual historical texts.

Sink-Aware Pruning for Diffusion Language Models

Diffusion language models (DLMs) suffer high inference costs from iterative denoising, and unlike autoregressive (AR) LLMs, their attention-sink positions show high variance across generation, invalidating pruning heuristics inherited from AR models.

Agentic AI Systems in Financial Services

Survey of agentic AI in financial services including model risk management and compliance.

Virtual Agent Economies

Explores emergent economic dynamics in virtual environments populated by AI agents.

AgentAI: Autonomous Agents in Distributed AI for Industry 4.0

Comprehensive survey on autonomous AI agents in distributed manufacturing environments.

An Economy of AI Agents

Examines emergent economic behaviors when AI agents interact in market-like environments.

Learning to Execute Graph Algorithms Exactly with GNNs

Demonstrates that graph neural networks can learn to execute classical graph algorithms with exact correctness.

Disentangling Multispecific Antibody Function with GNNs

Applies graph neural networks to characterize multispecific antibody functional properties.

IRL-DAL: Safe Trajectory Planning via Energy-Guided Diffusion Models

Combines inverse reinforcement learning with diffusion models for safe trajectory planning in autonomous driving.

High-quality Dynamic Game Content via Small Language Models

Shows that small language models can generate high-quality dynamic game content.

End-to-end Optimization of Belief and Policy Learning in Shared Autonomy

Jointly optimizes belief estimation and policy learning for shared autonomy where humans and AI collaborate.

MonoScale: Scaling Multi-Agent System with Monotonic Improvement

Proposes a monotonic improvement guarantee for multi-agent scaling that prevents catastrophic performance drops.

Agile Reinforcement Learning through Separable Neural Architecture

Introduces a separable neural architecture enabling agile adaptation in RL without catastrophic forgetting.

TEON: Tensorized Orthonormalization for LLM Pre-Training

Extends the Muon optimizer with tensorized orthonormalization for more efficient large language model pre-training.
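Muon-style optimizers orthogonalize the update matrix before applying it, and a classic way to do that approximately is a Newton-Schulz iteration. The sketch below uses the textbook cubic iteration on a small dense matrix in pure Python. It is meant only to illustrate matrix orthonormalization: it does not use Muon's tuned coefficients and does not show TEON's tensorized extension. The example matrix and function names are assumptions.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor of a full-rank square matrix g.

    Uses the cubic iteration X <- 1.5 * X - 0.5 * X @ X.T @ X.
    """
    # Scaling by the Frobenius norm keeps all singular values at or below 1,
    # which is inside the iteration's convergence region.
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(r1, r2)] for r1, r2 in zip(x, xxt_x)]
    return x


# Hypothetical 2x2 "gradient" matrix.
grad = [[2.0, 1.0], [0.0, 1.0]]
q = newton_schulz_orthogonalize(grad)
gram = matmul(q, transpose(q))  # close to the identity when q is orthogonal
```

The iteration pushes every singular value of the matrix toward 1 while leaving its singular vectors unchanged, so the result keeps the "direction" of the update but equalizes its scale across directions.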

Multi-Agent Systems Should be Treated as Principal-Agent Problems

Argues that multi-agent AI systems exhibit principal-agent dynamics requiring economic mechanism design.

AI Agents for Economic Research

NBER working paper exploring how AI agents transform economic research methodology.