Arrow Research search

Author name cluster

Qirui Mi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

7

AAAI Conference 2026 Conference Paper

Think, Speak, Decide: Language-Augmented Multi-Agent Reinforcement Learning for Economic Decision-Making

  • Heyang Ma
  • Qirui Mi
  • Qipeng Yang
  • Zijun Fan
  • Bo Li
  • Haifeng Zhang

Economic decision-making depends not only on structured signals—such as prices and taxes—but also on unstructured language, including peer dialogue and media narratives. While multi-agent reinforcement learning (MARL) has shown promise in optimizing economic decisions, it struggles with the semantic ambiguity and contextual richness of language. We propose LAMP (Language-Augmented Multi-Agent Policy), the first framework to integrate language into economic decision-making, narrowing the gap to real-world settings. LAMP follows a Think–Speak–Decide pipeline: (1) Think interprets numerical observations to extract short-term shocks and long-term trends, caching high-value reasoning trajectories. (2) Speak crafts and exchanges strategic messages based on the reasoning, updating beliefs by parsing peer communications. (3) Decide fuses numerical data, reasoning, and reflections into a MARL policy to optimize language-augmented decision-making. Experiments in economic simulation show that LAMP outperforms both MARL and LLM-only baselines in cumulative return (+63.5%, +34.0%), robustness (+18.8%, +59.4%), and interpretability. These results demonstrate the potential of language-augmented policies to deliver more effective and robust economic strategies.
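The Think–Speak–Decide pipeline from the abstract can be sketched as a toy loop. Everything below (function names, the statistics standing in for "reasoning", the belief update) is an illustrative assumption, not LAMP's actual implementation:

```python
# Hypothetical sketch of one Think-Speak-Decide step. All names and
# update rules are invented stand-ins for illustration only.

def think(observation):
    # Interpret numerical observations into a short-term shock and a
    # long-term trend (stubbed here as simple statistics).
    shock = observation[-1] - observation[-2]
    trend = sum(observation) / len(observation)
    return {"shock": shock, "trend": trend}

def speak(reasoning, peer_messages):
    # Craft a strategic message and update beliefs from peers' messages.
    peer_avg = sum(peer_messages) / max(len(peer_messages), 1)
    return {"message": reasoning["shock"], "belief": reasoning["trend"] + peer_avg}

def decide(observation, reasoning, comms):
    # Fuse raw numbers, reasoning, and communication into one action.
    return observation[-1] + reasoning["shock"] + comms["belief"]

obs = [1.0, 2.0, 4.0]
r = think(obs)
c = speak(r, peer_messages=[3.0, 5.0])
action = decide(obs, r, c)
```

The structural point is that language-derived beliefs enter the policy input alongside the raw numeric state, rather than replacing it.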

NeurIPS Conference 2025 Conference Paper

EconGym: A Scalable AI Testbed with Diverse Economic Tasks

  • Qirui Mi
  • Qipeng Yang
  • Zijun Fan
  • Wentian Fan
  • Heyang Ma
  • Chengdong Ma
  • Siyu Xia
  • Bo An

Artificial intelligence (AI) has become a powerful tool for economic research, enabling large-scale simulation and policy optimization. However, applying AI effectively requires simulation platforms for scalable training and evaluation—yet existing environments remain limited to simplified, narrowly scoped tasks, falling short of capturing complex economic challenges such as demographic shifts, multi-government coordination, and large-scale agent interactions. To address this gap, we introduce EconGym, a scalable and modular testbed that connects diverse economic tasks with AI algorithms. Grounded in rigorous economic modeling, EconGym implements 11 heterogeneous role types (e.g., households, firms, banks, governments), their interaction mechanisms, and agent models with well-defined observations, actions, and rewards. Users can flexibly compose economic roles with diverse agent algorithms to simulate rich multi-agent trajectories across 25+ economic tasks for AI-driven policy learning and analysis. Experiments show that EconGym supports diverse and cross-domain tasks—such as coordinating fiscal, pension, and monetary policies—and enables benchmarking across AI, economic methods, and hybrids. Results indicate that richer task composition and algorithm diversity expand the policy space, while AI agents guided by classical economic methods perform best in complex settings. EconGym also scales to 100k agents with high realism and efficiency.
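The composition idea, assembling heterogeneous roles with their own observations, actions, and rewards into one environment, can be illustrated with a minimal sketch. The `Role`/`Economy` classes and their dynamics are invented here and are not EconGym's real API:

```python
# Toy sketch of composing heterogeneous economic roles into a shared
# environment; class names, policies, and dynamics are assumptions.

class Role:
    def __init__(self, name):
        self.name = name
        self.wealth = 100.0

    def act(self, observation):
        # Trivial stand-in policy: spend a fixed fraction of observed income.
        return 0.1 * observation

class Economy:
    def __init__(self, roles):
        self.roles = roles

    def step(self, income):
        # Each role observes income and acts; its reward is the wealth
        # change induced by its own action.
        rewards = {}
        for role in self.roles:
            spend = role.act(income)
            role.wealth += income - spend
            rewards[role.name] = income - spend
        return rewards

env = Economy([Role("household"), Role("firm"), Role("bank")])
rewards = env.step(income=10.0)
```

Swapping in different role types or agent algorithms changes the composition without changing the environment loop, which is the modularity the abstract describes.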

ECAI Conference 2025 Conference Paper

Learning Macroeconomic Policies Through Dynamic Stackelberg Mean-Field Games

  • Qirui Mi
  • Zhiyu Zhao
  • Chengdong Ma
  • Siyu Xia
  • Yan Song 0003
  • Mengyue Yang
  • Jun Wang 0012
  • Haifeng Zhang 0002

Macroeconomic outcomes emerge from individuals’ decisions, making it essential to model how agents interact with macro policy via consumption, investment, and labor choices. We formulate this as a dynamic Stackelberg game: the government (leader) sets policies, and agents (followers) respond by optimizing their behavior over time. Unlike static models, this dynamic formulation captures temporal dependencies and strategic feedback critical to policy design. However, as the number of agents increases, explicitly simulating all agent–agent and agent–government interactions becomes computationally infeasible. To address this, we propose the Dynamic Stackelberg Mean Field Game (DSMFG) framework, which approximates these complex interactions via agent–population and government–population couplings. This approximation preserves individual-level feedback while ensuring scalability, enabling DSMFG to jointly model three core features of real-world policy-making: dynamic feedback, asymmetry, and large scale. We further introduce Stackelberg Mean Field Reinforcement Learning (SMFRL), a data-driven algorithm that learns the leader’s optimal policies while maintaining personalized responses for individual agents. Empirically, we validate our approach in a large-scale simulated economy, where it scales to 1,000 agents (vs. 100 in prior work) and achieves a 4× GDP gain over classical economic methods and a 19× improvement over the static 2022 U.S. federal income tax policy.
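The leader–follower mean-field structure can be sketched as a toy fixed-point iteration: followers respond to the leader's policy and the population average rather than to every other agent. The dynamics below are invented for illustration and are not the paper's SMFRL algorithm:

```python
# Toy sketch of the Stackelberg mean-field structure: the leader's
# policy is fixed, and followers repeatedly best-respond to the
# population mean. All update rules here are illustrative.

def follower_response(policy, mean_field):
    # Each agent reacts to the leader's policy and the population
    # average, not to every individual agent.
    return 0.5 * (policy + mean_field)

def stackelberg_iteration(policy, agents, rounds=10):
    for _ in range(rounds):
        mean_field = sum(agents) / len(agents)
        agents = [follower_response(policy, mean_field) for _ in agents]
    return agents

agents = stackelberg_iteration(policy=1.0, agents=[0.0, 2.0, 4.0], rounds=10)
```

Because each agent couples only to the population mean, the per-round cost is linear in the number of agents, which is why the mean-field approximation scales where pairwise simulation does not.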

AAMAS Conference 2025 Conference Paper

Mean Field Correlated Imitation Learning

  • Zhiyu Zhao
  • Chengdong Ma
  • Qirui Mi
  • Ning Yang
  • Xue Yan
  • Mengyue Yang
  • Haifeng Zhang
  • Jun Wang

Modeling the behaviors of many-agent games is crucial for capturing the dynamics of large-scale complex systems. This is typically achieved by recovering policies from demonstrations within the Mean Field Game Imitation Learning (MFGIL) framework. However, most MFGIL methods assume that demonstrations are collected from Mean Field Nash Equilibrium (MFNE), implying that agents make decisions independently. When directly applied to situations where agents’ decisions are coordinated, such as publicly routed traffic networks, these techniques often fall short. In this paper, we propose the Adaptive Mean Field Correlated Equilibrium (AMFCE), which introduces a generalized assumption that effectively integrates the correlated behaviors common in real-world systems. We prove the existence of AMFCE under mild conditions and theoretically show that MFNE is a special case of AMFCE. Building upon this, we introduce a new Mean Field Correlated Imitation Learning (MFCIL) algorithm, which recovers expert policy more accurately in scenarios where agents’ decisions are coordinated. We also provide a theoretical upper bound for the error in recovering the expert policy, which is tighter than that of existing methods. Empirical results on real-world traffic flow prediction and large-scale economic simulations demonstrate that MFCIL significantly improves the predictive performance of large populations’ behaviors compared to existing MFGIL baselines. This improvement highlights the potential of MFCIL to model real-world multi-agent systems.
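The difference between independent (MFNE-style) and correlated behavior can be illustrated with a toy correlation device: a shared public signal, such as a routing recommendation, on which all agents condition their actions. This sketch only illustrates the concept of correlated play, not the AMFCE construction itself:

```python
# Toy illustration of a correlation device. A shared public signal
# coordinates agents' actions, producing correlated (not independent)
# behavior; the signal and policy here are invented examples.

import random

def correlated_policy(signal):
    # Every agent conditions on the same public signal, so their
    # actions are perfectly correlated rather than independent draws.
    return "route_A" if signal == 0 else "route_B"

random.seed(0)
signal = random.randint(0, 1)        # the public correlation signal
actions = [correlated_policy(signal) for _ in range(4)]
```

An imitation learner that assumes independent decisions cannot reproduce this joint action distribution, which is the gap the paper's correlated-equilibrium assumption addresses.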

NeurIPS Conference 2025 Conference Paper

MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework

  • Qirui Mi
  • Mengyue Yang
  • Xiangning Yu
  • Zhiyu Zhao
  • Cheng Deng
  • Bo An
  • Haifeng Zhang
  • Xu Chen

Simulating collective decision-making involves more than aggregating individual behaviors; it emerges from dynamic interactions among individuals. While large language models (LLMs) offer strong potential for social simulation, achieving quantitative alignment with real-world data remains a key challenge. To bridge this gap, we propose the Mean-Field LLM (MF-LLM) framework, the first to incorporate mean field theory into LLM-based social simulation. MF-LLM models bidirectional interactions between individuals and the population through an iterative process, generating population signals to guide individual decisions, which in turn update the signals. This interplay produces coherent trajectories of collective behavior. To improve alignment with real-world data, we introduce IB-Tune, a novel fine-tuning method inspired by the Information Bottleneck principle, which retains population signals most predictive of future actions while filtering redundant history. Evaluated on a real-world social dataset, MF-LLM reduces KL divergence to human population distributions by 47% compared to non-mean-field baselines, enabling accurate trend forecasting and effective intervention planning. Generalizing across 7 domains and 4 LLM backbones, MF-LLM provides a scalable, high-fidelity foundation for social simulation.
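The bidirectional individual–population loop can be sketched with the LLM replaced by a stub decision function; this shows only the control flow the abstract describes, with all numbers and rules invented:

```python
# Minimal sketch of the iterative mean-field loop: a population
# signal guides individual decisions, and those decisions update
# the signal. The decision function is a stand-in for an LLM agent.

def individual_decision(state, population_signal):
    # Stand-in for an LLM-driven agent conditioned on the mean-field signal.
    return 0.8 * state + 0.2 * population_signal

def simulate(states, steps=5):
    trajectory = []
    signal = sum(states) / len(states)      # initial population signal
    for _ in range(steps):
        states = [individual_decision(s, signal) for s in states]
        signal = sum(states) / len(states)  # decisions update the signal
        trajectory.append(signal)
    return trajectory

traj = simulate([0.0, 10.0], steps=5)
```

In the real framework the signal is a learned summary of population behavior rather than a scalar mean, but the alternation between signal generation and individual decisions is the same.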

NeurIPS Conference 2024 Conference Paper

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

  • Weiyu Ma
  • Qirui Mi
  • Yongcheng Zeng
  • Xue Yan
  • Yuqiao Wu
  • Runji Lin
  • Haifeng Zhang
  • Jun Wang

With the continued advancement of Large Language Model (LLM) agents in reasoning, planning, and decision-making, benchmarks have become crucial in evaluating these skills. However, there is a notable gap in benchmarks for real-time strategic decision-making. StarCraft II (SC2), with its complex and dynamic nature, serves as an ideal setting for such evaluations. To this end, we have developed TextStarCraft II, a specialized environment for assessing LLMs in real-time strategic scenarios within SC2. Addressing the limitations of traditional Chain of Thought (CoT) methods, we introduce the Chain of Summarization (CoS) method, enhancing LLMs' capabilities in rapid and effective decision-making. Our key experiments included: 1. LLM evaluation: tested 10 LLMs in TextStarCraft II, most of them defeating the LV5 built-in AI, showcasing effective strategy skills. 2. Commercial model knowledge: evaluated four commercial models on SC2 knowledge; GPT-4 ranked highest by Grandmaster-level experts. 3. Human-AI matches: experimental results showed that fine-tuned LLMs performed on par with Gold-level players in real-time matches, demonstrating comparable strategic abilities. All code and data from this study have been made publicly available at https://github.com/histmeisah/Large-Language-Models-play-StarCraftII
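The core idea of summarizing the observation stream before each decision, instead of reasoning over the raw game log, can be sketched as follows. The summarizer and policy are trivial stand-ins, not the paper's CoS method:

```python
# Hypothetical sketch: compress a stream of game observations into a
# short summary before each decision, rather than feeding the full
# raw log to the decision-maker. All events and rules are invented.

def summarize(history, max_items=3):
    # Crude summary: keep only the most recent salient events.
    return history[-max_items:]

def decide(summary):
    # Stand-in for an LLM choosing a strategic action from the summary.
    return "expand" if summary.count("enemy_scouted") == 0 else "defend"

log = ["worker_built", "base_built", "enemy_scouted", "army_trained"]
action = decide(summarize(log))
```

The design point is latency: a compact summary keeps the decision-maker's input small enough for fast, repeated calls, which matters in a real-time game.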

AAMAS Conference 2024 Conference Paper

TaxAI: A Dynamic Economic Simulator and Benchmark for Multi-agent Reinforcement Learning

  • Qirui Mi
  • Siyu Xia
  • Yan Song
  • Haifeng Zhang
  • Shenghao Zhu
  • Jun Wang

Taxation and government spending are crucial tools for governments to promote economic growth and maintain social equity. However, the difficulty in accurately predicting the dynamic strategies of diverse self-interested households presents a challenge for governments to implement effective tax policies. Given its proficiency in modeling other agents in partially observable environments and adaptively learning to find optimal policies, Multi-Agent Reinforcement Learning (MARL) is highly suitable for solving dynamic games between the government and numerous households. Although MARL shows more potential than traditional methods such as the genetic algorithm and dynamic programming, there is a lack of large-scale multi-agent reinforcement learning economic simulators. Therefore, we propose a MARL environment, named TaxAI, for dynamic games involving N households, government, firms, and financial intermediaries based on the Bewley-Aiyagari economic model. Our study benchmarks 2 traditional economic methods with 7 MARL methods on TaxAI, demonstrating the effectiveness and superiority of MARL algorithms. Moreover, TaxAI’s scalability in simulating dynamic interactions between the government and 10,000 households, coupled with real-data calibration, grants it a substantial improvement in scale and realism over existing simulators. Therefore, TaxAI is the most realistic economic simulator for optimal tax policy, which aims to generate feasible recommendations for governments and individuals.
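The government–households interaction can be sketched as a toy tax-and-transfer step. The tax rule, household behavior, and numbers below are invented for illustration and bear no relation to TaxAI's calibrated Bewley-Aiyagari dynamics:

```python
# Toy government-households loop in the spirit of the described
# environment: the government taxes household income and
# redistributes the revenue. All rules here are illustrative.

def household_income(wealth):
    # Stand-in income process: a fixed return on wealth.
    return 0.05 * wealth

def env_step(wealths, tax_rate):
    # Tax each household's income, then split the revenue equally
    # as lump-sum transfers.
    incomes = [household_income(w) for w in wealths]
    revenue = tax_rate * sum(incomes)
    transfer = revenue / len(wealths)
    return [w + (1 - tax_rate) * inc + transfer
            for w, inc in zip(wealths, incomes)]

wealths = env_step([100.0, 300.0], tax_rate=0.2)
```

In the MARL formulation, the government's action would set `tax_rate` and each household's action would set its own consumption and labor, with both sides learning from the resulting rewards.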