Arrow Research search

Author name cluster

Haiyin Piao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

IROS Conference 2025 Conference Paper

Transformer-Based Multi-Agent Reinforcement Learning Method With Credit-Oriented Strategy Differentiation

  • Kaixuan Huang
  • Bo Jin 0001
  • Kun Zhang
  • Haiyin Piao
  • Ziqi Wei 0001

Multi-Agent Reinforcement Learning (MARL) involves both high environmental complexity and a high degree of coordination between agents. To scale algorithms to large-scale agent scenarios, neural networks designed for MARL are typically implemented with parameter sharing. These characteristics result in the challenges of partial observability, credit assignment, and strategy homogenization. In this paper, a Transformer-Based Multi-Agent Reinforcement Learning Method With Credit-Oriented Strategy Differentiation (TMRC) is presented to address each of these challenges. First, we design a Temporal-Spatial Encoding module and an Attention-Based Value Decomposition module based on the Transformer architecture. The former leverages both temporal and spatial observation information, compensating for the environmental perspectives missing due to partial observability. The latter identifies each agent’s individual contribution in complex interactions, effectively optimizing the credit assignment process. Then, we propose a Credit-Oriented Strategy Differentiation module that differentiates the entity representations of each agent based on their current task differences, allowing agents to adopt distinct real-time strategies and effectively mitigating strategy homogenization. We evaluate the proposed method on the SMAC benchmark, where it demonstrates better final performance, faster convergence, and greater stability than competing methods. Additionally, a series of experiments validates the effectiveness of the proposed modules. Our code is available at https://github.com/Hkxuan/TMRC.git.
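
The abstract gives no implementation details, but the value-decomposition idea can be illustrated with a minimal sketch; the module name, tensor shapes, and single-head attention below are assumptions, not the paper's architecture:

```python
# Hypothetical sketch of attention-based value decomposition: per-agent
# utilities are mixed into a joint value using attention weights that
# depend on the global state. Not the authors' code.
import torch
import torch.nn as nn

class AttentionValueMixer(nn.Module):
    def __init__(self, state_dim, embed_dim=32):
        super().__init__()
        self.query = nn.Linear(state_dim, embed_dim)  # global state -> query
        self.key = nn.Linear(1, embed_dim)            # per-agent utility -> key
        self.scale = embed_dim ** 0.5

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents) individual utilities
        # state:    (batch, state_dim) global state
        q = self.query(state).unsqueeze(1)                          # (batch, 1, embed)
        k = self.key(agent_qs.unsqueeze(-1))                        # (batch, n_agents, embed)
        attn = torch.softmax((q * k).sum(-1) / self.scale, dim=-1)  # (batch, n_agents)
        return (attn * agent_qs).sum(-1, keepdim=True)              # (batch, 1) joint value
```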

IROS Conference 2024 Conference Paper

Deep Ad-hoc Sub-Team Partition Learning for Multi-Agent Air Combat Cooperation

  • Songyuan Fan
  • Haiyin Piao
  • Yi Hu
  • Feng Jiang 0001
  • Roushu Yang

In the future, unmanned autonomous air combat will involve large-scale confrontation scenarios, where agents must consider complex, time-varying relationships among aircraft when making decisions. Previous works have introduced Multi-Agent Reinforcement Learning (MARL) into air combat and succeeded in surpassing human expert level. However, they mainly focus on small-scale air combat with low relationship complexity, e.g., 1-vs-1 or 2-vs-2. As more agents join the confrontation, existing algorithms tend to suffer significant performance degradation due to the increase in problem dimensionality. To address this, this paper proposes Deep Ad-hoc Sub-Team Partition Learning (DASPL) for large-scale air combat problems. DASPL models multi-agent air combat as a graph to handle the complex relations and introduces an automatic partitioning mechanism that generates dynamic sub-teams, converting the large-scale multi-agent air combat cooperation problem into multiple small-scale equivalent problems. Additionally, DASPL incorporates an efficient message-passing method among the participating sub-teams.
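
DASPL learns its partition automatically; purely as an illustration of what a sub-team partition produces, here is a hypothetical geometric stand-in that groups aircraft into sub-teams by proximity (the radius heuristic and function names are invented for this sketch):

```python
# Toy stand-in for dynamic sub-team partitioning: build a proximity graph
# over aircraft and take connected components as sub-teams. DASPL itself
# learns the partition; this only illustrates the output structure.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def partition_subteams(positions, radius):
    # positions: (n_agents, 3) aircraft coordinates; radius: link threshold
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    adj = csr_matrix((d < radius) & (d > 0))        # proximity adjacency
    _, labels = connected_components(adj, directed=False)
    return labels                                   # sub-team id per agent
```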

IROS Conference 2024 Conference Paper

Event-intensity Stereo with Cross-modal Fusion and Contrast

  • Yuanbo Wang
  • Shanglai Qu
  • Tianyu Meng
  • Yan Cui
  • Haiyin Piao
  • Xiaopeng Wei
  • Xin Yang 0011

For binocular stereo, traditional cameras excel at capturing fine details and texture information but are limited in dynamic range and in their ability to handle rapid motion. In contrast, event cameras provide pixel-level intensity changes with low latency and a wide dynamic range, albeit at the cost of less detail in their output. It is natural to leverage the strengths of both modalities. We solve this problem by introducing a cross-modal fusion module that learns a visual representation from both sensor inputs. Additionally, we extract and compare dense event-intensity stereo pair features by contrasting “pairs of event-intensity pairs from different views, different modalities, and different timestamps”. This provides the flexibility to mask hard negatives and enables networks to effectively combine event-intensity signals within a contrastive learning framework, leading to improved matching accuracy and more accurate disparity estimation. Experimental results validate the effectiveness of our model and the improvement in disparity estimation accuracy.
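
The contrastive component follows a standard masked InfoNCE pattern; a minimal sketch, assuming L2-normalized features and a caller-supplied negative mask (names and shapes are not from the paper):

```python
# Hedged sketch of cross-modal contrastive matching with hard-negative
# masking. Row i of each modality forms the positive pair.
import torch
import torch.nn.functional as F

def masked_info_nce(event_feat, intensity_feat, neg_mask, tau=0.07):
    # event_feat, intensity_feat: (N, D) L2-normalized features
    # neg_mask: (N, N) bool, True where a candidate may serve as a negative
    logits = event_feat @ intensity_feat.t() / tau              # (N, N)
    eye = torch.eye(len(logits), dtype=torch.bool, device=logits.device)
    logits = logits.masked_fill(~(neg_mask | eye), float('-inf'))  # drop hard negatives
    targets = torch.arange(len(logits), device=logits.device)      # positives on diagonal
    return F.cross_entropy(logits, targets)
```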

AAAI Conference 2024 Conference Paper

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

  • Jinyi Liu
  • Zhi Wang
  • Yan Zheng
  • Jianye Hao
  • Chenjia Bai
  • Junjie Ye
  • Zhen Wang
  • Haiyin Piao

In reinforcement learning, optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration towards less explored areas, characterized by higher uncertainty. However, in the presence of environmental stochasticity (noise), purely optimistic exploration may lead to excessive probing of high-noise areas, impeding exploration efficiency. Hence, when exploring noisy environments, optimism-driven exploration remains a sound foundation, but alleviating unnecessary over-exploration of high-noise areas becomes beneficial. In this work, we propose the Optimistic Value Distribution Explorer (OVD-Explorer) to achieve noise-aware optimistic exploration for continuous control. OVD-Explorer introduces a new measure of the policy's exploration ability that accounts for noise from an optimistic perspective, and leverages gradient ascent to drive exploration. Practically, OVD-Explorer can be easily integrated with continuous-control RL algorithms. Extensive evaluations on the MuJoCo and GridChaos tasks demonstrate the superiority of OVD-Explorer in achieving noise-aware optimistic exploration.
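
The gradient-ascent exploration step can be sketched generically; the score function, step count, and action bounds below are assumptions rather than OVD-Explorer's exact procedure:

```python
# Illustrative gradient-ascent exploration: start from the policy's action
# and climb a differentiable exploration score (e.g., an optimistic value
# estimate penalized by an aleatoric-noise term). Names are assumptions.
import torch

def explore_action(action, score_fn, steps=5, lr=0.1):
    # action:   (batch, act_dim) initial action proposed by the policy
    # score_fn: differentiable noise-aware exploration-ability measure
    a = action.clone().requires_grad_(True)
    for _ in range(steps):
        score = score_fn(a).sum()
        (grad,) = torch.autograd.grad(score, a)     # ascend the score
        a = (a + lr * grad).detach().requires_grad_(True)
    return a.detach().clamp(-1.0, 1.0)              # assumed action bounds
```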

ICRA Conference 2024 Conference Paper

Phasic Diversity Optimization for Population-Based Reinforcement Learning

  • Jingcheng Jiang
  • Haiyin Piao
  • Yu Fu
  • Yihang Hao
  • Chuanlu Jiang
  • Ziqi Wei 0001
  • Xin Yang 0011

In previous work on diversity in reinforcement learning, diversity is often obtained via an augmented loss function, which requires balancing reward against diversity. Diversity optimization algorithms typically use Multi-Armed Bandit (MAB) algorithms to select the mixing coefficient from a pre-defined space. However, the dynamic distribution of reward signals for MABs, and the conflict between quality and diversity, limit the performance of these methods. We introduce the Phasic Diversity Optimization (PDO) algorithm, a Population-Based Training framework that separates reward and diversity training into distinct phases instead of optimizing a multi-objective function. In the auxiliary phase, agents are diversified via determinants, and poorly performing agents do not replace better agents in the archive. Decoupling reward from diversity allows us to apply aggressive diversity optimization in the auxiliary phase without performance degradation. Furthermore, we construct a dogfight scenario for aerial agents to demonstrate the practicality of the PDO algorithm. We introduce two implementations of the PDO archive and conduct tests in the newly proposed adversarial dogfight and in MuJoCo simulations. The results show that our proposed algorithm outperforms the baselines.
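
One common determinant-based diversity measure in population-based RL is the log-determinant of a kernel matrix over behavior embeddings; whether PDO uses exactly this form is an assumption, but it illustrates the mechanism:

```python
# Determinant-style population diversity: the log-det of an RBF kernel over
# per-agent behavior embeddings grows as behaviors spread apart. Whether
# PDO uses this exact form is an assumption.
import torch

def determinant_diversity(embeddings, bandwidth=1.0):
    # embeddings: (population, D) behavior descriptor per agent
    sq = torch.cdist(embeddings, embeddings) ** 2
    K = torch.exp(-sq / (2 * bandwidth ** 2))            # kernel matrix in (0, 1]
    return torch.logdet(K + 1e-6 * torch.eye(len(K)))    # jitter for stability
```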

AAMAS Conference 2023 Conference Paper

Improving Cooperative Multi-Agent Exploration via Surprise Minimization and Social Influence Maximization

  • Mingyang Sun
  • Yaqing Hou
  • Jie Kang
  • Haiyin Piao
  • Yifeng Zeng
  • Hongwei Ge
  • Qiang Zhang

In multi-agent reinforcement learning (MARL), the uncertainty of state changes and the inconsistency between agents’ local observations and global information are the main obstacles to cooperative multi-agent exploration. To address these challenges, we propose a novel MARL exploration method that combines surprise minimization with social influence maximization. Treating state entropy as a measure of surprise, surprise minimization is achieved by giving agents intrinsic rewards for coping with more stable and familiar situations, hence promoting policy learning. Furthermore, we introduce the mutual information between agents’ actions as a regularizer to maximize social influence via a tractable variational estimate. In this way, agents are guided to interact positively with one another by navigating between states that favor cooperation.
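
A toy sketch of the combined intrinsic reward the abstract describes, with a prediction-error proxy for surprise and a KL-based proxy for social influence (the paper's estimators differ; all names here are illustrative):

```python
# Toy combined intrinsic reward: minimize surprise, maximize influence.
import numpy as np

def kl(p, q, eps=1e-8):
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def intrinsic_reward(pred_error, pi_cond, pi_marg, alpha=0.5, beta=0.5):
    # pred_error: surprise proxy (e.g. state-prediction error); minimized
    # pi_cond:    teammate's action distribution given our action
    # pi_marg:    teammate's marginal action distribution
    influence = kl(pi_cond, pi_marg)        # MI-style influence estimate
    return -alpha * pred_error + beta * influence
```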

AAAI Conference 2023 Conference Paper

The Sufficiency of Off-Policyness and Soft Clipping: PPO Is Still Insufficient according to an Off-Policy Measure

  • Xing Chen
  • Dongcui Diao
  • Hechang Chen
  • Hengshuai Yao
  • Haiyin Piao
  • Zhixiao Sun
  • Zhiwei Yang
  • Randy Goebel

The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space. Do better policies exist outside of this space? Using a novel surrogate objective that employs the sigmoid function (which provides an interesting exploration mechanism), we found that the answer is "YES", and that the better policies are in fact located very far from the clipped space. We show that PPO is insufficient in "off-policyness", according to an off-policy metric called DEON. Our algorithm explores a much larger policy space than PPO and maximizes the Conservative Policy Iteration (CPI) objective better than PPO during training. To the best of our knowledge, all current PPO methods retain the clipping operation and optimize within the clipped policy space. Our method is the first of its kind, advancing the understanding of CPI optimization and policy gradient methods. Code is available at https://github.com/raincchio/P3O.
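
A minimal sketch contrasting PPO's hard clip with a sigmoid-shaped soft surrogate; the exact objective in the paper may differ, and the temperature below is an invented parameter:

```python
# PPO's hard clip kills the gradient outside [1-eps, 1+eps]; a sigmoid-shaped
# soft clip saturates smoothly, letting the policy drift farther off-policy.
import torch

def ppo_clip_obj(ratio, adv, eps=0.2):
    return torch.min(ratio * adv,
                     torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()

def soft_clip_obj(ratio, adv, tau=2.0):
    # 2*sigmoid(tau*(r-1)) equals 1 at r=1, has unit slope there for tau=2,
    # and is bounded in (0, 2) instead of being cut off hard.
    soft_ratio = 2 * torch.sigmoid(tau * (ratio - 1.0))
    return (soft_ratio * adv).mean()
```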

NeurIPS Conference 2022 Conference Paper

Distributional Reward Estimation for Effective Multi-agent Deep Reinforcement Learning

  • Jifeng Hu
  • Yanchao Sun
  • Hechang Chen
  • Sili Huang
  • Haiyin Piao
  • Yi Chang
  • Lichao Sun

Multi-agent reinforcement learning has drawn increasing attention in practice, e.g., in robotics and autonomous driving, as it can discover optimal policies using samples generated by interacting with the environment. However, high reward uncertainty remains a problem when training a satisfactory model, because obtaining high-quality reward feedback is usually expensive and sometimes infeasible. To handle this issue, previous methods mainly focus on passive reward correction, while recent active reward estimation methods have proven to be a recipe for reducing the effect of reward uncertainty. In this paper, we propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL). Our main idea is to design multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training. Specifically, multi-action-branch reward estimation models reward distributions over all action branches, and reward aggregation then yields stable update signals during training. Our intuition is that considering all possible consequences of actions is useful for learning policies. The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios against SOTA baselines, in terms of both effectiveness and robustness.
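
The policy-weighted aggregation step can be sketched directly from the abstract's description; the network layout, names, and shapes are assumptions:

```python
# Hypothetical sketch: estimate a reward for every action branch, then
# average under the current policy for a stable training signal.
import torch
import torch.nn as nn

class BranchRewardEstimator(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))  # one head per action

    def aggregated_reward(self, obs, policy_probs):
        # obs: (batch, obs_dim); policy_probs: (batch, n_actions)
        branch_rewards = self.net(obs)                  # (batch, n_actions)
        return (policy_probs * branch_rewards).sum(-1)  # policy-weighted aggregate
```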

ICAPS Conference 2022 Conference Paper

DOMA: Deep Smooth Trajectory Generation Learning for Real-Time UAV Motion Planning

  • Jin Yu
  • Haiyin Piao
  • Yaqing Hou
  • Li Mo 0001
  • Xin Yang 0011
  • Deyun Zhou

In this paper, we present a Deep Reinforcement Learning (DRL) based real-time smooth UAV motion planning method for solving catastrophic flight trajectory oscillation issues. By formalizing the original problem as a linear mixture of a dual-objective optimization, a novel Deep smOoth Motion plAnning (DOMA) algorithm is proposed, which adopts an alternating, layer-by-layer gradient descent approach in which the major gradient and the DOMA gradient are applied separately. The mixing weight between the two objectives is also adapted during training. Experimental results reveal that the proposed DOMA algorithm outperforms baseline DRL-based UAV motion planning algorithms in terms of both learning efficiency and flight motion smoothness. Furthermore, the UAV safety issue induced by trajectory oscillation is also addressed.
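
A toy rendering of the linear dual-objective mixture; the smoothness penalty form and the name mix_w are assumptions, and the alternating layer-by-layer optimization is not shown:

```python
# Toy dual-objective mixture: task (reward) loss linearly mixed with a
# smoothness term penalizing oscillating control commands.
import torch

def doma_style_loss(task_loss, actions, mix_w):
    # actions: (T, act_dim) consecutive control commands along a trajectory
    smoothness = (actions[1:] - actions[:-1]).pow(2).mean()  # oscillation penalty
    return task_loss + mix_w * smoothness
```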

IROS Conference 2021 Conference Paper

A Vision-based Irregular Obstacle Avoidance Framework via Deep Reinforcement Learning

  • Lingping Gao
  • Jianchuan Ding
  • Wenxi Liu
  • Haiyin Piao
  • Yuxin Wang 0001
  • Xin Yang 0011
  • Baocai Yin

Deep reinforcement learning has achieved great success in laser-based collision avoidance because a laser senses accurate depth information without much redundant data, which keeps the algorithm robust when it is migrated from simulation to the real world. However, high-cost laser devices are not only difficult to deploy at scale but also poorly robust to irregular objects, e.g., tables, chairs, and shelves. In this paper, we propose a vision-based collision avoidance framework to address this challenge. Our method estimates depth and incorporates semantic information from RGB data to obtain a new form of data, pseudo-laser data, which combines the advantages of visual and laser information. Compared to traditional laser data, which only contains one-dimensional distance information captured at a certain height, our pseudo-laser data encodes both the depth and the semantic information within the image, making our method more effective for irregular obstacles. In addition, since the estimated depth is not accurate, we adaptively add noise to the laser data during training to increase the robustness of our model in the real world. Experimental results show that our framework achieves state-of-the-art performance in several unseen virtual and real-world scenarios.
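
A rough sketch of how a depth map plus a semantic obstacle mask could be collapsed into a 1D pseudo-laser scan; the per-column minimum rule is an assumption about the paper's encoding:

```python
# Illustrative pseudo-laser conversion: per image column, keep the nearest
# estimated depth that belongs to an obstacle pixel.
import numpy as np

def pseudo_laser(depth, obstacle_mask, max_range=10.0):
    # depth: (H, W) estimated metric depth; obstacle_mask: (H, W) bool
    masked = np.where(obstacle_mask, depth, np.inf)
    scan = masked.min(axis=0)              # nearest obstacle per column
    return np.minimum(scan, max_range)     # (W,) ranges; free columns -> max_range
```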

NeurIPS Conference 2021 Conference Paper

Coordinated Proximal Policy Optimization

  • Zifan Wu
  • Chao Yu
  • Deheng Ye
  • Junge Zhang
  • Haiyin Piao
  • Hankz Hankui Zhuo

We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting. The key idea lies in the coordinated adaptation of step size during the policy update process among multiple agents. We prove the monotonicity of policy improvement when optimizing a theoretically-grounded joint objective, and derive a simplified optimization objective based on a set of approximations. We then interpret this objective as achieving dynamic credit assignment among agents, thereby alleviating the high-variance issue during the concurrent update of agent policies. Finally, we demonstrate that CoPPO outperforms several strong baselines and is competitive with the latest multi-agent PPO method (i.e., MAPPO) in typical multi-agent settings, including cooperative matrix games and StarCraft II micromanagement tasks.
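
The coordinated-ratio idea can be sketched as clipping the joint (product) importance ratio; this is an illustration in the spirit of the abstract, not CoPPO's derived objective:

```python
# Sketch of coordinated clipping: clip the joint importance ratio (product
# over agents) so each agent's effective step size adapts to its teammates'.
import torch

def coppo_style_obj(ratios, adv, eps=0.2):
    # ratios: (batch, n_agents) per-agent importance ratios; adv: (batch,)
    joint = ratios.prod(dim=-1)                       # joint ratio
    clipped = torch.clamp(joint, 1 - eps, 1 + eps)
    return torch.min(joint * adv, clipped * adv).mean()
```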