Arrow Research search

Author name cluster

Yang Gao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

108 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Causality-Aware Efficient Exploration for Cooperative Multi-Agent Reinforcement Learning

  • Hongye Cao
  • Tianpei Yang
  • Fan Feng
  • Hammadi Rafik Ouariachi
  • Yali Du
  • Meng Fang
  • Jing Huo
  • Yang Gao

Exploration is critical for cooperative multi-agent reinforcement learning (MARL) to improve sample efficiency. However, existing intrinsic-motivation-based exploration strategies in MARL overlook the causal relationships among agents, global states, and rewards, suffering from interference by irrelevant factors and resulting in sample inefficiency. To address this issue, we propose Causality-aware Efficient Exploration (CEE), a novel framework that enhances sample efficiency by inferring the causal relationships of agents and global states with respect to rewards, thereby enabling causality-guided exploration. Specifically, CEE operates through two components. First, CEE identifies causal relationships between global states and rewards, filtering out causally irrelevant state features that do not have a high impact on rewards to keep decision-critical state information. Second, CEE discovers causal relationships between agents' behaviors and rewards to quantify each agent's contribution to collective performance. To achieve this, we introduce a causal entropy objective that promotes exploration aligned with decision-critical aspects of the underlying causal structure. We provide comprehensive validation through experiments on 21 challenging tasks spanning SMAC, SMAC v2, and Google Research Football (GRF) environments. Our results demonstrate that CEE achieves superior performance in terms of sample efficiency and asymptotic performance compared to existing MARL methods.

EAAI Journal 2026 Journal Article

Cooperative decision-making of unmanned aerial vehicles: A multi-agent reinforcement learning approach

  • Ziyi Wang
  • Guoliang Ma
  • Jian Guo
  • Chen Qian
  • Yang Gao
  • Zhuo Huang

Cooperative decision-making of unmanned aerial vehicles (UAVs) for military missions is a crucial research topic. However, the ability constraints of heterogeneous UAVs in real-world scenarios bring significant challenges to the cooperative decision-making process. To address these issues, this paper proposes a multi-agent proximal policy optimization (MAPPO) algorithm with a flexible observation feature encoding (FOFE) mechanism and a Mamba-based memory structure. Firstly, the cooperative decision-making problem for reconnaissance-strike integrated fixed-wing UAV (RSUAV) swarms is formulated as a decentralized partially observable Markov decision process (Dec-POMDP). Secondly, to address the variability and incompleteness in observation inputs, an FOFE strategy is introduced. This allows the network to process multi-channel and variable-length data effectively. Furthermore, the Mamba model is incorporated to capture temporal dependencies in historical observations. This enhances decision-making in prolonged missions. Under this multi-agent reinforcement learning (MARL) framework, each RSUAV can make autonomous decisions in a decentralized manner. The simulation results show that the proposed algorithm improves the completion ratio (> 10.1%) and survival ratio (> 14.8%) and reduces completion time (> 21.5%) compared to baselines. It also exhibits strong generalization capability and holds practical feasibility for deployment on edge computing devices. Therefore, this approach enables effective cooperative decision-making under the ability constraints of heterogeneous RSUAVs.

AAAI Conference 2026 Conference Paper

Faster Game Solving via Asymmetry of Step Sizes

  • Linjian Meng
  • Tianpei Yang
  • Youzhi Zhang
  • Zhenxing Ge
  • Yang Gao

Counterfactual Regret Minimization (CFR) algorithms are widely used to compute a Nash equilibrium (NE) in two-player zero-sum imperfect-information extensive-form games (IIGs). Among them, Predictive CFR+ (PCFR+) is particularly powerful, achieving an exceptionally fast empirical convergence rate via the prediction in many games. However, the empirical convergence rate of PCFR+ would significantly degrade if the prediction is inaccurate, leading to unstable performance on certain IIGs. To enhance the robustness of PCFR+, we propose Asymmetric PCFR+ (APCFR+), which employs an adaptive asymmetry of step sizes between the updates of implicit and explicit accumulated counterfactual regrets to mitigate the impact of the prediction inaccuracy on convergence. We present a theoretical analysis demonstrating why APCFR+ can enhance the robustness. To the best of our knowledge, we are the first to propose the asymmetry of step sizes, a simple yet novel technique that effectively improves the robustness of PCFR+. Then, to reduce the difficulty of implementing APCFR+ caused by the adaptive asymmetry, we propose a simplified version of APCFR+ called Simple APCFR+ (SAPCFR+), which uses a fixed asymmetry of step sizes to enable only a single-line modification compared to the original PCFR+. Experimental results on five standard IIG benchmarks and two heads-up no-limit Texas Hold’em (HUNL) subgames show that (i) both APCFR+ and SAPCFR+ outperform PCFR+ in most of the tested games, (ii) SAPCFR+ achieves a comparable empirical convergence rate with APCFR+, and (iii) our approach can be generalized to improve other CFR algorithms, e.g., Discount CFR (DCFR).

AAAI Conference 2026 Conference Paper

Identifying and Analyzing Performance-Critical Tokens in Large Language Models

  • Yu Bai
  • Heyan Huang
  • Cesare Spinoso-Di Piano
  • Sanxing Chen
  • Marc-Antoine Rondeau
  • Yang Gao
  • Jackie Chi Kit Cheung

In-context learning (ICL) has emerged as an effective solution for few-shot learning with large language models (LLMs). However, how LLMs leverage demonstrations to specify a task and learn a corresponding computational function through ICL is underexplored. Drawing from the way humans learn from content-label mappings in demonstrations, we categorize the tokens in an ICL prompt into content, stopword, and template tokens. Our goal is to identify the types of tokens whose representations directly influence LLM's performance, a property we refer to as being performance-critical. By ablating representations from the attention of the test example, we find that the representations of informative content tokens have less influence on performance compared to template and stopword tokens, which contrasts with the human attention to informative words. We give evidence that the representations of performance-critical tokens aggregate information from the content tokens. Moreover, we demonstrate experimentally that lexical meaning, repetition, and structural cues are the main distinguishing characteristics of these tokens. Our work sheds light on how LLMs learn to perform tasks from demonstrations and deepens our understanding of the roles different types of tokens play in LLMs.

AAAI Conference 2026 Conference Paper

ManiLong-Shot: Interaction-Aware One-Shot Imitation Learning for Long-Horizon Manipulation

  • Zixuan Chen
  • Chongkai Gao
  • Lin Shao
  • Jieqi Shi
  • Jing Huo
  • Yang Gao

One-shot imitation learning (OSIL) offers a promising way to teach robots new skills without large-scale data collection. However, current OSIL methods are primarily limited to short-horizon tasks, thus limiting their applicability to complex, long-horizon manipulations. To address this limitation, we propose ManiLong-Shot, a novel framework that enables effective OSIL for long-horizon prehensile manipulation tasks. ManiLong-Shot structures long-horizon tasks around physical interaction events, reframing the problem as sequencing interaction-aware primitives instead of directly imitating continuous trajectories. This primitive decomposition can be driven by high-level reasoning from a vision-language model (VLM) or by rule-based heuristics derived from robot state changes. For each primitive, ManiLong-Shot predicts invariant regions critical to the interaction, establishes correspondences between the demonstration and the current observation, and computes the target end-effector pose, enabling effective task execution. Extensive simulation experiments show that ManiLong-Shot, trained on only 10 short-horizon tasks, generalizes to 20 unseen long-horizon tasks across three difficulty levels via one-shot imitation, achieving a 22.8% relative improvement over the SOTA. Additionally, real-robot experiments validate ManiLong-Shot’s ability to robustly execute three long-horizon manipulation tasks via OSIL, confirming its practical applicability.

AAAI Conference 2026 Conference Paper

Simulated Rewards, Skewed Strategies: Tracing the Acquired Preference Bias in LLM-Based Dialogue Planners

  • Heyan Huang
  • Yizhe Yang
  • Huashan Sun
  • Jiawei Li
  • Yang Gao

Large language models have enabled sophisticated dialogue planning policy, but their reliance on LLM-generated simulation and feedback for policy optimization may introduce systematic preference bias. We present the first comprehensive analysis of preference bias in LLM-based dialogue planners, evaluating four state-of-the-art planning policies across three dialogue domains using multiple LLM families at varying scales. Our investigation reveals that all tested planners exhibit significant preference bias, systematically favoring narrow strategy sets rather than maintaining balanced distributions. User simulation emerges as the primary bias driver, while diverse persona simulation fails as an effective mitigation strategy. Most concerning, preference bias drives planners toward ethically problematic strategies that achieve short-term success while undermining real-world effectiveness and ethical standards. Our findings establish fundamental challenges for responsible deployment of LLM-based dialogue systems and provide crucial insights for developing more reliable and ethically-aligned planning approaches.

YNIMG Journal 2026 Journal Article

VSSI-TBM: A variational sparse source imaging method based on time basis matrix

  • Tianyu Gao
  • Jin Ding
  • Wen Li
  • Fulong Wang
  • Yujie Ma
  • Ruonan Wang
  • Yang Gao
  • Xiaolin Ning

Source imaging algorithms have been widely used to localize functional and lesion areas. Brain source reconstruction is limited by complex experimental environments (noise interference, distributed brain activity, acquisition systems, etc.), and range estimation is not accurate. This study proposes a variational sparse source imaging method based on the time basis matrix (VSSI-TBM) algorithm. VSSI-TBM permits the source spatial signal to consist of several temporal basis functions by using low-rank decomposition to extract effective signals. In a compressed space, mixed-norm constraints and a cortical source variation operator ensure spatial sparsity and smoothness. In clinical examinations or research, other a priori information regarding brain activity may be available. VSSI-TBM using lead field guide constraints can further enhance the reconstruction results. The simulation results demonstrate the robust performance of VSSI-TBM in environments with a low signal-to-noise ratio (SNR), large sources (> 11 cm²), and multiple sources. Additionally, integrating prior information enhances the imaging performance in complex environments. The algorithm is evaluated using an open-source dataset and an optically pumped magnetometer-based magnetoencephalography (OPM-MEG) system with a noisy 30-channel uniform layout. The results reveal a strong robustness of the spatial range reconstruction. Moreover, the combination of prior information effectively improves the imaging performance of the OPM-MEG system.

NeurIPS Conference 2025 Conference Paper

Association-Focused Path Aggregation for Graph Fraud Detection

  • Tian Qiu
  • Wenda Li
  • Zunlei Feng
  • Jie Lei
  • Tao Wang
  • Yi Gao
  • Mingli Song
  • Yang Gao

Fraudulent activities have caused substantial negative social impacts and are exhibiting emerging characteristics such as intelligence and industrialization, posing challenges of high-order interactions, intricate dependencies, and the sparse yet concealed nature of fraudulent entities. Existing graph fraud detectors are limited by their narrow "receptive fields", as they focus only on the relations between an entity and its neighbors while neglecting longer-range structural associations hidden between entities. To address this issue, we propose a novel fraud detector based on Graph Path Aggregation (GPA). It operates through variable-length path sampling, semantic-associated path encoding, path interaction and aggregation, and aggregation-enhanced fraud detection. To further facilitate interpretable association analysis, we synthesize G-Internet, the first benchmark dataset in the field of internet fraud detection. Extensive experiments across datasets in multiple fraud scenarios demonstrate that the proposed GPA outperforms mainstream fraud detectors by up to +15% in Average Precision (AP). Additionally, GPA exhibits enhanced robustness to noisy labels and provides excellent interpretability by uncovering implicit fraudulent patterns across broader contexts. Code is available at https://github.com/horrible-dong/GPA.

AAAI Conference 2025 Conference Paper

Beyond Mandatory Federations: Balancing Egoism, Utilitarianism and Egalitarianism in Mixed-Motive Games

  • Shaokang Dong
  • Chao Li
  • Shangdong Yang
  • Hongye Cao
  • Wanqi Yang
  • Yang Gao

In the field of mixed-motive games, extensive multi-agent learning studies have explored the balance between egoism (individual interest), utilitarianism (collective interest), and egalitarianism (fairness). Traditional approaches often rely on manually designed reward functions, social norms, and alliance/federation mechanisms to transition agents from individualistic behaviors toward cooperative strategies. However, these methods typically require all agents to share private local information or to mandatorily participate in federations, which is impractical in real-world applications. To address these issues, this paper proposes a Flexible-Participation Federation (FPF) framework that allows agents to participate in the federation voluntarily. Furthermore, we extend the federation from a global to a Local Multi-Federation (LMF) framework, enabling agents to form multiple localized federations, thereby promoting more efficient and adaptive cooperation. Theoretical evidence demonstrates that the global FPF model, along with the discrepancy between decentralized egoistic policies and federated utilitarian policies, achieves an O(1/T) convergence rate. Agents in the LMF framework also reach consensus within a sublinear gap. Extensive experiments show that agents opting out of federation participation experience a reduction in egoism, and our approach outperforms multiple baselines in terms of both utilitarianism and egalitarianism.

NeurIPS Conference 2025 Conference Paper

Combinatorial Ski Rental Problem: Robust and Learning-Augmented Algorithms

  • Ziwei Li
  • Bo Sun
  • Zhiqiu Zhang
  • Mohammad Hajiesmaili
  • Binghan Wu
  • Lin Yang
  • Yang Gao

We introduce and study the Combinatorial Ski Rental (CSR) problem, which involves multiple items that can be rented or purchased, either individually or in combination. At each time step, a decision-maker must make an irrevocable buy-or-rent decision for items that have not yet been purchased, without knowing the end of the time horizon. We propose a randomized online algorithm, Sorted Optimal Amortized Cost (SOAC), that achieves the optimal competitive ratio. Moreover, SOAC can be extended to address various well-known ski rental variants, including the multi-slope, multi-shop, multi-commodity ski rental and CSR with upgrading problems. Building on the proposed SOAC algorithm, we further develop a learning-augmented algorithm that leverages machine-learned predictions to improve the performance of CSR. This algorithm is capable of recovering or improving upon existing results of learning-augmented algorithms in both the classic ski rental and multi-shop ski rental problems. Experimental results validate our theoretical analysis and demonstrate the advantages of our algorithms over baseline methods for ski rental problems.
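For background, the classic single-item ski rental problem that CSR generalizes can be sketched in a few lines. The snippet below is a minimal illustration of the well-known deterministic break-even rule (rent until the next rental would bring total rent up to the purchase price, then buy), not the SOAC algorithm from the paper. It assumes a rental price of 1 per day and an integer purchase price; the function names are hypothetical.

```python
def break_even_cost(horizon: int, buy_price: int) -> int:
    """Online cost of the break-even rule with a daily rent of 1.

    Rent for the first buy_price - 1 days, then buy on day buy_price
    if the season is still going.
    """
    if horizon < buy_price:
        return horizon  # the season ended before we ever bought
    return (buy_price - 1) + buy_price  # rent paid so far plus the purchase


def offline_optimum(horizon: int, buy_price: int) -> int:
    """Cost of the best decision when the horizon is known in advance."""
    return min(horizon, buy_price)


def worst_case_ratio(buy_price: int, max_horizon: int) -> float:
    """Largest online/offline cost ratio over horizons 1..max_horizon."""
    return max(
        break_even_cost(t, buy_price) / offline_optimum(t, buy_price)
        for t in range(1, max_horizon + 1)
    )


if __name__ == "__main__":
    # The ratio is (2 * buy_price - 1) / buy_price, which stays below 2.
    print(worst_case_ratio(buy_price=10, max_horizon=100))
```

The worst case occurs when the season ends right after the purchase, which is why the ratio approaches but never reaches 2. Randomized and learning-augmented algorithms such as those in the abstract aim to improve on this deterministic baseline.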

NeurIPS Conference 2025 Conference Paper

DON’T NEED RETRAINING: A Mixture of DETR and Vision Foundation Models for Cross-Domain Few-Shot Object Detection

  • Changhan Liu
  • Xunzhi Xiang
  • Zixuan Duan
  • Wenbin Li
  • Qi Fan
  • Yang Gao

Cross-Domain Few-Shot Object Detection (CD-FSOD) aims to generalize to unseen domains by leveraging a few annotated samples of the target domain, requiring models to exhibit both strong generalization and localization capabilities. However, existing well-trained detectors typically have strong localization capabilities but lack generalization, whereas vision foundation models (VFMs) generally exhibit better generalization but lack accurate localization capabilities. In this paper, we propose a novel Mixture-of-Experts (MoE) structure that integrates the detector's localization capability and the VFM's generalization by using VFM features to improve detector features. Specifically, we propose an Expert-wise Router (ER) that selects the most relevant VFM experts for each backbone layer, and a Region-wise Router (RR) that emphasizes foreground and suppresses background. To bridge representation gaps, we further propose a Shared Expert Projection (SEP) module and a Private Expert Projection (PEP) module, which align VFM features to the detector feature space while decoupling shared image features from private image features in the VFM feature map. Finally, we propose an MoE module to transfer the VFM’s generalization to the detector without altering the detector's original architecture. Furthermore, our method extends well-trained detectors to detect novel classes in unseen domains without re-training on the base classes. Experimental results on multiple cross-domain datasets validate the effectiveness of our method.

NeurIPS Conference 2025 Conference Paper

Efficient Last-Iterate Convergence in Solving Extensive-Form Games

  • Linjian Meng
  • Tianpei Yang
  • Youzhi Zhang
  • Zhenxing Ge
  • Shangdong Yang
  • Tianyu Ding
  • Wenbin Li
  • Bo An

To establish last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs. Hence, proving last-iterate convergence in solving the original EFG reduces to proving last-iterate convergence in solving (perturbed) regularized EFGs. However, these studies only establish last-iterate convergence for Online Mirror Descent (OMD)-based CFR algorithms instead of Regret Matching (RM)-based CFR algorithms in solving perturbed regularized EFGs, resulting in a poor empirical convergence rate, as RM-based CFR algorithms typically outperform OMD-based CFR algorithms. In addition, as solving multiple perturbed regularized EFGs is required, fine-tuning across multiple perturbed regularized EFGs is infeasible, making parameter-free algorithms highly desirable. This paper shows that CFR$^+$, a classical parameter-free RM-based CFR algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs. This is the first parameter-free last-iterate convergence for RM-based CFR algorithms in perturbed regularized EFGs. Leveraging CFR$^+$ to solve perturbed regularized EFGs, we get Reward Transformation CFR$^+$ (RTCFR$^+$). Importantly, we extend prior work on the parameter-free property of CFR$^+$, enhancing its stability, which is vital for the empirical convergence of RTCFR$^+$. Experiments show that RTCFR$^+$ exhibits a significantly faster empirical convergence rate than existing algorithms that achieve theoretical last-iterate convergence. Interestingly, RTCFR$^+$ shows performance no worse than average-iterate convergence CFR algorithms. It is the first last-iterate convergence algorithm to achieve such performance. Our code is available at https://github.com/menglinjian/NeurIPS-2025-RTCFR.

NeurIPS Conference 2025 Conference Paper

Federated Multi-armed Bandits with Efficient Bit-Level Communications

  • Haoran Zhang
  • Yang Xu
  • Xuchuang Wang
  • Hao-Xu Chen
  • Hao Qiu
  • Lin Yang
  • Yang Gao

In this work, we study the federated multi-armed bandit (FMAB) problem, where a set of distributed agents collaboratively aim to minimize cumulative regret while interacting with a shared set of arms. Unlike traditional centralized bandit models, agents in FMAB settings are connected via a communication graph and cannot share data freely due to bandwidth limitations or privacy constraints. This raises a fundamental challenge: how to achieve optimal learning performance under stringent communication budgets. We propose a novel communication-efficient algorithm that decouples the learning process into two phases: one for eliminating suboptimal arms through early and frequent communication of key decisions, and another for refining global estimates using buffered, quantized, and differentially transmitted statistics. By carefully balancing the communication frequency and precision of shared information, our algorithm achieves the optimal individual regret bound $O(N^{-1}\log T)$ while significantly reducing the total number of communication rounds and transmitted bits. Theoretically, we derive tight upper bounds on both individual cumulative regret and group regret, and prove that our method asymptotically matches the lower bound of regret in federated settings. Experimental results on synthetic data validate the effectiveness of the proposed approach in various graph topologies and under heterogeneous feedback.

NeurIPS Conference 2025 Conference Paper

In-Context Fully Decentralized Cooperative Multi-Agent Reinforcement Learning

  • Chao Li
  • Bingkun BAO
  • Yang Gao

In this paper, we consider fully decentralized cooperative multi-agent reinforcement learning, where each agent has access only to the states, its local actions, and the shared rewards. The absence of information about other agents' actions typically leads to the non-stationarity problem during per-agent value function updates, and the relative overgeneralization issue during value function estimation. However, existing works fail to address both issues simultaneously, as they lack the capability to model the agents' joint policy in a fully decentralized setting. To overcome this limitation, we propose a simple yet effective method named Return-Aware Context (RAC). RAC formalizes the dynamically changing task, as locally perceived by each agent, as a contextual Markov Decision Process (MDP), and addresses both non-stationarity and relative overgeneralization through return-aware context modeling. Specifically, the contextual MDP attributes the non-stationary local dynamics of each agent to switches between contexts, each corresponding to a distinct joint policy. Then, based on the assumption that the joint policy changes only between episodes, RAC distinguishes different joint policies by the training episodic return and constructs contexts using discretized episodic return values. Accordingly, RAC learns a context-based value function for each agent to address the non-stationarity issue during value function updates. For value function estimation, an individual optimistic marginal value is constructed to encourage the selection of optimal joint actions, thereby mitigating the relative overgeneralization problem. Experimentally, we evaluate RAC on various cooperative tasks (including matrix game, predator and prey, and SMAC), and its significant performance validates its effectiveness.

NeurIPS Conference 2025 Conference Paper

Joint Modeling of fMRI and EEG Imaging Using Ordinary Differential Equation-Based Hypergraph Neural Networks

  • Yan Zhang
  • Yang Gao
  • Min Li

Fusing multimodal brain imaging has been a hot topic since different modalities of brain imaging can provide complementary information. However, because simultaneously recorded fMRI-EEG datasets are limited in size and the hemodynamic responses of fMRI differ substantially from the neural oscillations of EEG, the joint modeling of fMRI and EEG images is a rarely explored area and has not yielded satisfactory results. Existing studies have also indicated that the relationships between regions of interest (ROIs) are not one-to-one when synchronizing fMRI and EEG. Current graph-based multimodal modeling methods overlook this information. Based on this, we propose a hypergraph-based fMRI-EEG modeling framework for asynchronous fMRI-EEG data named FE-NET. To the best of our knowledge, this is the first attempt to jointly model asynchronous EEG and fMRI data as a Neural ODE-based hypergraph. Extensive experiments have demonstrated that the proposed FE-NET outperforms many state-of-the-art brain imaging modeling methods. Meanwhile, compared to simultaneously recorded fMRI-EEG data, asynchronously acquired fMRI-EEG data is less costly, which demonstrates the practical applicability of our method.

AAAI Conference 2025 Conference Paper

Large Language Models Enhanced Personalized Graph Neural Architecture Search in Federated Learning

  • Hui Fang
  • Yang Gao
  • Peng Zhang
  • Jiangchao Yao
  • Hongyang Chen
  • Haishuai Wang

Personalized federated learning (PFL) on graphs is an emerging field focusing on the collaborative development of architectures across multiple clients, each with distinct graph data distributions while adhering to strict privacy standards. This area often requires extensive expert intervention in model design, which is a significant limitation. Recent advancements have aimed to automate the search for graph neural network architectures, incorporating large language models (LLMs) for their advanced reasoning and self-reflection capabilities. However, two technical challenges persist. First, although LLMs are effective in natural language processing, their ability to meet the complex demands of graph neural architecture search (GNAS) is still being explored. Second, while LLMs can guide the architecture search process, they do not directly solve the issue of client drift due to heterogeneous data distributions. To address these challenges, we introduce a novel method, Personalized Federated Graph Neural Architecture Search (PFGNAS). This approach employs a task-specific prompt to identify and integrate optimal GNN architectures continuously. To counteract client drift, PFGNAS utilizes a weight-sharing strategy of supernet, which optimizes the local architectures while ensuring client-specific personalization. Extensive evaluations show that PFGNAS significantly outperforms traditional PFL methods, highlighting the advantages of integrating LLMs into personalized federated learning environments.

NeurIPS Conference 2025 Conference Paper

Last-Iterate Convergence of Smooth Regret Matching$^+$ Variants in Learning Nash Equilibria

  • Linjian Meng
  • Youzhi Zhang
  • Zhenxing Ge
  • Tianyu Ding
  • Shangdong Yang
  • Zheng Xu
  • Wenbin Li
  • Yang Gao

Regret Matching$^+$ (RM$^+$) variants are widely used to build superhuman Poker AIs, yet few studies investigate their last-iterate convergence in learning a Nash equilibrium (NE). Although their last-iterate convergence is established for games satisfying the Minty Variational Inequality (MVI), no studies have demonstrated that these algorithms achieve such convergence in the broader class of games satisfying the weak MVI. A key challenge in proving last-iterate convergence for RM$^+$ variants in games satisfying the weak MVI is that even if the game's loss gradient satisfies the weak MVI, RM$^+$ variants operate on a transformed loss feedback which does not satisfy the weak MVI. To provide last-iterate convergence for RM$^+$ variants, we introduce a concise yet novel proof paradigm that involves: (i) transforming an RM$^+$ variant into an Online Mirror Descent (OMD) instance that updates within the original strategy space of the game to recover the weak MVI, and (ii) showing last-iterate convergence by proving the distance between accumulated regrets converges to zero via the recovered weak MVI of the feedback. Inspired by our proof paradigm, we propose Smooth Optimistic Gradient Based RM$^+$ (SOGRM$^+$) and show that it achieves last-iterate and finite-time best-iterate convergence in learning an NE of games satisfying the weak MVI, the weakest condition among all known RM$^+$ variants. Experiments show that SOGRM$^+$ significantly outperforms other algorithms. Our code is available at https://github.com/menglinjian/NeurIPS-2025-SOGRM.
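As background for the abstract above, here is a minimal sketch of plain RM$^+$ on a two-player zero-sum matrix game. It is not SOGRM$^+$ and does not show last-iterate convergence; it tracks the uniform average of the iterates, which is the quantity classic RM$^+$ analysis bounds. The 2x2 payoff matrix and all function names are illustrative assumptions.

```python
def rm_plus_strategy(clipped_regrets):
    """Play proportionally to the clipped regrets; uniform if all are zero."""
    total = sum(clipped_regrets)
    n = len(clipped_regrets)
    if total > 0:
        return [q / total for q in clipped_regrets]
    return [1.0 / n] * n


def solve_matrix_game(payoff, iterations):
    """Run simultaneous RM+ for both players; return the average strategies.

    payoff[i][j] is the row player's payoff; the column player receives
    the negative of it.
    """
    rows, cols = len(payoff), len(payoff[0])
    q_row, q_col = [0.0] * rows, [0.0] * cols
    sum_row, sum_col = [0.0] * rows, [0.0] * cols
    for _ in range(iterations):
        x = rm_plus_strategy(q_row)
        y = rm_plus_strategy(q_col)
        # Payoff of each pure action against the opponent's current mix.
        u_row = [sum(payoff[i][j] * y[j] for j in range(cols)) for i in range(rows)]
        u_col = [-sum(payoff[i][j] * x[i] for i in range(rows)) for j in range(cols)]
        v_row = sum(x[i] * u_row[i] for i in range(rows))
        v_col = sum(y[j] * u_col[j] for j in range(cols))
        # RM+ update: accumulate instantaneous regret, then clip at zero.
        q_row = [max(q_row[i] + u_row[i] - v_row, 0.0) for i in range(rows)]
        q_col = [max(q_col[j] + u_col[j] - v_col, 0.0) for j in range(cols)]
        sum_row = [s + p for s, p in zip(sum_row, x)]
        sum_col = [s + p for s, p in zip(sum_col, y)]
    return [s / iterations for s in sum_row], [s / iterations for s in sum_col]


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains against (x, y)."""
    rows, cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(cols)) for i in range(rows))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(rows)) for j in range(cols))
    return best_row - best_col


if __name__ == "__main__":
    game = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical asymmetric matching pennies
    x_avg, y_avg = solve_matrix_game(game, 20000)
    print(x_avg, y_avg, exploitability(game, x_avg, y_avg))
```

The clipping step `max(..., 0.0)` is what separates RM$^+$ from vanilla regret matching. The paper's contribution concerns the convergence of the current iterate rather than the averages returned here.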

NeurIPS Conference 2025 Conference Paper

Multi-Agent Reinforcement Learning with Communication-Constrained Priors

  • Guang Yang
  • Tianpei Yang
  • Jingwen Qiao
  • Yanqing Wu
  • Jing Huo
  • Xingguo Chen
  • Yang Gao

Communication is one of the effective means to improve the learning of cooperative policy in multi-agent systems. However, in most real-world scenarios, lossy communication is a prevalent issue. Existing multi-agent reinforcement learning methods with communication, due to their limited scalability and robustness, struggle to apply to complex and dynamic real-world environments. To address these challenges, we propose a generalized communication-constrained model to uniformly characterize communication conditions across different scenarios. Based on this, we utilize it as a learning prior to distinguish between lossy and lossless messages for specific scenarios. Additionally, we decouple the impact of lossy and lossless messages on distributed decision-making, drawing on a dual mutual information estimator, and introduce a communication-constrained multi-agent reinforcement learning framework, quantifying the impact of communication messages into the global reward. Finally, we validate the effectiveness of our approach across several communication-constrained benchmarks.

UAI Conference 2025 Conference Paper

Near-Optimal Regret Bounds for Federated Multi-armed Bandits with Fully Distributed Communication

  • Haoran Zhang
  • Xuchuang Wang
  • Hao-Xu Chen
  • Hao Qiu
  • Lin Yang
  • Yang Gao

In this paper, we study federated multi-armed bandit (FMAB) problems where agents can only communicate with their neighbors. All agents aim to solve a common multi-armed bandit (MAB) problem to minimize individual regrets, while group regret can also be minimized. In a federated bandit problem, an agent cannot estimate the global reward means of arms using only local observations, and hence the bandit learning algorithm usually adopts a consensus estimation strategy to address the heterogeneity. However, existing algorithms with fully distributed communication graphs have so far achieved only suboptimal results for the problem. To address that, a fully distributed online consensus estimation algorithm (CES) is proposed to estimate the global mean without bias. Integrating this consensus estimator into a distributed successive elimination bandit algorithm framework yields our federated bandit algorithm. Our algorithm significantly improves both individual and group regrets over previous approaches, and we provide an in-depth analysis of the lower bound for this problem.
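To illustrate what neighbor-only consensus estimation means in general, the sketch below runs textbook averaging consensus on a ring graph: each agent repeatedly replaces its estimate with the equal-weight average of its own value and its two neighbors' values. This is not the CES algorithm from the paper. The ring topology, the equal weights, and the function names are assumptions chosen for simplicity.

```python
def consensus_step(values):
    """One round of neighbor averaging on a ring (needs at least 3 agents).

    Agent i only reads the values held by agents i - 1 and i + 1, so no
    agent ever sees the full vector. The equal weights form a doubly
    stochastic mixing matrix, which keeps the global mean unchanged.
    """
    n = len(values)
    return [
        (values[i - 1] + values[i] + values[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def run_consensus(values, rounds):
    """Apply the averaging step a fixed number of times."""
    for _ in range(rounds):
        values = consensus_step(values)
    return values


if __name__ == "__main__":
    # Hypothetical local reward-mean estimates for one arm, one per agent.
    local_means = [0.1, 0.5, 0.9, 0.3, 0.7]
    print(run_consensus(local_means, rounds=100))  # every entry is close to 0.5
```

Because the mixing weights preserve the average, every agent's estimate converges to the global mean of the initial local values. A bandit algorithm such as successive elimination could then eliminate arms using these agreed-upon estimates.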

YNIMG Journal 2025 Journal Article

Repairbads: An automatic and adaptive method to repair bad channels and segments for OPM-MEG

  • Fulong Wang
  • Yujie Ma
  • Tianyu Gao
  • Yue Tao
  • Ruonan Wang
  • Ruochen Zhao
  • Fuzhi Cao
  • Yang Gao

The optically pumped magnetometer (OPM) based magnetoencephalography (MEG) system offers advantages such as flexible layout and wearability. However, the position instability or jitter of OPM sensors can result in bad channels and segments, which significantly impede subsequent preprocessing and analysis. Most common methods directly reject or interpolate to repair these bad channels and segments. Direct rejection leads to data loss, and when the number of sensors is limited, interpolation using neighboring sensors can cause significant signal distortion and cannot repair bad segments present in all channels. Therefore, most existing methods are unsuitable for OPM-MEG systems with fewer channels. We introduce an automatic bad segments and bad channels repair method for OPM-MEG, called Repairbads. This method aims to repair all bad data and reduce signal distortion, especially capable of automatically repairing bad segments present in all channels simultaneously. Repairbads employs Riemannian Potato combined with joint decorrelation to project out artifact components, achieving automatic bad segment repair. Then, an adaptive algorithm is used to segment the signal into relatively stable noise data chunks, and the source-estimate-utilizing noise-discarding algorithm is applied to each chunk to achieve automatic bad channel repair. We compared the performance of Repairbads with the Autoreject method on both simulated and real auditory evoked data, using five evaluation metrics for quantitative assessment. The results demonstrate that Repairbads consistently outperforms across all five metrics. In both simulated and real OPM-MEG data, Repairbads shows better performance than current state-of-the-art methods, reliably repairing bad data with minimal distortion. The automation of this method significantly reduces the burden of manual inspection, promoting the automated processing and clinical application of OPM-MEG.

ICLR Conference 2025 Conference Paper

RRM: Robust Reward Model Training Mitigates Reward Hacking

  • Tianqi Liu 0002
  • Wei Xiong 0015
  • Jie Ren 0006
  • Lichang Chen
  • Junru Wu
  • Rishabh Joshi
  • Yang Gao
  • Jiaming Shen

Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human preferences. However, traditional RM training, which relies on response pairs tied to specific prompts, struggles to disentangle prompt-driven preferences from prompt-independent artifacts, such as response length and format. In this work, we expose a fundamental limitation of current RM training methods, where RMs fail to effectively distinguish between contextual signals and irrelevant artifacts when determining preferences. To address this, we introduce a causal framework that learns preferences independent of these artifacts and propose a novel data augmentation technique designed to eliminate them. Extensive experiments show that our approach successfully filters out undesirable artifacts, yielding a more robust reward model (RRM). Our RRM improves the performance of a pairwise reward model trained on Gemma-2-9b-it, on Reward-Bench, increasing accuracy from 80.61% to 84.15%. Additionally, we train two DPO policies using both the RM and RRM, demonstrating that the RRM significantly enhances DPO-aligned policies, improving MT-Bench scores from 7.27 to 8.31 and length-controlled win-rates in AlpacaEval-2 from 33.46% to 52.49%.

IJCAI Conference 2025 Conference Paper

Sharpness-aware Zeroth-order Optimization for Graph Transformers

  • Yang Liu
  • Chuan Zhou
  • Yuhan Lin
  • Shuai Zhang
  • Yang Gao
  • Zhao Li
  • Shirui Pan

Graph Transformers (GTs) have emerged as powerful tools for handling graph-structured data through global attention mechanisms. While GTs can effectively capture long-range dependencies, they introduce difficulties in optimization due to their complex, non-differentiable operators, which cannot be directly handled by standard gradient-based optimizers (such as Adam or AdamW). To investigate the above issues, this work adopts the line of Zeroth-Order Optimization (ZOO) technique. However, direct integration of ZOO incurs considerable challenges due to the sharp loss landscape and steep gradients within the GT parameter space. Under the above observations, we propose a Sharpness-aware Zeroth-order Optimizer (SZO) that combines Sharpness-Aware Minimization (SAM) technique facilitating convergence within a flatter neighborhood, and leverages parallel computing for efficient gradient estimation. Theoretically, we provide a comprehensive analysis of the optimizer from both convergence and generalization perspectives. Empirically, we conduct extensive experiments on various classical GTs across a wide range of benchmark datasets, which underscore the superior performance of SZO over the state-of-the-art optimizers.

IJCAI Conference 2025 Conference Paper

SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation

  • Bin Xu
  • Yiguan Lin
  • Yinghao Li
  • Yang Gao

Large language models exhibit remarkable performance in simple code generation tasks. However, they encounter significant challenges when addressing complex problems that require reasoning and question decomposition. To tackle this, we propose a self-driven reasoning augmentation process, SRA-MCTS, which incorporates Monte Carlo Tree Search (MCTS) for reasoning data generation. SRA-MCTS enables LLMs to self-generate intermediate reasoning steps and perform iterative self-evaluation, facilitating self-improvement. Specifically, it utilizes MCTS to produce diverse intermediate reasoning steps. During each iteration, MCTS generates a step and employs self-evaluation to guide the selection of subsequent branches, ultimately forming a sufficiently diverse reasoning path referred to as “thinking”. This thinking guides the model in generating corresponding code, and both are combined as training data for supervised fine-tuning. Experimental results demonstrate that SRA-MCTS achieves consistent performance improvements across three model scales without additional supervisory assistance. Applied to the Meta-Llama-3.1-8B-Instruct model, it delivers an 11-point improvement on the MBPP-Complex dataset, underscoring the significant potential for model self-improvement. The code and data are available at https://github.com/DIRECT-BIT/SRA-MCTS.

NeurIPS Conference 2025 Conference Paper

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

  • Xeron Du
  • Yifan Yao
  • Kaijing Ma
  • Bingli Wang
  • Tianyu Zheng
  • Minghao Liu
  • Yiming Liang
  • Xiaolong Jin

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model Gemini-2.5-Pro achieved the highest accuracy of 63.56% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.

EAAI Journal 2024 Journal Article

An adaptive bidirectional quick optimal Rapidly-exploring Random Tree algorithm for path planning

  • Zhuo Huang
  • Yang Gao
  • Jian Guo
  • Chen Qian
  • Qingwei Chen

In recent decades, sampling-based algorithms have played an important role in path planning. As an extension of the Rapidly-exploring Random Tree (RRT), the optimal RRT (RRT*) has the advantage of asymptotic optimality. However, it still suffers from a slow convergence rate and a costly initial solution. Besides, it cannot provide an effective path planning solution when dealing with narrow passages. To overcome these problems, an adaptive bidirectional quick RRT* (ABQ-RRT*) method is proposed for Unmanned Aerial Vehicles (UAVs) in narrow passages. Firstly, the cost function is redefined by taking into account both the Euclidean distance and the turning angle of the UAV. Secondly, the number of failures in the collision detection is considered and an adaptive goal-biased sampling strategy is adopted to obtain higher-quality sampling points. Thirdly, by local sampling near the obstacles, the type of the local environment is quickly judged. Also, the repulsion field is superimposed to effectively avoid obstacles. Then, during the process of selecting the optimal parent node from the set of candidate nodes, the sorted possible parents consider both the neighborhood nodes and their ancestors to obtain better paths. At last, the effectiveness of ABQ-RRT* is confirmed by comparing it with several other improved RRT algorithms through simulations.

AAAI Conference 2024 Conference Paper

Angle Robustness Unmanned Aerial Vehicle Navigation in GNSS-Denied Scenarios

  • Yuxin Wang
  • Zunlei Feng
  • Haofei Zhang
  • Yang Gao
  • Jie Lei
  • Li Sun
  • Mingli Song

Due to the inability to receive signals from the Global Navigation Satellite System (GNSS) in extreme conditions, achieving accurate and robust navigation for Unmanned Aerial Vehicles (UAVs) is a challenging task. Recently emerged, vision-based navigation has been a promising and feasible alternative to GNSS-based navigation. However, existing vision-based techniques are inadequate in addressing flight deviation caused by environmental disturbances and inaccurate position predictions in practical settings. In this paper, we present a novel angle robustness navigation paradigm to deal with flight deviation in point-to-point navigation tasks. Additionally, we propose a model that includes the Adaptive Feature Enhance Module, Cross-knowledge Attention-guided Module and Robust Task-oriented Head Module to accurately predict direction angles for high-precision navigation. To evaluate the vision-based navigation methods, we collect a new dataset termed as UAV_AR368. Furthermore, we design the Simulation Flight Testing Instrument (SFTI) using Google Earth to simulate different flight environments, thereby reducing the expenses associated with real flight testing. Experiment results demonstrate that the proposed model outperforms the state-of-the-art by achieving improvements of 26.0% and 45.6% in the success rate of arrival under ideal and disturbed circumstances, respectively.

AAMAS Conference 2024 Conference Paper

Auto-Encoding Adversarial Imitation Learning

  • Kaifeng Zhang
  • Rui Zhao
  • Ziming Zhang
  • Yang Gao

Reinforcement learning (RL) provides a powerful framework for decision-making, but its application in practice often requires a carefully designed reward function. Adversarial Imitation Learning (AIL) sheds light on automatic policy acquisition without access to the reward signal from the environment. In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework. To induce expert policies from demonstrations, AEAIL utilizes the reconstruction error of an auto-encoder as a reward signal, which provides more information for optimizing policies than the prior discriminator-based ones. Subsequently, we use the derived objective functions to train the auto-encoder and the agent policy. Experiments show that our AEAIL outperforms state-of-the-art methods on both state-based and image-based environments. More importantly, AEAIL shows much better robustness when the expert demonstrations are noisy.

AAMAS Conference 2024 Conference Paper

Cognizing and Imitating Robotic Skills via a Dual Cognition-Action Architecture

  • Zixuan Chen
  • Ze Ji
  • Shuyang Liu
  • Jing Huo
  • Yiyu Chen
  • Yang Gao

Enabling robots to effectively learn and imitate expert skills in long-horizon tasks remains challenging. Hierarchical imitation learning (HIL) approaches have made strides but often fall short in complex scenarios due to their reliance on self-exploration. This paper introduces a novel approach inspired by the human skill acquisition process, proposing a Cognition-Action-based Robotic Skill Imitation Learning (CasIL) framework. CasIL integrates human cognitive priors for task decomposition into a dual-layer architecture, enhancing robots’ ability to cognize and imitate essential skills from expert demonstrations. Our experiments across four RLBench tasks demonstrate CasIL’s superior performance, robustness, and generalizability in skill imitation compared to related methods.

AAAI Conference 2024 Conference Paper

DGA-GNN: Dynamic Grouping Aggregation GNN for Fraud Detection

  • Mingjiang Duan
  • Tongya Zheng
  • Yang Gao
  • Gang Wang
  • Zunlei Feng
  • Xinyu Wang

Fraud detection has increasingly become a prominent research field due to the dramatically increased incidents of fraud. The complex connections involving thousands, or even millions of nodes, present challenges for fraud detection tasks. Many researchers have developed various graph-based methods to detect fraud from these intricate graphs. However, those methods neglect two distinct characteristics of the fraud graph: the non-additivity of certain attributes and the distinguishability of grouped messages from neighbor nodes. This paper introduces the Dynamic Grouping Aggregation Graph Neural Network (DGA-GNN) for fraud detection, which addresses these two characteristics by dynamically grouping attribute value ranges and neighbor nodes. In DGA-GNN, we initially propose the decision tree binning encoding to transform non-additive node attributes into bin vectors. This approach aligns well with the GNN’s aggregation operation and avoids nonsensical feature generation. Furthermore, we devise a feedback dynamic grouping strategy to classify graph nodes into two distinct groups and then employ a hierarchical aggregation. This method extracts more discriminative features for fraud detection tasks. Extensive experiments on five datasets suggest that our proposed method achieves a 3% ~ 16% improvement over existing SOTA methods. Code is available at https://github.com/AtwoodDuan/DGA-GNN.

IJCAI Conference 2024 Conference Paper

Discriminative Feature Decoupling Enhancement for Speech Forgery Detection

  • Yijun Bei
  • Xing Zhou
  • Erteng Liu
  • Yang Gao
  • Sen Lin
  • Kewei Gao
  • Zunlei Feng

The emergence of AIGC has brought attention to the issue of generating realistic deceptive content. While AIGC has the potential to revolutionize content creation, it also facilitates criminal activities. Specifically, the manipulation of speech has been exploited in tele-fraud and financial fraud schemes, posing a significant threat to societal security. Current deep learning-based methods for detecting forged speech extract mixed features from the original speech, which often contain redundant information. Moreover, these methods fail to consider the distinct characteristics of human voice-specific features and the diversity of background environmental sounds. This paper introduces a framework called Discriminative fEature dEcoupling enhanceMent (DEEM) for detecting speech forgery. Initially, the framework decouples the original speech into human voice features and background sound features. Subsequently, DEEM enhances voice-specific features through temporal dimension aggregation and improves continuity-related features in the background sound map via spectral-dimension aggregation. By employing the decoupling enhancement features, extensive experiments demonstrate that DEEM achieves an accuracy improvement of over 5% on FoR dataset compared to the state-of-the-art methods.

YNIMG Journal 2024 Journal Article

Expanding the clinical application of OPM-MEG using an effective automatic suppression method for the dental brace metal artifact

  • Ruonan Wang
  • Kaiwen Fu
  • Ruochen Zhao
  • Dawei Wang
  • Zhimin Yang
  • Wei Bin
  • Yang Gao
  • Xiaolin Ning

Optically pumped magnetometer magnetoencephalography (OPM-MEG) holds significant promise for clinical functional brain imaging due to its superior spatiotemporal resolution. However, effectively suppressing metallic artifacts, particularly from devices such as orthodontic braces and vagal nerve stimulators remains a major challenge, hindering the wider clinical application of wearable OPM-MEG devices. A comprehensive analysis of metal artifact characteristics from time, frequency, and time-frequency perspectives was conducted for the first time using an OPM-MEG device in clinical medicine. This study focused on patients with metal orthodontics, examining the modulation of metal artifacts by breath and head movement, the incomplete regular sub-Gaussian distribution, and the high absolute power ratio in the 0.5-8 Hz band. The existing metal artifact suppression algorithms applied to SQUID-MEG, such as fast independent component analysis (FastICA), information maximization (Infomax), and algorithms for multiple unknown signal extraction (AMUSE), exhibit limited efficacy. Consequently, this study introduced the second-order blind identification (SOBI) algorithm, which utilized multiple time delays for the component separation of OPM-MEG measurement signals. We modified the time delays of the SOBI method to improve its efficacy in separating artifact components, particularly those in the ultralow frequency range. This approach employs the frequency-domain absolute power ratio, root mean square (RMS) value, and mutual information methods to automate the artifact component screening process. The effectiveness of this method was validated through simulation experiments involving four subjects in both resting and evoked experiments. In addition, the proposed method was also validated by the actual OPM-MEG evoked experiments of three subjects. Comparative analyses were conducted against the FastICA, Infomax, and AMUSE algorithms. Evaluation metrics included normalized mean square error, normalized delta band power error, RMS error, and signal-to-noise ratio, demonstrating that the proposed method provides optimal suppression of metal artifacts. This advancement holds promise for enhancing data quality and expanding the clinical applications of OPM-MEG.

NeurIPS Conference 2024 Conference Paper

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

  • Xiaoyi Dong
  • Pan Zhang
  • Yuhang Zang
  • Yuhang Cao
  • Bin Wang
  • Linke Ouyang
  • Songyang Zhang
  • Haodong Duan

The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution. Recent efforts have aimed to enhance the high-resolution understanding capabilities of LVLMs, yet they remain capped at approximately 1500 × 1500 pixels and constrained to a relatively narrow resolution range. This paper presents InternLM-XComposer2-4KHD, a groundbreaking exploration into elevating LVLM resolution capabilities up to 4K HD (3840 × 1600) and beyond. Concurrently, considering the ultra-high resolution may not be necessary in all scenarios, it supports a wide range of diverse resolutions from 336 pixels to 4K standard, significantly broadening its scope of applicability. Specifically, this research advances the patch division paradigm by introducing a novel extension: dynamic resolution with automatic patch configuration. It maintains the training image aspect ratios while automatically varying patch counts and configuring layouts based on a pre-trained Vision Transformer (ViT) (336 × 336), leading to dynamic training resolution from 336 pixels to 4K standard. Our research demonstrates that scaling training resolution up to 4K HD leads to consistent performance enhancements without hitting the ceiling of potential improvements. InternLM-XComposer2-4KHD shows superb capability that matches or even surpasses GPT-4V and Gemini Pro in 10 of the 16 benchmarks.

NeurIPS Conference 2024 Conference Paper

Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks

  • Jiacong Hu
  • Jing Gao
  • Jingwen Ye
  • Yang Gao
  • Xingen Wang
  • Zunlei Feng
  • Mingli Song

With the rapid development of deep learning, the increasing complexity and scale of parameters make training a new model increasingly resource-intensive. In this paper, we start from the classic convolutional neural network (CNN) and explore a paradigm that does not require training to obtain new models. Similar to the birth of CNN inspired by receptive fields in the biological visual system, we draw inspiration from the information subsystem pathways in the biological visual system and propose Model Disassembling and Assembling (MDA). During model disassembling, we introduce the concept of relative contribution and propose a component locating technique to extract task-aware components from trained CNN classifiers. For model assembling, we present the alignment padding strategy and parameter scaling strategy to construct a new model tailored for a specific task, utilizing the disassembled task-aware components. The entire process is akin to playing with LEGO bricks, enabling arbitrary assembly of new models, and providing a novel perspective for model creation and reuse. Extensive experiments showcase that task-aware components disassembled from CNN classifiers or new models assembled using these components closely match or even surpass the performance of the baseline, demonstrating its promising results for model reuse. Furthermore, MDA exhibits diverse potential applications, with comprehensive experiments exploring model decision route analysis, model compression, knowledge distillation, and more.

IROS Conference 2024 Conference Paper

MQE: Unleashing the Power of Interaction with Multi-agent Quadruped Environment

  • Ziyan Xiong
  • Bo Chen
  • Shiyu Huang 0001
  • Wei-Wei Tu
  • Zhaofeng He 0001
  • Yang Gao

The advent of deep reinforcement learning (DRL) has significantly advanced the field of robotics, particularly in the control and coordination of quadruped robots. However, the complexity of real-world tasks often necessitates the deployment of multi-robot systems capable of sophisticated interaction and collaboration. To address this need, we introduce the Multi-agent Quadruped Environment (MQE), a novel platform designed to facilitate the development and evaluation of multi-agent reinforcement learning (MARL) algorithms in realistic and dynamic scenarios. MQE emphasizes complex interactions between robots and objects, hierarchical policy structures, and challenging evaluation scenarios that reflect real-world applications. We present a series of collaborative and competitive tasks within MQE, ranging from simple coordination to complex adversarial interactions, and benchmark state-of-the-art MARL algorithms. Our findings indicate that hierarchical reinforcement learning can simplify task learning, but also highlight the need for advanced algorithms capable of handling the intricate dynamics of multi-agent interactions. MQE serves as a stepping stone towards bridging the gap between simulation and practical deployment, offering a rich environment for future research in multi-agent systems and robot learning. For open-sourced code and more details of MQE, please refer to https://ziyanx02.github.io/multiagent-quadruped-environment/.

AAAI Conference 2024 Conference Paper

Optimistic Value Instructors for Cooperative Multi-Agent Reinforcement Learning

  • Chao Li
  • Yupeng Zhang
  • Jianqi Wang
  • Yujing Hu
  • Shaokang Dong
  • Wenbin Li
  • Tangjie Lv
  • Changjie Fan

In cooperative multi-agent reinforcement learning, decentralized agents hold the promise of overcoming the combinatorial explosion of joint action space and enabling greater scalability. However, they are susceptible to a game-theoretic pathology called relative overgeneralization (RO) that shadows the optimal joint action. Although recent value-decomposition algorithms guide decentralized agents by learning a factored global action value function, the representational limitation and the inaccurate sampling of optimal joint actions during the learning process leave this problem unresolved. To address this limitation, this paper proposes a novel algorithm called Optimistic Value Instructors (OVI). The main idea behind OVI is to introduce multiple optimistic instructors into the value-decomposition paradigm, which are capable of suggesting potentially optimal joint actions and rectifying the factored global action value function to recover these optimal actions. Specifically, the instructors maintain optimistic value estimations of per-agent local actions and thus eliminate the negative effects caused by other agents' exploratory or sub-optimal non-cooperation, enabling accurate identification and suggestion of optimal joint actions. Based on the instructors' suggestions, the paper further presents two instructive constraints to rectify the factored global action value function to recover these optimal joint actions, thus overcoming the RO problem. Experimental evaluation of OVI on various cooperative multi-agent tasks demonstrates its superior performance against multiple baselines, highlighting its effectiveness.

AAAI Conference 2024 Conference Paper

PG-LBO: Enhancing High-Dimensional Bayesian Optimization with Pseudo-Label and Gaussian Process Guidance

  • Taicai Chen
  • Yue Duan
  • Dong Li
  • Lei Qi
  • Yinghuan Shi
  • Yang Gao

Variational Autoencoder based Bayesian Optimization (VAE-BO) has demonstrated its excellent performance in addressing high-dimensional structured optimization problems. However, current mainstream methods overlook the potential of utilizing a pool of unlabeled data to construct the latent space, while only concentrating on designing sophisticated models to leverage the labeled data. Despite their effective usage of labeled data, these methods often require extra network structures and additional procedures, resulting in computational inefficiency. To address this issue, we propose a novel method to effectively utilize unlabeled data with the guidance of labeled data. Specifically, we tailor the pseudo-labeling technique from semi-supervised learning to explicitly reveal the relative magnitudes of optimization objective values hidden within the unlabeled data. Based on this technique, we assign appropriate training weights to unlabeled data to enhance the construction of a discriminative latent space. Furthermore, we treat the VAE encoder and the Gaussian Process (GP) in Bayesian optimization as a unified deep kernel learning process, allowing the direct utilization of labeled data, which we term Gaussian Process guidance. This directly and effectively integrates the goal of improving GP accuracy into the VAE training, thereby guiding the construction of the latent space. The extensive experiments demonstrate that our proposed method outperforms existing VAE-BO algorithms in various optimization scenarios. Our code will be published at https://github.com/TaicaiChen/PG-LBO.

YNIMG Journal 2024 Journal Article

Quantitative susceptibility mapping through model-based deep image prior (MoDIP)

  • Zhuang Xiong
  • Yang Gao
  • Yin Liu
  • Amir Fazlollahi
  • Peter Nestor
  • Feng Liu
  • Hongfu Sun

The data-driven approach of supervised learning methods has limited applicability in solving dipole inversion in Quantitative Susceptibility Mapping (QSM) with varying scan parameters across different objects. To address this generalization issue in supervised QSM methods, we propose a novel training-free model-based unsupervised method called MoDIP (Model-based Deep Image Prior). MoDIP comprises a small, untrained network and a Data Fidelity Optimization (DFO) module. The network converges to an interim state, acting as an implicit prior for image regularization, while the optimization process enforces the physical model of QSM dipole inversion. Experimental results demonstrate MoDIP's excellent generalizability in solving QSM dipole inversion across different scan parameters. It exhibits robustness against pathological brain QSM, achieving an accuracy improvement of over 32% compared with supervised deep learning methods. It is also 33% more computationally efficient and runs 4 times faster than conventional DIP-based approaches, enabling 3D high-resolution image reconstruction in under 4.5 min.

AAAI Conference 2024 Conference Paper

Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning

  • Ruiqian Nai
  • Zixin Wen
  • Ji Li
  • Yuanzhi Li
  • Yang Gao

In representation learning, a disentangled representation is highly desirable as it encodes generative factors of data in a separable and compact pattern. Researchers have advocated leveraging disentangled representations to complete downstream tasks with encouraging empirical evidence. This paper further investigates the necessity of disentangled representation in downstream applications. Specifically, we show that dimension-wise disentangled representations are unnecessary on a fundamental downstream task, abstract visual reasoning. We provide extensive empirical evidence against the necessity of disentanglement, covering multiple datasets, representation learning methods, and downstream network architectures. Furthermore, our findings suggest that the informativeness of representations is a better indicator of downstream performance than disentanglement. Finally, the positive correlation between informativeness and disentanglement explains the claimed usefulness of disentangled representations in previous works. The source code is available at https://github.com/Richard-coder-Nai/disentanglement-lib-necessity.git

NeurIPS Conference 2024 Conference Paper

SCaR: Refining Skill Chaining for Long-Horizon Robotic Manipulation via Dual Regularization

  • Zixuan Chen
  • Ze Ji
  • Jing Huo
  • Yang Gao

Long-horizon robotic manipulation tasks typically involve a series of interrelated sub-tasks spanning multiple execution stages. Skill chaining offers a feasible solution for these tasks by pre-training the skills for each sub-task and linking them sequentially. However, imperfections in skill learning or disturbances during execution can lead to the accumulation of errors in skill chaining process, resulting in execution failures. In this paper, we investigate how to achieve stable and smooth skill chaining for long-horizon robotic manipulation tasks. Specifically, we propose a novel skill chaining framework called Skill Chaining via Dual Regularization (SCaR). This framework applies dual regularization to sub-task skill pre-training and fine-tuning, which not only enhances the intra-skill dependencies within each sub-task skill but also reinforces the inter-skill dependencies between sequential sub-task skills, thus ensuring smooth skill chaining and stable long-horizon execution. We evaluate the SCaR framework on two representative long-horizon robotic manipulation simulation benchmarks: IKEA furniture assembly and kitchen organization. Additionally, we conduct a simple real-world validation in tabletop robot pick-and-place tasks. The experimental results show that, with the support of SCaR, the robot achieves a higher success rate in long-horizon tasks compared to relevant baselines and demonstrates greater robustness to perturbations.

IJCAI Conference 2024 Conference Paper

STAR: Spatio-Temporal State Compression for Multi-Agent Tasks with Rich Observations

  • Chao Li
  • Yujing Hu
  • Shangdong Yang
  • Tangjie Lv
  • Changjie Fan
  • Wenbin Li
  • Chongjie Zhang
  • Yang Gao

This paper focuses on the problem of learning compressed state representations for multi-agent tasks. Under the assumption of rich observation, we argue that state representations should be compressed both spatially and temporally to enable efficient prioritization of task-relevant features, which existing works typically fail to do. To overcome this limitation, we propose a novel method named Spatio-Temporal stAte compRession (STAR) that explicitly defines both spatial and temporal compression operations on the learned state representations to encode per-agent task-relevant features. Specifically, we first formalize this problem by introducing the Task Informed Partially Observable Stochastic Game (TI-POSG). Then, we identify the spatial representation compression in it as encoding the latent states from the joint observations of all agents, and achieve this by learning representations that approximate the latent states based on an information-theoretic principle. After that, we further extract the task-relevant features of each agent from these representations by aligning them based on their reward similarities, which is regarded as the temporal representation compression. Structurally, we implement these two compressions by learning a set of agent-specific decoding functions and incorporate them into a critic shared by agents for scalable learning. We evaluate our method by developing decentralized policies on 12 maps of the StarCraft Multi-Agent Challenge benchmark, and the superior performance demonstrates its effectiveness.

NeurIPS Conference 2024 Conference Paper

START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation

  • Jintao Guo
  • Lei Qi
  • Yinghuan Shi
  • Yang Gao

Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains. Existing DG methods primarily rely on convolutional neural networks (CNNs), which inherently learn texture biases due to their limited receptive fields, making them prone to overfitting source domains. While some works have introduced transformer-based methods (ViTs) for DG to leverage the global receptive field, these methods incur high computational costs due to the quadratic complexity of self-attention. Recently, advanced state space models (SSMs), represented by Mamba, have shown promising results in supervised learning tasks by achieving linear complexity in sequence length during training and fast RNN-like computation during inference. Inspired by this, we investigate the generalization ability of the Mamba model under domain shifts and find that input-dependent matrices within SSMs could accumulate and amplify domain-specific features, thus hindering model generalization. To address this issue, we propose a novel SSM-based architecture with saliency-based token-aware transformation (namely START), which achieves state-of-the-art (SOTA) performance and offers a competitive alternative to CNNs and ViTs. Our START can selectively perturb and suppress domain-specific features in salient tokens within the input-dependent matrices of SSMs, thus effectively reducing the discrepancy between different domains. Extensive experiments on five benchmarks demonstrate that START outperforms existing SOTA DG methods with efficient linear complexity. Our code is available at https://github.com/lingeringlight/START.

JMLR Journal 2024 Journal Article

Towards Explainable Evaluation Metrics for Machine Translation

  • Christoph Leiter
  • Piyawat Lertvittayakumjorn
  • Marina Fomicheva
  • Wei Zhao
  • Yang Gao
  • Steffen Eger

Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for machine translation (for example, COMET or BERTScore) are based on black-box large language models. They often achieve strong correlations with human judgments, but recent research indicates that the lower-quality classical metrics remain dominant, one of the potential reasons being that their decision processes are more transparent. To foster more widespread acceptance of novel high-quality metrics, explainability thus becomes crucial. In this concept paper, we identify key properties as well as key goals of explainable machine translation metrics and provide a comprehensive synthesis of recent techniques, relating them to our established goals and properties. In this context, we also discuss the latest state-of-the-art approaches to explainable metrics based on generative models such as ChatGPT and GPT4. Finally, we contribute a vision of next-generation approaches, including natural language explanations. We hope that our work can help catalyze and guide future research on explainable evaluation metrics and, mediately, also contribute to better and more transparent machine translation systems.

NeurIPS Conference 2024 Conference Paper

Transformer Doctor: Diagnosing and Treating Vision Transformers

  • Jiacong Hu
  • Hao Chen
  • Kejia Chen
  • Yang Gao
  • Jingwen Ye
  • Xingen Wang
  • Mingli Song
  • Zunlei Feng

Due to their powerful representational capabilities, Transformers have gradually become the mainstream model in the field of machine vision. However, the vast and complex parameters of Transformers impede researchers from gaining a deep understanding of their internal mechanisms, especially error mechanisms. Existing methods for interpreting Transformers mainly focus on understanding them from the perspectives of the importance of input tokens or internal modules, as well as the formation and meaning of features. In contrast, inspired by research on information integration mechanisms and conjunctive errors in the biological visual system, this paper conducts an in-depth exploration of the internal error mechanisms of Transformers. We first propose an information integration hypothesis for Transformers in the machine vision domain and provide substantial experimental evidence to support this hypothesis. This includes the dynamic integration of information among tokens and the static integration of information within tokens in Transformers, as well as the presence of conjunctive errors therein. Addressing these errors, we further propose heuristic dynamic integration constraint methods and rule-based static integration constraint methods to rectify errors and ultimately improve model performance. The entire methodology framework is termed Transformer Doctor, designed for diagnosing and treating internal errors within Transformers. Extensive quantitative and qualitative experiments demonstrate that Transformer Doctor can effectively address internal errors in Transformers, thereby enhancing model performance.

AAAI Conference 2024 Conference Paper

ViT-Calibrator: Decision Stream Calibration for Vision Transformer

  • Lin Chen
  • Zhijie Jia
  • Lechao Cheng
  • Yang Gao
  • Jie Lei
  • Yijun Bei
  • Zunlei Feng

A surge of interest has emerged in utilizing Transformers in diverse vision tasks owing to their formidable performance. However, existing approaches primarily focus on optimizing internal model architecture designs that often entail significant trial and error with high burdens. In this work, we propose a new paradigm dubbed Decision Stream Calibration that boosts the performance of general Vision Transformers. To achieve this, we shed light on the information propagation mechanism in the learning procedure by exploring the correlation between different tokens and the relevance coefficient of multiple dimensions. Upon further analysis, we find that 1) the final decision is associated with tokens of foreground targets: token features of foreground targets are transmitted into the next layer as much as possible, while the useless token features of background areas are gradually eliminated during forward propagation; and 2) each category is solely associated with specific sparse dimensions in the tokens. Based on these findings, we design a two-stage calibration scheme, namely ViT-Calibrator, comprising a token propagation calibration stage and a dimension propagation calibration stage. Extensive experiments on commonly used datasets show that the proposed approach can achieve promising results.

AAAI Conference 2024 Conference Paper

Weakly Supervised Multimodal Affordance Grounding for Egocentric Images

  • Lingjing Xu
  • Yang Gao
  • Wenfeng Song
  • Aimin Hao

To enhance the interaction between intelligent systems and the environment, locating the affordance regions of objects is crucial. These regions correspond to specific areas that provide distinct functionalities. Humans often acquire the ability to identify these regions through action demonstrations and verbal instructions. In this paper, we present a novel multimodal framework that extracts affordance knowledge from exocentric images, which depict human-object interactions, as well as from accompanying textual descriptions that describe the performed actions. The extracted knowledge is then transferred to egocentric images. To achieve this goal, we propose the HOI-Transfer Module, which utilizes local perception to disentangle individual actions within exocentric images. This module effectively captures localized features and correlations between actions, leading to valuable affordance knowledge. Additionally, we introduce the Pixel-Text Fusion Module, which fuses affordance knowledge by identifying regions in egocentric images that bear resemblances to the textual features defining affordances. We employ a Weakly Supervised Multimodal Affordance (WSMA) learning approach, utilizing image-level labels for training. Through extensive experiments, we demonstrate the superiority of our proposed method in terms of evaluation metrics and visual results when compared to existing affordance grounding models. Furthermore, ablation experiments confirm the effectiveness of our approach. Code: https://github.com/xulingjing88/WSMA.

YNIMG Journal 2023 Journal Article

Affine transformation edited and refined deep neural network for quantitative susceptibility mapping

  • Zhuang Xiong
  • Yang Gao
  • Feng Liu
  • Hongfu Sun

Deep neural networks have demonstrated great potential in solving dipole inversion for Quantitative Susceptibility Mapping (QSM). However, the performances of most existing deep learning methods drastically degrade with mismatched sequence parameters such as acquisition orientation and spatial resolution. We propose an end-to-end AFfine Transformation Edited and Refined (AFTER) deep neural network for QSM, which is robust against arbitrary acquisition orientation and spatial resolution up to 0.6 mm isotropic at the finest. The AFTER-QSM neural network starts with a forward affine transformation layer, followed by a Unet for dipole inversion, then an inverse affine transformation layer, followed by a Residual Dense Network (RDN) for QSM refinement. Simulation and in-vivo experiments demonstrated that the proposed AFTER-QSM network architecture had excellent generalizability. It can successfully reconstruct susceptibility maps from highly oblique and anisotropic scans, leading to the best image quality assessments in simulation tests and suppressed streaking artifacts and noise levels for in-vivo experiments compared with other methods. Furthermore, ablation studies showed that the RDN refinement network significantly reduced image blurring and susceptibility underestimation due to affine transformations. In addition, the AFTER-QSM network substantially shortened the reconstruction time from minutes using conventional methods to only a few seconds.

AAAI Conference 2023 Conference Paper

An Efficient Deep Reinforcement Learning Algorithm for Solving Imperfect Information Extensive-Form Games

  • Linjian Meng
  • Zhenxing Ge
  • Pinzhuo Tian
  • Bo An
  • Yang Gao

One of the most popular approaches for learning a Nash equilibrium (NE) in large-scale imperfect information extensive-form games (IIEFGs) is to use neural variants of counterfactual regret minimization (CFR). CFR is a special case of Follow-The-Regularized-Leader (FTRL). At each iteration, the neural variants of CFR update the agent's strategy via the estimated counterfactual regrets. Then, they use neural networks to approximate the new strategy, which incurs an approximation error. These approximation errors accumulate, since the counterfactual regrets at iteration t are estimated using the agent's past approximated strategies. Such accumulated approximation error causes poor performance. To address this accumulated approximation error, we propose a novel FTRL algorithm called FTRL-ORW, which does not utilize the agent's past strategies to pick the next iteration strategy. More importantly, FTRL-ORW can update its strategy via the trajectories sampled from the game, which makes it suitable for solving large-scale IIEFGs, since sampling multiple actions for each information set is too expensive in such games. However, it remains unclear which algorithm to use to compute the next iteration strategy for FTRL-ORW when only such sampled trajectories are revealed at iteration t. To address this problem and scale FTRL-ORW to large-scale games, we provide a model-free method called Deep FTRL-ORW, which computes the next iteration strategy using model-free Maximum Entropy Deep Reinforcement Learning. Experimental results on two-player zero-sum IIEFGs show that Deep FTRL-ORW significantly outperforms existing model-free neural methods and OS-MCCFR.
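As background for the abstract above: in standard tabular CFR, the strategy at an information set is usually derived from cumulative counterfactual regrets by regret matching. The sketch below shows only that textbook rule. It is not FTRL-ORW or Deep FTRL-ORW, and the function name `regret_matching` is a hypothetical helper introduced here for illustration.

```python
def regret_matching(cumulative_regrets):
    """Map cumulative regrets at one information set to a strategy.

    Textbook regret matching (an illustration, not the paper's method):
    play each action in proportion to its positive cumulative regret,
    and fall back to the uniform strategy when no regret is positive.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    n = len(cumulative_regrets)
    return [1.0 / n] * n


if __name__ == "__main__":
    # Two actions with equal positive regret share the probability mass.
    print(regret_matching([2.0, -1.0, 2.0]))
    # No positive regret: uniform strategy.
    print(regret_matching([-1.0, -2.0]))
```

Because each iteration's regrets depend on earlier strategies, any error in approximating those strategies feeds into later updates, which is the accumulation the abstract describes.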

JBHI Journal 2023 Journal Article

An End-to-End Energy-Efficient Approach for Intake Detection With Low Inference Time Using Wrist-Worn Sensor

  • Boyang Wei
  • Shibo Zhang
  • Xingjian Diao
  • Qiuyang Xu
  • Yang Gao
  • Nabil Alshurafa

Automated detection of intake gestures with wearable sensors has been a critical area of research for advancing our understanding and ability to intervene in people's eating behavior. Numerous algorithms have been developed and evaluated in terms of accuracy. However, ensuring the system is not only accurate in making predictions but also efficient in doing so is critical for real-world deployment. Despite the growing research on accurate detection of intake gestures using wearables, many of these algorithms are often energy inefficient, impeding on-device deployment for continuous and real-time monitoring of diet. This article presents a template-based optimized multicenter classifier that enables accurate intake gesture detection while maintaining low inference time and energy consumption using a wrist-worn accelerometer and gyroscope. We designed an Intake Gesture Counter smartphone application (CountING) and validated the practicality of our algorithm against seven state-of-the-art approaches on three public datasets (In-lab FIC, Clemson, and OREBA). Compared with other methods, we achieved optimal accuracy (81.60% F1 score) and very low inference time (15.97 msec per 2.20-sec data sample) on the Clemson dataset, and among the top performing algorithms, we achieve comparable accuracy (83.0% F1 score compared with 85.6% in the top performing algorithm) but superior inference time (13.8x faster, 33.14 msec per 2.20-sec data sample) on the In-lab FIC dataset and comparable accuracy (83.40% F1 score compared with 88.10% in the top-performing algorithm) but superior inference time (33.9x faster, 16.71 msec inference time per 2.20-sec data sample) on the OREBA dataset. On average, our approach achieved a 25-hour battery lifetime (44% to 52% improvement over state-of-the-art approaches) when tested on a commercial smartwatch for continuous real-time detection. Our approach demonstrates an effective and efficient method, enabling real-time intake gesture detection using wrist-worn devices in longitudinal studies.

JAAMAS Journal 2023 Journal Article

ASN: action semantics network for multiagent reinforcement learning

  • Tianpei Yang
  • Weixun Wang
  • Yang Gao

In multiagent systems (MASs), each agent makes individual decisions but all contribute globally to the system’s evolution. Learning in MASs is difficult since each agent’s selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainties increase exponentially with the number of agents. Previous works borrow various multiagent coordination mechanisms for use in deep learning architectures to facilitate multiagent coordination. However, none of them explicitly consider that different actions can have different influence on other agents, which we call the action semantics. In this paper, we propose a novel network architecture, named Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes different actions’ influence on other agents using neural networks based on the action semantics between them. ASN can be easily combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II micromanagement and Neural MMO show that ASN significantly improves the performance of state-of-the-art DRL approaches, compared with several other network architectures. We also successfully deploy ASN to a popular online MMORPG game called Justice Online, which indicates a promising future for ASN to be applied in even more complex scenarios.

YNIMG Journal 2023 Journal Article

Design and application of a multimodality-compatible 1Tx/6Rx RF coil for monkey brain MRI at 7T

  • Shuxian Qu
  • Sunhang Shi
  • Zhiyan Quan
  • Yang Gao
  • Minmin Wang
  • Yueming Wang
  • Gang Pan
  • Hsin-Yi Lai

OBJECTIVE: Blood-oxygen-level-dependent functional MRI allows the investigation of neural activities and connectivity. While the non-human primate plays an essential role in neuroscience research, multimodal methods combining functional MRI with other neuroimaging and neuromodulation enable us to understand the brain network at multiple scales. APPROACH: In this study, a tight-fitting helmet-shape receive array with a single transmit loop for anesthetized macaque brain MRI at 7T was fabricated with four openings constructed in the coil housing to accommodate multimodal devices, and the coil performance was quantitatively evaluated and compared to a commercial knee coil. In addition, experiments over three macaques with infrared neural stimulation (INS), focused ultrasound stimulation (FUS), and transcranial direct current stimulation (tDCS) were conducted. MAIN RESULTS: The RF coil showed higher transmit efficiency, comparable homogeneity, improved SNR and enlarged signal coverage over the macaque brain. Infrared neural stimulation was applied to the amygdala in the deep brain region, and activations in stimulation sites and connected sites were detected, with the connectivity consistent with anatomical information. Focused ultrasound stimulation was applied to the left visual cortex, and activations were acquired along the ultrasound traveling path, with all time course curves consistent with pre-designed paradigms. The presence of transcranial direct current stimulation electrodes brought no interference to the RF system, as evidenced through high-resolution MPRAGE structure images. SIGNIFICANCE: This pilot study reveals the feasibility of brain investigation at multiple spatiotemporal scales, which may advance our understanding of dynamic brain networks.

IS Journal 2023 Journal Article

Effective Interpretable Policy Distillation via Critical Experience Point Identification

  • Xiao Liu
  • Shuyang Liu
  • Bo An
  • Yang Gao
  • Shangdong Yang
  • Wenbin Li

Interpretable policy distillation aims to distill a deep reinforcement learning (DRL) policy into a self-explainable model. However, the distilled policy usually does not generalize well to complex tasks. To investigate this phenomenon, we examine the experience pools of DRL tasks and find that these interactive experience distributions are heavy tailed. However, this critical issue is largely ignored by existing approaches, and, thus, they do not fully utilize the less frequent but very critical experience points. To address this issue, we propose characterizing decision boundaries via the minimum experience retention to deal with the heavy-tailed experience distributions. Our method identifies critical experience points that are close to the model’s decision boundaries, and such experience points are more critical because they portray the prerequisite of a model to take an action. As a result, our method distills the DRL policy to a self-explainable structure without a neural structure and ambiguous intermediate parameters. Through experiments on six games, we show that our method outperforms the state-of-the-art baselines in cumulative rewards, stability, and faithfulness.

NeurIPS Conference 2023 Conference Paper

Efficient Subgame Refinement for Extensive-form Games

  • Zhenxing Ge
  • Zheng Xu
  • Tianyu Ding
  • Wenbin Li
  • Yang Gao

Subgame solving is an essential technique in addressing large imperfect information games, with various approaches developed to enhance the performance of refined strategies in the abstraction of the target subgame. However, directly applying existing subgame solving techniques may be difficult, due to the intricate nature and substantial size of many real-world games. To overcome this issue, recent subgame solving methods allow for subgame solving on limited knowledge order subgames, increasing their applicability in large games; yet this may still face obstacles due to extensive information set sizes. To address this challenge, we propose a generative subgame solving (GS2) framework, which utilizes a generation function to identify a subset of the earliest-reached nodes, reducing the size of the subgame. Our method is supported by a theoretical analysis and employs a diversity-based generation function to enhance safety. Experiments conducted on medium-sized games as well as the challenging large game of GuanDan demonstrate a significant improvement over the blueprint.

AAAI Conference 2023 Conference Paper

Enhanced Tensor Low-Rank and Sparse Representation Recovery for Incomplete Multi-View Clustering

  • Chao Zhang
  • Huaxiong Li
  • Wei Lv
  • Zizheng Huang
  • Yang Gao
  • Chunlin Chen

Incomplete multi-view clustering (IMVC) has attracted remarkable attention due to the emergence of multi-view data with missing views in real applications. Recent methods attempt to recover the missing information to address the IMVC problem. However, they generally cannot fully explore the underlying properties and correlations of data similarities across views. This paper proposes a novel Enhanced Tensor Low-rank and Sparse Representation Recovery (ETLSRR) method, which reformulates the IMVC problem as a joint incomplete similarity graphs learning and complete tensor representation recovery problem. Specifically, ETLSRR learns the intra-view similarity graphs and constructs a 3-way tensor by stacking the graphs to explore the inter-view correlations. To alleviate the negative influence of missing views and data noise, ETLSRR decomposes the tensor into two parts: a sparse tensor and an intrinsic tensor, which models the noise and underlying true data similarities, respectively. Both global low-rank and local structured sparse characteristics of the intrinsic tensor are considered, which enhances the discrimination of similarity matrix. Moreover, instead of using the convex tensor nuclear norm, ETLSRR introduces a generalized non-convex tensor low-rank regularization to alleviate the biased approximation. Experiments on several datasets demonstrate the effectiveness of our method compared with the state-of-the-art methods.

AAAI Conference 2023 Conference Paper

Entity-Agnostic Representation Learning for Parameter-Efficient Knowledge Graph Embedding

  • Mingyang Chen
  • Wen Zhang
  • Zhen Yao
  • Yushan Zhu
  • Yang Gao
  • Jeff Z. Pan
  • Huajun Chen

We propose an entity-agnostic representation learning method for handling the problem of inefficient parameter storage costs brought by embedding knowledge graphs. Conventional knowledge graph embedding methods map elements in a knowledge graph, including entities and relations, into continuous vector spaces by assigning them one or multiple specific embeddings (i.e., vector representations). Thus the number of embedding parameters increases linearly with the growth of knowledge graphs. In our proposed model, Entity-Agnostic Representation Learning (EARL), we only learn the embeddings for a small set of entities and refer to them as reserved entities. To obtain the embeddings for the full set of entities, we encode their distinguishable information from their connected relations, k-nearest reserved entities, and multi-hop neighbors. We learn universal and entity-agnostic encoders for transforming distinguishable information into entity embeddings. This approach allows our proposed EARL to have a static, efficient, and lower parameter count than conventional knowledge graph embedding methods. Experimental results show that EARL uses fewer parameters and performs better on link prediction tasks than baselines, reflecting its parameter efficiency.

AAAI Conference 2023 Conference Paper

Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient

  • Wubing Chen
  • Wenbin Li
  • Xiao Liu
  • Shangdong Yang
  • Yang Gao

Cooperative multi-agent policy gradient (MAPG) algorithms have recently attracted wide attention and are regarded as a general scheme for the multi-agent system. Credit assignment plays an important role in MAPG and can induce cooperation among multiple agents. However, most MAPG algorithms cannot achieve good credit assignment because of the game-theoretic pathology known as centralized-decentralized mismatch. To address this issue, this paper presents a novel method, Multi-Agent Polarization Policy Gradient (MAPPG). MAPPG takes a simple but efficient polarization function to transform the optimal consistency of joint and individual actions into easily realized constraints, thus enabling efficient credit assignment in MAPPG. Theoretically, we prove that individual policies of MAPPG can converge to the global optimum. Empirically, we evaluate MAPPG on the well-known matrix game and differential game, and verify that MAPPG can converge to the global optimum for both discrete and continuous action spaces. We also evaluate MAPPG on a set of StarCraft II micromanagement tasks and demonstrate that MAPPG outperforms the state-of-the-art MAPG algorithms.

AAAI Conference 2023 Conference Paper

Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination

  • Rui Zhao
  • Jinming Song
  • Yufeng Yuan
  • Haifeng Hu
  • Yang Gao
  • Yi Wu
  • Zhongqian Sun
  • Wei Yang

We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using human data. Although such agents can be obtained through self-play training, they can suffer significantly from the distributional shift when paired with unencountered partners, such as humans. In this paper, we propose Maximum Entropy Population-based training (MEP) to mitigate such distributional shift. In MEP, agents in the population are trained with our derived Population Entropy bonus to promote the pairwise diversity between agents and the individual diversity of agents themselves. After obtaining this diversified population, a common best agent is trained by pairing with agents in this population via prioritized sampling, where the prioritization is dynamically adjusted based on the training progress. We demonstrate the effectiveness of our method MEP, with comparison to Self-Play PPO (SP), Population-Based Training (PBT), Trajectory Diversity (TrajeDi), and Fictitious Co-Play (FCP) in both matrix game and Overcooked game environments, with partners being human proxy models and real humans. A supplementary video showing experimental results is available at https://youtu.be/Xh-FKD0AAKE.

AAMAS Conference 2023 Conference Paper

TiLD: Third-person Imitation Learning by Estimating Domain Cognitive Differences of Visual Demonstrations

  • Zixuan Chen
  • Wenbin Li
  • Yang Gao
  • Yiyu Chen

To enable agents to effectively imitate third-person visual demonstrations in complex imitation learning (IL) tasks, we propose a new IL method named third-person imitation learning by estimating domain cognitive differences (TiLD). TiLD eliminates the domain cognitive difference between samples from different perspectives, allowing the agent to learn directly from third-person demonstrations. Experimental results indicate that TiLD achieves significant performance improvements over existing state-of-the-art IL methods on imitation learning tasks with third-person expert demonstrations.

TMLR Journal 2023 Journal Article

Trip-ROMA: Self-Supervised Learning with Triplets and Random Mappings

  • Wenbin Li
  • Xuesong Yang
  • Meihao Kong
  • Lei Wang
  • Jing Huo
  • Yang Gao
  • Jiebo Luo

Contrastive self-supervised learning (SSL) methods, such as MoCo and SimCLR, have achieved great success in unsupervised visual representation learning. They rely on a large number of negative pairs and thus require either large memory banks or large batches. Some recent non-contrastive SSL methods, such as BYOL and SimSiam, attempt to discard negative pairs and have also shown remarkable performance. To avoid collapsed solutions caused by not using negative pairs, these methods require non-trivial asymmetry designs. However, in small data regimes, we can not obtain a sufficient number of negative pairs or effectively avoid the over-fitting problem when negatives are not used at all. To address this situation, we argue that negative pairs are still important but one is generally sufficient for each positive pair. We show that a simple Triplet-based loss (Trip) can achieve surprisingly good performance without requiring large batches or asymmetry designs. Moreover, to alleviate the over-fitting problem in small data regimes and further enhance the effect of Trip, we propose a simple plug-and-play RandOm MApping (ROMA) strategy by randomly mapping samples into other spaces and requiring these randomly projected samples to satisfy the same relationship indicated by the triplets. Integrating the triplet-based loss with random mapping, we obtain the proposed method Trip-ROMA. Extensive experiments, including unsupervised representation learning and unsupervised few-shot learning, have been conducted on ImageNet-1K and seven small datasets. They successfully demonstrate the effectiveness of Trip-ROMA and consistently show that ROMA can further effectively boost other SSL methods. Code is available at https://github.com/WenbinLee/Trip-ROMA.
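One way to read the idea in the abstract above is that a triplet (anchor, positive, negative) should satisfy the same ordering both in the original embedding space and after a random linear mapping. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the cosine-similarity margin loss, the Gaussian random matrix, the default `margin` and `out_dim` values, and all function names are hypothetical choices made here.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = sum(a * a for a in v) ** 0.5
    return [a / norm for a in v] if norm > 0 else list(v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Margin loss on cosine similarity (assumed form, for illustration)."""
    a, p, n = normalize(anchor), normalize(positive), normalize(negative)
    return max(0.0, margin - dot(a, p) + dot(a, n))


def random_map(v, matrix):
    """Apply a linear map given as a list of rows."""
    return [dot(row, v) for row in matrix]


def triplet_with_random_mapping(anchor, positive, negative,
                                out_dim=8, margin=0.5, rng=None):
    """Triplet loss in the original space plus one randomly mapped space."""
    rng = rng or random.Random(0)
    dim = len(anchor)
    # Hypothetical choice: a fresh Gaussian matrix shared by the whole triplet.
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(out_dim)]
    mapped = [random_map(v, matrix) for v in (anchor, positive, negative)]
    base = triplet_loss(anchor, positive, negative, margin)
    return base + triplet_loss(mapped[0], mapped[1], mapped[2], margin)


if __name__ == "__main__":
    # A well-separated triplet incurs no loss in either space.
    print(triplet_with_random_mapping([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]))
```

Only one negative per positive pair is used here, which matches the abstract's argument that a single negative is generally sufficient; how the mapping is sampled and combined in the actual method is not specified in this listing.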

NeurIPS Conference 2022 Conference Paper

An Empirical Study on Disentanglement of Negative-free Contrastive Learning

  • Jinkun Cao
  • Ruiqian Nai
  • Qing Yang
  • Jialei Huang
  • Yang Gao

Negative-free contrastive learning methods have attracted a lot of attention for their simplicity and impressive performance in large-scale pretraining. However, their disentanglement property remains unexplored. In this paper, we examine negative-free contrastive learning methods to study the disentanglement property empirically. We find that existing disentanglement metrics fail to make meaningful measurements for high-dimensional representation models, so we propose a new disentanglement metric based on Mutual Information between latent representations and data factors. With this proposed metric, we benchmark the disentanglement property of negative-free contrastive learning on both popular synthetic datasets and a real-world dataset, CelebA. Our study shows that the investigated methods can learn a well-disentangled subset of representation. As far as we know, we are the first to extend the study of disentangled representation learning to high-dimensional representation space and introduce negative-free contrastive learning methods into this area. The source code of this paper is available at https://github.com/noahcao/disentanglement_lib_med.

YNIMG Journal 2022 Journal Article

Instant tissue field and magnetic susceptibility mapping from MRI raw phase using Laplacian enhanced deep neural networks

  • Yang Gao
  • Zhuang Xiong
  • Amir Fazlollahi
  • Peter J Nestor
  • Viktor Vegh
  • Fatima Nasrallah
  • Craig Winter
  • G. Bruce Pike

Quantitative susceptibility mapping (QSM) is an MRI post-processing technique that produces spatially resolved magnetic susceptibility maps from phase data. However, the traditional QSM reconstruction pipeline involves multiple non-trivial steps, including phase unwrapping, background field removal, and dipole inversion. These intermediate steps not only increase the reconstruction time but also accumulate errors. This study aims to overcome existing limitations by developing a Laplacian-of-Trigonometric-functions (LoT) enhanced deep neural network for near-instant quantitative field and susceptibility mapping (i.e., iQFM and iQSM) from raw MRI phase data. The proposed iQFM and iQSM methods were compared with established reconstruction pipelines on simulated and in vivo datasets. In addition, experiments on patients with intracranial hemorrhage and multiple sclerosis were also performed to test the generalization of the proposed neural networks. The proposed iQFM and iQSM methods in healthy subjects yielded comparable results to those involving the intermediate steps while dramatically improving reconstruction accuracies on intracranial hemorrhages with large susceptibilities. High susceptibility contrast between multiple sclerosis lesions and healthy tissue was also achieved using the proposed methods. Comparative studies indicated that the most significant contributor to iQFM and iQSM over conventional multi-step methods was the elimination of traditional Laplacian unwrapping. The reconstruction time on the order of minutes for traditional approaches was shortened to around 0.1 s using the trained iQFM and iQSM neural networks.

AAAI Conference 2022 Conference Paper

LaSSL: Label-Guided Self-Training for Semi-supervised Learning

  • Zhen Zhao
  • Luping Zhou
  • Lei Wang
  • Yinghuan Shi
  • Yang Gao

The key to semi-supervised learning (SSL) is to explore adequate information to leverage the unlabeled data. Current dominant approaches aim to generate pseudo-labels on weakly augmented instances and train models on their corresponding strongly augmented variants with high-confidence results. However, such methods are limited by their exclusion of samples with low-confidence pseudo-labels and their under-utilization of the label information. In this paper, we emphasize the importance of the label information and propose a Label-guided Self-training approach to Semi-supervised Learning (LaSSL), which improves pseudo-label generation through two mutually boosted strategies. First, with the ground-truth labels and iteratively-polished pseudo-labels, we explore instance relations among all samples and then minimize a class-aware contrastive loss to learn discriminative feature representations that gather same-class samples and scatter different-class samples. Second, on top of improved feature representations, we propagate the label information to the unlabeled samples across the potential data manifold at the feature-embedding level, which can further improve the labelling of samples with reference to their neighbours. These two strategies are seamlessly integrated and mutually promoted across the whole training process. We evaluate LaSSL on several classification benchmarks under partially labeled settings and demonstrate its superiority over the state-of-the-art approaches.

NeurIPS Conference 2022 Conference Paper

Planning for Sample Efficient Imitation Learning

  • Zhao-Heng Yin
  • Weirui Ye
  • Qifeng Chen
  • Yang Gao

Imitation learning is a class of promising policy learning algorithms that is free from many practical issues with reinforcement learning, such as the reward design issue and the exploration hardness. However, current imitation algorithms struggle to achieve both high performance and high in-environment sample efficiency simultaneously. Behavioral Cloning (BC) does not need in-environment interactions, but it suffers from the covariate shift problem, which harms its performance. Adversarial Imitation Learning (AIL) turns imitation learning into a distribution matching problem. It can achieve better performance on some tasks but it requires a large number of in-environment interactions. Inspired by the recent success of EfficientZero in RL, we propose EfficientImitate (EI), a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously. Our algorithmic contribution in this paper is two-fold. First, we extend AIL into MCTS-based RL. Second, we show that the two seemingly incompatible classes of imitation algorithms (BC and AIL) can be naturally unified under our framework, enjoying the benefits of both. We benchmark our method not only on the state-based DeepMind Control Suite but also on the image version, which many previous works find highly challenging. Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency. EI shows over 4x gain in performance in the limited sample setting on state-based and image-based tasks and can solve challenging problems like Humanoid, where previous methods fail with a small amount of interactions. Our code is available at https://github.com/zhaohengyin/EfficientImitate.

NeurIPS Conference 2022 Conference Paper

Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning

  • Zhecheng Yuan
  • Zhengrong Xue
  • Bo Yuan
  • Xueqian Wang
  • Yi Wu
  • Yang Gao
  • Huazhe Xu

Learning generalizable policies that can adapt to unseen environments remains challenging in visual Reinforcement Learning (RL). Existing approaches try to acquire a robust representation via diversifying the appearances of in-domain observations for better generalization. Limited by the specific observations of the environment, these methods ignore the possibility of exploring diverse real-world image datasets. In this paper, we investigate how a visual RL agent would benefit from the off-the-shelf visual representations. Surprisingly, we find that the early layers in an ImageNet pre-trained ResNet model could provide rather generalizable representations for visual RL. Hence, we propose Pre-trained Image Encoder for Generalizable visual reinforcement learning (PIE-G), a simple yet effective framework that can generalize to the unseen visual scenarios in a zero-shot manner. Extensive experiments are conducted on DMControl Generalization Benchmark, DMControl Manipulation Tasks, Drawer World, and CARLA to verify the effectiveness of PIE-G. Empirical evidence suggests PIE-G improves sample efficiency and significantly outperforms previous state-of-the-art methods in terms of generalization performance. In particular, PIE-G boasts a 55% generalization performance gain on average in the challenging video background setting. Project Page: https://sites.google.com/view/pie-g/home.

NeurIPS Conference 2022 Conference Paper

Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions

  • Weirui Ye
  • Pieter Abbeel
  • Yang Gao

One of the most important AI research questions is to trade off computation versus performance since "perfect rationality" exists in theory but is impossible to achieve in practice. Recently, Monte-Carlo tree search (MCTS) has attracted considerable attention due to the significant performance improvement in various challenging domains. However, the expensive time cost during search severely restricts its scope for applications. This paper proposes the Virtual MCTS (V-MCTS), a variant of MCTS that spends more search time on harder states and less search time on simpler states adaptively. We give theoretical bounds of the proposed method and evaluate the performance and computations on 9 × 9 Go board games and Atari games. Experiments show that our method can achieve comparable performances to the original search algorithm while requiring less than 50% search time on average. We believe that this approach is a viable alternative for tasks under limited time and resources. The code is available at https://github.com/YeWR/V-MCTS.git.

IJCAI Conference 2022 Conference Paper

Stage-wise Stylistic Headline Generation: Style Generation and Summarized Content Insertion

  • Jiaao Zhan
  • Yang Gao
  • Yu Bai
  • Qianhui Liu

A quality headline with a high click-rate should not only summarize the content of an article, but also reflect a style that attracts users. Such demand has drawn rising attention to the task of stylistic headline generation (SHG). An intuitive method is to first generate plain headlines leveraged by document-headline parallel data then transfer them to a target style. However, this inevitably suffers from error propagation. Therefore, to unify the two sub-tasks and explicitly decompose style-relevant attributes and summarize content, we propose an end-to-end stage-wise SHG model containing the style generation component and the content insertion component, where the former generates stylistic-relevant intermediate outputs and the latter receives these outputs then inserts the summarized content. The intermediate outputs are observable, making the style generation easy to control. Our system is comprehensively evaluated by both quantitative and qualitative metrics, and it achieves state-of-the-art results in SHG over three different stylistic datasets.

YNIMG Journal 2021 Journal Article

Accelerating quantitative susceptibility and R2* mapping using incoherent undersampling and deep neural network reconstruction

  • Yang Gao
  • Martijn Cloos
  • Feng Liu
  • Stuart Crozier
  • G. Bruce Pike
  • Hongfu Sun

Quantitative susceptibility mapping (QSM) and R2* mapping are MRI post-processing methods that quantify tissue magnetic susceptibility and transverse relaxation rate distributions. However, QSM and R2* acquisitions are relatively slow, even with parallel imaging. Incoherent undersampling and compressed sensing reconstruction techniques have been used to accelerate traditional magnitude-based MRI acquisitions; however, most do not recover the full phase signal, as required by QSM, due to its non-convex nature. In this study, a learning-based Deep Complex Residual Network (DCRNet) is proposed to recover both the magnitude and phase images from incoherently undersampled data, enabling high acceleration of QSM and R2* acquisition. Magnitude, phase, R2*, and QSM results from DCRNet were compared with two iterative methods and one deep learning method on retrospectively undersampled acquisitions from six healthy volunteers, one intracranial hemorrhage patient, and one multiple sclerosis patient, as well as one prospectively undersampled healthy subject using a 7T scanner. Peak signal to noise ratio (PSNR), structural similarity (SSIM), root-mean-squared error (RMSE), and region-of-interest susceptibility and R2* measurements are reported for numerical comparisons. The proposed DCRNet method substantially reduced artifacts and blurring compared to the other methods and resulted in the best PSNR, SSIM, and RMSE on the magnitude, R2*, local field, and susceptibility maps. Compared to the two iterative methods and one deep learning method, the DCRNet method demonstrated a 3.2% to 9.1% accuracy improvement in deep grey matter susceptibility when accelerated by a factor of four. The DCRNet also dramatically shortened the reconstruction time of single 2D brain images from 36-140 seconds using conventional approaches to only 15-70 milliseconds.
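For readers unfamiliar with two of the reported metrics, the standard definitions of RMSE and PSNR can be sketched as below. This is a generic illustration on flat lists of pixel values, not the evaluation code used in the study; the function names and the choice of `peak` are assumptions. Note that a better reconstruction gives a higher PSNR but a lower RMSE.

```python
import math


def rmse(reference, estimate):
    """Root-mean-squared error between two equal-length pixel lists."""
    n = len(reference)
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n)


def psnr(reference, estimate, peak):
    """Peak signal-to-noise ratio in dB, with `peak` the maximum possible
    pixel value (assumed to be supplied by the caller)."""
    err = rmse(reference, estimate)
    if err == 0.0:
        return math.inf  # identical images
    return 20.0 * math.log10(peak / err)
```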

NeurIPS Conference 2021 Conference Paper

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

  • Lulu Zheng
  • Jiarui Chen
  • Jianhao Wang
  • Jiamin He
  • Yujing Hu
  • Yingfeng Chen
  • Changjie Fan
  • Yang Gao

Efficient exploration in deep cooperative multi-agent reinforcement learning (MARL) remains challenging in complex coordination problems. In this paper, we introduce a novel Episodic Multi-agent reinforcement learning with Curiosity-driven exploration, called EMC. We leverage an insight of popular factorized MARL algorithms that the "induced" individual Q-values, i.e., the individual utility functions used for local execution, are the embeddings of local action-observation histories, and can capture the interaction between agents due to reward backpropagation during centralized training. Therefore, we use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training. As the dynamics of an agent's individual Q-value function captures the novelty of states and the influence from other agents, our intrinsic reward can induce coordinated exploration to new or promising states. We illustrate the advantages of our method by didactic examples, and demonstrate that it significantly outperforms state-of-the-art MARL baselines on challenging tasks in the StarCraft II micromanagement benchmark.

AAAI Conference 2021 Conference Paper

ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation

  • Sicheng Zhao
  • Yezhen Wang
  • Bo Li
  • Bichen Wu
  • Yang Gao
  • Pengfei Xu
  • Trevor Darrell
  • Kurt Keutzer

Due to its robust and precise distance measurements, LiDAR plays an important role in scene understanding for autonomous driving. Training deep neural networks (DNNs) on LiDAR data requires large-scale point-wise annotations, which are time-consuming and expensive to obtain. Instead, simulation-to-real domain adaptation (SRDA) trains a DNN using unlimited synthetic data with automatically generated labels and transfers the learned model to real scenarios. Existing SRDA methods for LiDAR point cloud segmentation mainly employ a multi-stage pipeline and focus on feature-level alignment. They require prior knowledge of real-world statistics and ignore the pixel-level dropout noise gap and the spatial feature gap between different domains. In this paper, we propose a novel end-to-end framework, named ePointDA, to address the above issues. Specifically, ePointDA consists of three modules: self-supervised dropout noise rendering, statistics-invariant and spatially-adaptive feature alignment, and transferable segmentation learning. The joint optimization enables ePointDA to bridge the domain shift at the pixel-level by explicitly rendering dropout noise for synthetic LiDAR and at the feature-level by spatially aligning the features between different domains, without requiring the real-world statistics. Extensive experiments adapting from synthetic GTA-LiDAR to real KITTI and SemanticKITTI demonstrate the superiority of ePointDA for LiDAR point cloud segmentation.

AAAI Conference 2021 Conference Paper

Exploring Explainable Selection to Control Abstractive Summarization

  • Haonan Wang
  • Yang Gao
  • Yu Bai
  • Mirella Lapata
  • Heyan Huang

Like humans, document summarization models can interpret a document's contents in a number of ways. Unfortunately, the neural models of today are largely black boxes that provide little explanation of how or why they generated a summary in the way they did. Therefore, to begin prying open the black box and to inject a level of control into the substance of the final summary, we developed a novel select-and-generate framework that focuses on explainability. By revealing the latent centrality and interactions between sentences, along with scores for sentence novelty and relevance, users are given a window into the choices a model is making and an opportunity to guide those choices in a more desirable direction. A novel pair-wise matrix captures the sentence interactions, centrality and attribute scores, and a mask with tunable attribute thresholds allows the user to control which sentences are likely to be included in the extraction. A sentence-deployed attention mechanism in the abstractor ensures the final summary emphasizes the desired content. Additionally, the encoder is adaptable, supporting both Transformer- and BERT-based configurations. In a series of experiments assessed with ROUGE metrics and two human evaluations, ESCA outperformed eight state-of-the-art models on the CNN/DailyMail and NYT50 benchmark datasets.

AIIM Journal 2021 Journal Article

Interactive medical image segmentation via a point-based interaction

  • Jian Zhang
  • Yinghuan Shi
  • Jinquan Sun
  • Lei Wang
  • Luping Zhou
  • Yang Gao
  • Dinggang Shen

Due to low tissue contrast, irregular shape, and large location variance, segmenting objects from different medical imaging modalities (e.g., CT, MR) is considered an important yet challenging task. In this paper, a novel method is presented for interactive medical image segmentation with the following merits. (1) Its design is fundamentally different from previous pure patch-based and image-based segmentation methods. It is observed that during delineation, the physician repeatedly checks the intensity from the area inside the object to the area outside it to determine the boundary, which indicates that comparison in an inside-out manner is extremely important. Thus, the method innovatively models the segmentation task as learning the representation of bi-directional sequential patches, starting from (or ending in) the given central point of the object. This can be realized by the proposed ConvRNN network embedded with a gated memory propagation unit. (2) Unlike previous interactive methods (requiring a bounding box or seed points), the proposed method only asks the physician to click on the rough central point of the object before segmentation, which could simultaneously enhance the performance and reduce the segmentation time. (3) The method is utilized in a multi-level framework for better performance. It has been systematically evaluated in three different segmentation tasks, including CT kidney tumor, MR prostate, and PROMISE12 challenge, showing promising results compared with state-of-the-art methods.

JBHI Journal 2021 Journal Article

Learning-Based Computer-Aided Prescription Model for Parkinson's Disease: A Data-Driven Perspective

  • Yinghuan Shi
  • Wanqi Yang
  • Kim-Han Thung
  • Hao Wang
  • Yang Gao
  • Yang Pan
  • Li Zhang
  • Dinggang Shen

In this article, we study a novel problem: "automatic prescription recommendation for PD patients." To realize this goal, we first build a dataset by collecting 1) symptoms of PD patients, and 2) their prescription drugs provided by neurologists. Then, we build a novel computer-aided prescription model by learning the relation between observed symptoms and prescription drugs. Finally, for newly arriving patients, we can recommend (predict) suitable prescription drugs based on their observed symptoms using our prescription model. On the methodology side, our proposed model, namely Prescription viA Learning lAtent Symptoms (PALAS), can recommend prescriptions using the multi-modality representation of the data. In PALAS, a latent symptom space is learned to better model the relationship between symptoms and prescription drugs, as there is a large semantic gap between them. Moreover, we present an efficient alternating optimization method for PALAS. We evaluated our method using the data collected from 136 PD patients at Nanjing Brain Hospital, which can be regarded as a large dataset in the PD research community. The experimental results demonstrate the effectiveness and clinical potential of our method in this recommendation task, compared with other competing methods.

NeurIPS Conference 2021 Conference Paper

Mastering Atari Games with Limited Data

  • Weirui Ye
  • Shaohuai Liu
  • Thanard Kurutach
  • Pieter Abbeel
  • Yang Gao

Reinforcement learning has achieved great success in many applications. However, sample efficiency remains a key challenge, with prominent methods requiring millions (or even billions) of environment steps to train. Recently, there has been significant progress in sample efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal. We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero. Our method achieves 194.3% mean human performance and 109.0% median performance on the Atari 100k benchmark with only two hours of real-time game experience and outperforms the state-based SAC in some tasks on the DMControl 100k benchmark. This is the first time an algorithm achieves super-human performance on Atari games with such little data. EfficientZero's performance is also close to DQN's performance at 200 million frames while we consume 500 times less data. EfficientZero's low sample complexity and high performance can bring RL closer to real-world applicability. We implement our algorithm in an easy-to-understand manner and it is available at https://github.com/YeWR/EfficientZero. We hope it will accelerate the research of MCTS-based RL algorithms in the wider community.

NeurIPS Conference 2021 Conference Paper

Reinforcement Learning with Latent Flow

  • Wenling Shang
  • Xiaofei Wang
  • Aravind Srinivas
  • Aravind Rajeswaran
  • Yang Gao
  • Pieter Abbeel
  • Misha Laskin

Temporal information is essential to learning effective policies with Reinforcement Learning (RL). However, current state-of-the-art RL algorithms either assume that such information is given as part of the state space or, when learning from pixels, use the simple heuristic of frame-stacking to implicitly capture temporal information present in the image observations. This heuristic is in contrast to the current paradigm in video classification architectures, which utilize explicit encodings of temporal information through methods such as optical flow and two-stream architectures to achieve state-of-the-art performance. Inspired by leading video classification architectures, we introduce the Flow of Latents for Reinforcement Learning (Flare), a network architecture for RL that explicitly encodes temporal information through latent vector differences. We show that Flare recovers optimal performance in state-based RL without explicit access to the state velocity, solely with positional state information. Flare is the most sample efficient model-free pixel-based RL algorithm on the DeepMind Control suite when evaluated on the 500k and 1M step benchmarks across 5 challenging control tasks, and, when used with Rainbow DQN, outperforms the competitive baseline on Atari games at 100M time step benchmark across 8 challenging games.
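The notion of encoding temporal information through latent vector differences can be sketched as follows. This is a simplified illustration only: the abstract does not specify how the latents and their differences are fused, so the concatenation used here and the function name `latent_flow_features` are assumptions, and the per-frame latents are taken as given rather than produced by an encoder.

```python
def latent_flow_features(latents):
    """Given per-frame latent vectors z_1..z_k (oldest first), return the
    latents followed by the consecutive differences z_t - z_{t-1},
    flattened into one feature vector (assumed fusion by concatenation)."""
    diffs = [
        [cur_i - prev_i for prev_i, cur_i in zip(prev, cur)]
        for prev, cur in zip(latents, latents[1:])
    ]
    fused = []
    for vec in latents + diffs:
        fused.extend(vec)
    return fused
```

With positional latents only, the difference terms act as a velocity-like signal, which is the role frame-stacking plays implicitly.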

AAAI Conference 2021 Conference Paper

Single View Point Cloud Generation via Unified 3D Prototype

  • Yu Lin
  • Yigong Wang
  • Yi-Fan Li
  • Zhuoyi Wang
  • Yang Gao
  • Latifur Khan

As 3D point clouds become the representation of choice for multiple vision and graphics applications, such as autonomous driving and robotics, their generation by deep neural networks has attracted increasing attention in the research community. Despite the recent success of deep learning models in classification and segmentation, synthesizing point clouds remains challenging, especially from a single image. State-of-the-art (SOTA) approaches can generate a point cloud from a hidden vector; however, they treat 2D and 3D features equally and disregard the rich shape information within the 3D data. In this paper, we address this problem by integrating image features with 3D prototype features. Specifically, we propose to learn a set of 3D prototype features from a real point cloud dataset and dynamically adjust them throughout training. These prototypes are then integrated with incoming image features to guide the point cloud generation process. Experimental results show that our proposed method outperforms SOTA methods on single image based 3D reconstruction tasks.

YNIMG Journal 2020 Journal Article

A 16-channel AC/DC array coil for anesthetized monkey whole-brain imaging at 7T

  • Yang Gao
  • Azma Mareyam
  • Yi Sun
  • Thomas Witzel
  • Nicolas Arango
  • Irene Kuang
  • Jacob White
  • Anna Wang Roe

Functional magnetic resonance imaging (fMRI) in monkeys is important for bridging the gap between invasive animal brain studies and non-invasive human brain studies. To resolve the finer functional structure of the monkey brain, ultra-high-field (UHF) MR is essential, and high-performance, close-fitting RF receive coils are typically desired to fully leverage the intrinsic gains provided by UHF MRI. Moreover, static field (B0) inhomogeneity arising from the tissue susceptibility interface is more severe at UHF, presenting an obstacle to achieving high-resolution fMRI. B0 shim of the monkey head is challenging due to its smaller size and more complex sources of B0 offsets in multi-modal imaging tasks. In the present work, we have customized an array coil for lightly-anesthetized monkey fMRI in the 7T human scanner that combines RF and multi-coil (MC) B0 shim functionality (also referred to as AC/DC coils) to provide high imaging SNR and high-spatial-order, rapidly switchable B0-shim capability. Additional space was retained on the coil to render it compatible with monkey multi-modal imaging studies. Both MC global (whole-volume) and dynamic (slice-optimized) shim methods were tested and evaluated, and the benefits of MC shim for fMRI experiments were also studied. A minor reduction in RF coil performance was found after introducing additional B0 shim circuitry. However, the proposed RF coil provided higher image SNR and more uniform contrast compared to a commercially available coil for human knee imaging. Compared with static 2nd-order shim, the B0 inhomogeneity was reduced by 56.8%, and 95-percentile B0 offset was reduced to within 28.2 Hz through MC shim, versus 68.7 Hz with 2nd-order static shim. As a result, functional image quality could be improved, and brain activation can be better detected using the proposed AC/DC monkey coil.

TIST Journal 2020 Journal Article

A Discriminative Convolutional Neural Network with Context-aware Attention

  • Yuxiang Zhou
  • Lejian Liao
  • Yang Gao
  • Heyan Huang
  • Xiaochi Wei

Feature representation and feature extraction are two crucial procedures in text mining. Convolutional Neural Networks (CNN) have shown overwhelming success for text-mining tasks, since they are capable of efficiently extracting n-gram features from source data. However, vanilla CNN has its own weaknesses in feature representation and feature extraction. A certain number of filters in a CNN are inevitably duplicates and thus hinder discriminative representation of a given text. In addition, most existing CNN models extract features in a fixed way (i.e., max pooling) that either limits the CNN to a local optimum or ignores the relations among all features, and is thereby unable to learn contextual n-gram features adaptively. In this article, we propose a discriminative CNN with context-aware attention to solve the challenges of vanilla CNN. Specifically, our model mainly encourages discrimination across different filters via maximizing their earth mover distances and estimates the salience of feature candidates by considering the relation between context features. We carefully validate our findings against baselines on five benchmark datasets of classification and two datasets of summarization. The results of the experiments verify the competitive performance of our proposed model.

EAAI Journal 2020 Journal Article

A novel target threat assessment method based on three-way decisions under intuitionistic fuzzy multi-attribute decision making environment

  • Yang Gao
  • Dong-sheng Li
  • Hua Zhong

Target threat assessment aims to rank the threat of targets based on their attributes and state information, which provides decision support for subsequent military decisions, e.g., weapon-target optimal assignment. Most existing threat assessment methods can only obtain ranking results, so decision-makers usually need to subjectively choose priority targets to attack or interfere with based on the preset threat level and ordering results, which does not meet the requirements of complex battlefield situations and uncertain information processing. A method that can objectively produce threat classification results and automatically provide priority targets for combat is urgently needed. Therefore, we propose a novel target threat assessment method based on three-way decisions under an intuitionistic fuzzy multi-attribute decision making environment. Its core parts are that the conditional probability of each target is estimated by intuitionistic fuzzy TOPSIS and that the decision thresholds of each target are constructed from intuitionistic fuzzy evaluation values. The results of two numerical examples show that the proposed method can effectively deal with dynamic uncertain situation information, turn the traditional ranking results of two-way decisions into the objective classification results of three-way decisions, and flexibly reflect the acquisition of situation information by setting the risk avoidance coefficient.

JBHI Journal 2020 Journal Article

An Effective MR-Guided CT Network Training for Segmenting Prostate in CT Images

  • Wanqi Yang
  • Yinghuan Shi
  • Sang Hyun Park
  • Ming Yang
  • Yang Gao
  • Dinggang Shen

Segmentation of the prostate in medical imaging data (e.g., CT, MRI, TRUS) is often considered a critical yet challenging task for radiotherapy treatment. It is relatively easier to segment the prostate from MR images than from CT images, due to the better soft tissue contrast of MR images. For segmenting the prostate from CT images, most previous methods mainly used CT alone, and thus their performance is often limited by low tissue contrast in the CT images. In this article, we explore the possibility of using indirect guidance from MR images for improving prostate segmentation in the CT images. In particular, we propose a novel deep transfer learning approach, i.e., MR-guided CT network training (namely MICS-NET), which can employ MR images to help better learning of features in CT images for prostate segmentation. In MICS-NET, the guidance from MRI consists of two steps: (1) learning informative and transferable features from MRI and then transferring them to CT images in a cascade manner, and (2) adaptively transferring the prostate likelihood of the MRI model (i.e., a well-trained convnet purely using MR images) with a view consistency constraint. To illustrate the effectiveness of our approach, we evaluate MICS-NET on a real CT prostate image set, with the manual delineations available as the ground truth for evaluation. Our method generates promising segmentation results, which achieve (1) a Dice ratio six percentage points higher than the CT model purely using CT images and (2) comparable performance with the MRI model purely using MR images.

IJCAI Conference 2020 Conference Paper

Asymmetric Distribution Measure for Few-shot Learning

  • Wenbin Li
  • Lei Wang
  • Jing Huo
  • Yinghuan Shi
  • Yang Gao
  • Jiebo Luo

The core idea of metric-based few-shot image classification is to directly measure the relations between query images and support classes to learn transferable feature embeddings. Previous work mainly focuses on image-level feature representations, which actually cannot effectively estimate a class's distribution due to the scarcity of samples. Some recent work shows that local descriptor based representations can achieve richer representations than image-level based representations. However, such works are still based on a less effective instance-level metric, especially a symmetric metric, to measure the relation between a query image and a support class. Given the natural asymmetric relation between a query image and a support class, we argue that an asymmetric measure is more suitable for metric-based few-shot learning. To that end, we propose a novel Asymmetric Distribution Measure (ADM) network for few-shot learning by calculating a joint local and global asymmetric measure between two multivariate local distributions of a query and a class. Moreover, a task-aware Contrastive Measure Strategy (CMS) is proposed to further enhance the measure function. On popular miniImageNet and tieredImageNet, ADM can achieve the state-of-the-art results, validating our innovative design of asymmetric distribution measures for few-shot learning. The source code can be downloaded from https://github.com/WenbinLee/ADM.git.

IJCAI Conference 2020 Conference Paper

Biased Feature Learning for Occlusion Invariant Face Recognition

  • Changbin Shao
  • Jing Huo
  • Lei Qi
  • Zhen-Hua Feng
  • Wenbin Li
  • Chuanqi Dong
  • Yang Gao

To address the challenges posed by unknown occlusions, we propose a Biased Feature Learning (BFL) framework for occlusion-invariant face recognition. We first construct an extended dataset using a multi-scale data augmentation method. For model training, we modify the label loss to adjust the impact of normal and occluded samples. Further, we propose a biased guidance strategy to manipulate the optimization of a network so that the feature embedding space is dominated by non-occluded faces. BFL not only enhances the robustness of a network to unknown occlusions but also maintains or even improves its performance for normal faces. Experimental results demonstrate its superiority as well as the generalization capability with different network architectures and loss functions.

IJCAI Conference 2020 Conference Paper

Consistent MetaReg: Alleviating Intra-task Discrepancy for Better Meta-knowledge

  • Pinzhuo Tian
  • Lei Qi
  • Shaokang Dong
  • Yinghuan Shi
  • Yang Gao

In the few-shot learning scenario, the data-distribution discrepancy between training data and test data in a task usually exists due to the limited data. However, most existing meta-learning approaches seldom consider this intra-task discrepancy in the meta-training phase which might deteriorate the performance. To overcome this limitation, we develop a new consistent meta-regularization method to reduce the intra-task data-distribution discrepancy. Moreover, the proposed meta-regularization method could be readily inserted into existing optimization-based meta-learning models to learn better meta-knowledge. Particularly, we provide the theoretical analysis to prove that using the proposed meta-regularization, the conventional gradient-based meta-learning method can reach the lower regret bound. The extensive experiments also demonstrate the effectiveness of our method, which indeed improves the performances of the state-of-the-art gradient-based meta-learning models in the few-shot classification task.

IS Journal 2020 Journal Article

Contextual Bandits With Hidden Features to Online Recommendation via Sparse Interactions

  • Shangdong Yang
  • Hao Wang
  • Chenyu Zhang
  • Yang Gao

Online recommendation is an important feature in many applications. In practice, the interaction between the users and the recommender system might be sparse, i.e., the users are not always interacting with the recommender system. For example, some users prefer to skim over the recommendations instead of clicking into the details. Therefore, a response of zero may not necessarily be a negative response, but a nonresponse. It becomes harder to distinguish these two situations when only one item is recommended to the user each time and little further information is available. Most existing recommendation strategies ignore the difference between nonresponses and negative responses. In this article, we propose a novel approach to make online recommendations via sparse interactions. We design a contextual bandit algorithm, named hSAOR, for online recommendation. Our method makes probabilistic estimations on whether the user is interacting or not, by reasonably assuming that similar items are similarly attractive. It uses positive and negative responses to build the user preference model, ignoring all nonresponses. Theoretical analyses and experimental results demonstrate its effectiveness.

AAAI Conference 2020 Conference Paper

Differentiable Meta-Learning Model for Few-Shot Semantic Segmentation

  • Pinzhuo Tian
  • Zhangkai Wu
  • Lei Qi
  • Lei Wang
  • Yinghuan Shi
  • Yang Gao

To address the annotation scarcity issue in some cases of semantic segmentation, there have been a few attempts to develop the segmentation model in the few-shot learning paradigm. However, most existing methods only focus on the traditional 1-way segmentation setting (i.e., one image only contains a single object). This is far from practical semantic segmentation tasks, where the K-way setting (K > 1) is usually required to perform accurate multi-object segmentation. To deal with this issue, we formulate the few-shot semantic segmentation task as a learning-based pixel classification problem, and propose a novel framework called MetaSegNet based on meta-learning. In MetaSegNet, an embedding module architecture consisting of global and local feature branches is developed to extract the appropriate meta-knowledge for few-shot segmentation. Moreover, we incorporate a linear model into MetaSegNet as a base learner to directly predict the label of each pixel for multi-object segmentation. Furthermore, our MetaSegNet can be trained by the episodic training mechanism in an end-to-end manner from scratch. Experiments on two popular semantic segmentation datasets, i.e., PASCAL VOC and COCO, reveal the effectiveness of the proposed MetaSegNet in the K-way few-shot semantic segmentation task.

NeurIPS Conference 2020 Conference Paper

Fighting Copycat Agents in Behavioral Cloning from Observation Histories

  • Chuan Wen
  • Jierui Lin
  • Trevor Darrell
  • Dinesh Jayaraman
  • Yang Gao

Imitation learning trains policies to map from input observations to the actions that an expert would choose. In this setting, distribution shift frequently exacerbates the effect of misattributing expert actions to nuisance correlates among the observed variables. We observe that a common instance of this causal confusion occurs in partially observed settings when expert actions are strongly correlated over time: the imitator learns to cheat by predicting the expert's previous action, rather than the next action. To combat this "copycat problem", we propose an adversarial approach to learn a feature representation that removes excess information about the previous expert action nuisance correlate, while retaining the information necessary to predict the next action. In our experiments, our approach improves performance significantly across a variety of partially observed imitation learning tasks.

AAAI Conference 2020 Conference Paper

From Few to More: Large-Scale Dynamic Multiagent Curriculum Learning

  • Weixun Wang
  • Tianpei Yang
  • Yong Liu
  • Jianye Hao
  • Xiaotian Hao
  • Yujing Hu
  • Yingfeng Chen
  • Changjie Fan

Considerable effort has been devoted to investigating how agents can learn effectively and achieve coordination in multiagent systems. However, this remains challenging in large-scale multiagent settings due to the complex dynamics between the environment and agents and the explosion of the state-action space. In this paper, we design a novel Dynamic Multiagent Curriculum Learning (DyMA-CL) to solve large-scale problems by starting from learning on a multiagent scenario with a small size and progressively increasing the number of agents. We propose three transfer mechanisms across curricula to accelerate the learning process. Moreover, because the state dimension varies across curricula, existing network structures cannot be applied in such a transfer setting, since their network input sizes are fixed. We therefore design a novel network structure called Dynamic Agent-number Network (DyAN) to handle the dynamic size of the network input. Experimental results show that DyMA-CL using DyAN greatly improves the performance of large-scale multiagent learning compared with state-of-the-art deep reinforcement learning approaches. We also investigate the influence of three transfer mechanisms across curricula through extensive simulations.

IJCAI Conference 2020 Conference Paper

Graph Neural Architecture Search

  • Yang Gao
  • Hong Yang
  • Peng Zhang
  • Chuan Zhou
  • Yue Hu

Graph neural networks (GNNs) emerged recently as a powerful tool for analyzing non-Euclidean data such as social network data. Despite their success, the design of graph neural networks requires heavy manual work and domain knowledge. In this paper, we present a graph neural architecture search method (GraphNAS) that enables automatic design of the best graph neural architecture based on reinforcement learning. Specifically, GraphNAS uses a recurrent network to generate variable-length strings that describe the architectures of graph neural networks, and trains the recurrent network with policy gradient to maximize the expected accuracy of the generated architectures on a validation data set. Furthermore, to improve the search efficiency of GraphNAS on big networks, GraphNAS restricts the search space from an entire architecture space to a sequential concatenation of the best search results built on each single architecture layer. Experiments on real-world datasets demonstrate that GraphNAS can design a novel network architecture that rivals the best human-invented architecture in terms of validation set accuracy. Moreover, in a transfer learning task we observe that graph neural architectures designed by GraphNAS, when transferred to new datasets, still gain improvement in terms of prediction accuracy.

AAAI Conference 2020 Conference Paper

Layerwise Sparse Coding for Pruned Deep Neural Networks with Extreme Compression Ratio

  • Xiao Liu
  • Wenbin Li
  • Jing Huo
  • Lili Yao
  • Yang Gao

Deep neural network compression is important and increasingly developed, especially in resource-constrained environments such as autonomous drones and wearable devices. Basically, we can easily and largely reduce the number of weights of a trained deep model by adopting a widely used model compression technique, e.g., pruning. In this way, two kinds of data are usually preserved for this compressed model, i.e., non-zero weights and meta-data, where meta-data is employed to help encode and decode these non-zero weights. Although we can obtain an ideally small number of non-zero weights through pruning, existing sparse matrix coding methods still need a much larger amount of meta-data (possibly several times larger than the non-zero weights), which will be a severe bottleneck for deploying very deep models. To tackle this issue, we propose a layerwise sparse coding (LSC) method to maximize the compression ratio by drastically reducing the amount of meta-data. We first divide a sparse matrix into multiple small blocks and remove zero blocks, and then propose a novel signed relative index (SRI) algorithm to encode the remaining non-zero blocks (with much less meta-data). In addition, the proposed LSC performs parallel matrix multiplication without full decoding, while traditional methods cannot. Through extensive experiments, we demonstrate that LSC achieves substantial gains in pruned DNN compression (e.g., 51.03x compression ratio on ADMM-Lenet) and inference computation (i.e., time reduction and far less memory bandwidth) over state-of-the-art baselines.

IJCAI Conference 2020 Conference Paper

Learning Task-aware Local Representations for Few-shot Learning

  • Chuanqi Dong
  • Wenbin Li
  • Jing Huo
  • Zheng Gu
  • Yang Gao

Few-shot learning for visual recognition aims to adapt to novel unseen classes with only a few images. Recent work, especially work based on low-level information, has achieved great progress. In these works, local representations (LRs) are typically employed, because LRs are more consistent among the seen and unseen classes. However, most of them are limited to an individual image-to-image or image-to-class measure, which cannot fully exploit the capabilities of LRs, especially in the context of a certain task. This paper proposes an Adaptive Task-aware Local Representations Network (ATL-Net) to address this limitation by introducing episodic attention, which can adaptively select the important local patches among the entire task, as in the process of human recognition. We achieve much superior results on multiple benchmarks. On miniImagenet, ATL-Net gains 0.93% and 0.88% improvements over the compared methods under the 5-way 1-shot and 5-shot settings. Moreover, ATL-Net can naturally tackle the problem of how to adaptively identify and weight the importance of different key local parts, which is the major concern of fine-grained recognition. Specifically, on the fine-grained dataset Stanford Dogs, ATL-Net outperforms the second best method with 5.39% and 9.69% gains under the 5-way 1-shot and 5-shot settings.

AAAI Conference 2020 Conference Paper

Multi-Agent Game Abstraction via Graph Attention Neural Network

  • Yong Liu
  • Weixun Wang
  • Yujing Hu
  • Jianye Hao
  • Xingguo Chen
  • Yang Gao

In large-scale multi-agent systems, the large number of agents and complex game relationship cause great difficulty for policy learning. Therefore, simplifying the learning process is an important research issue. In many multi-agent systems, the interactions between agents often happen locally, which means that agents neither need to coordinate with all other agents nor need to coordinate with others all the time. Traditional methods attempt to use pre-defined rules to capture the interaction relationship between agents. However, the methods cannot be directly used in a large-scale environment due to the difficulty of transforming the complex interactions between agents into rules. In this paper, we model the relationship between agents by a complete graph and propose a novel game abstraction mechanism based on two-stage attention network (G2ANet), which can indicate whether there is an interaction between two agents and the importance of the interaction. We integrate this detection mechanism into graph neural network-based multi-agent reinforcement learning for conducting game abstraction and propose two novel learning algorithms GA-Comm and GA-AC. We conduct experiments in Traffic Junction and Predator-Prey. The results indicate that the proposed methods can simplify the learning process and meanwhile get better asymptotic performance compared with state-of-the-art algorithms.

JAAMAS Journal 2019 Journal Article

A probabilistic argumentation framework for reinforcement learning agents

  • Régis Riveret
  • Yang Gao
  • Giovanni Sartor

A bounded-reasoning agent may face two dimensions of uncertainty: firstly, the uncertainty arising from partial information and conflicting reasons, and secondly, the uncertainty arising from the stochastic nature of its actions and the environment. This paper attempts to address both dimensions within a single unified framework, by bringing together probabilistic argumentation and reinforcement learning. We show how a probabilistic rule-based argumentation framework can capture Markov decision processes and reinforcement learning agents; and how the framework allows us to characterise agents and their argument-based motivations from both a logic-based perspective and a probabilistic perspective. We advocate and illustrate the use of our approach to capture models of agency and norms, and argue that, in addition to providing a novel method for investigating agent types, the unified framework offers a sound basis for taking a mentalistic approach to agent profiles.

AAAI Conference 2019 Conference Paper

Distribution Consistency Based Covariance Metric Networks for Few-Shot Learning

  • Wenbin Li
  • Jinglin Xu
  • Jing Huo
  • Lei Wang
  • Yang Gao
  • Jiebo Luo

Few-shot learning aims to recognize new concepts from very few examples. However, most of the existing few-shot learning methods mainly concentrate on the first-order statistic of concept representation or a fixed metric on the relation between a sample and a concept. In this work, we propose a novel end-to-end deep architecture, named Covariance Metric Networks (CovaMNet). The CovaMNet is designed to exploit both the covariance representation and covariance metric based on the distribution consistency for the few-shot classification tasks. Specifically, we construct an embedded local covariance representation to extract the second-order statistic information of each concept and describe the underlying distribution of this concept. Upon the covariance representation, we further define a new deep covariance metric to measure the consistency of distributions between query samples and new concepts. Furthermore, we employ the episodic training mechanism to train the entire network in an end-to-end manner from scratch. Extensive experiments in two tasks, generic few-shot image classification and fine-grained few-shot image classification, demonstrate the superiority of the proposed CovaMNet. The source code is available from https://github.com/WenbinLee/CovaMNet.git.

AAAI Conference 2019 Conference Paper

Multistream Classification with Relative Density Ratio Estimation

  • Bo Dong
  • Yang Gao
  • Swarup Chandra
  • Latifur Khan

In supervised learning, availability of sufficient labeled data is of prime importance. Unfortunately, they are sparingly available in many real-world applications. Particularly when performing classification over a non-stationary data stream, unavailability of sufficient labeled data undermines the classifier’s long-term performance by limiting its adaptability to changes in data distribution over time. Recently, studies in such settings have appealed to transfer learning techniques over a data stream while detecting drifts in data distribution over time. Here, the data stream is represented by two independent non-stationary streams, one containing labeled data instances (called source stream) having a biased distribution compared to the unlabeled data instances (called target stream). The task of label prediction under this representation is called Multistream Classification, where instances in the two streams occur independently. While these studies have addressed various challenges in the multistream setting, it still suffers from large computational overhead mainly due to frequent bias correction and drift adaptation methods employed. In this paper, we focus on utilizing an alternative bias correction technique, called relative density-ratio estimation, which is known to be computationally faster. Importantly, we propose a novel mechanism to automatically learn an appropriate mixture of relative density that adapts to changes in the multistream setting over time. We theoretically study its properties and empirically demonstrate its superior performance, within a multistream framework called MSCRDR, on benchmark datasets by comparing with other competing methods.
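
For orientation, the relative density ratio mentioned in this abstract is commonly defined as below. This is a sketch of the standard definition from the density-ratio literature, not necessarily the exact form used in MSCRDR; the symbols $p_T$ (target density), $p_S$ (source density), and the mixing parameter $\alpha$ are assumed notation rather than taken from the abstract.

```latex
% Assumed notation: p_T = target-stream density, p_S = source-stream density,
% alpha in [0, 1) = mixing parameter (alpha = 0 recovers the plain density ratio).
\[
  r_\alpha(x) \;=\; \frac{p_T(x)}{\alpha\, p_T(x) + (1 - \alpha)\, p_S(x)},
  \qquad
  r_\alpha(x) \;\le\; \frac{1}{\alpha} \quad \text{for } \alpha > 0 .
\]
```

The upper bound $1/\alpha$ is what makes the relative ratio smoother to estimate than the plain ratio $p_T(x)/p_S(x)$, which can be unbounded where $p_S(x)$ is small.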

IJCAI Conference 2019 Conference Paper

Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

  • Yang Gao
  • Christian M. Meyer
  • Mohsen Mesgar
  • Iryna Gurevych

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.

ICRA Conference 2019 Conference Paper

Risk Averse Robust Adversarial Reinforcement Learning

  • Xinlei Pan
  • Daniel Seita
  • Yang Gao
  • John F. Canny

Deep reinforcement learning has recently made significant progress in solving computer games and robotic control tasks. A known problem, though, is that policies overfit to the training environment and may not avoid rare, catastrophic events such as automotive accidents. A classical technique for improving the robustness of reinforcement learning algorithms is to train on a set of randomized environments, but this approach only guards against common situations. Recently, robust adversarial reinforcement learning (RARL) was developed, which allows efficient applications of random and systematic perturbations by a trained adversary. A limitation of RARL is that only the expected control objective is optimized; there is no explicit modeling or optimization of risk. Thus the agents do not consider the probability of catastrophic events (i.e., those inducing abnormally large negative reward), except through their effect on the expected objective. In this paper we introduce risk-averse robust adversarial reinforcement learning (RARARL), using a risk-averse protagonist and a risk-seeking adversary. We test our approach on a self-driving vehicle controller. We use an ensemble of policy networks to model risk as the variance of value functions. We show through experiments that a risk-averse agent is better equipped to handle a risk-seeking adversary, and experiences substantially fewer crashes compared to agents trained without an adversary. Supplementary materials are available at https://sites.google.com/view/rararl.

IJCAI Conference 2019 Conference Paper

Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns

  • Yong Liu
  • Yujing Hu
  • Yang Gao
  • Yingfeng Chen
  • Changjie Fan

Many real-world problems, such as robot control and soccer game, are naturally modeled as sparse-interaction multi-agent systems. Reutilizing single-agent knowledge in multi-agent systems with sparse interactions can greatly accelerate the multi-agent learning process. Previous works rely on bisimulation metric to define Markov decision process (MDP) similarity for controlling knowledge transfer. However, bisimulation metric is costly to compute and is not suitable for high-dimensional state space problems. In this work, we propose more scalable transfer learning methods based on a novel MDP similarity concept. We start by defining the MDP similarity based on the N-step return (NSR) values of an MDP. Then, we propose two knowledge transfer methods based on deep neural networks called direct value function transfer and NSR-based value function transfer. We conduct experiments in image-based grid world, multi-agent particle environment (MPE) and Ms. Pac-Man game. The results indicate that the proposed methods can significantly accelerate multi-agent reinforcement learning and meanwhile get better asymptotic performance.
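
The N-step return that the abstract above builds on can be sketched as follows. This is a minimal illustration of the standard quantity only; the abstract does not say how NSR values are turned into an MDP similarity, so that step is not shown. The function name `n_step_return` and its parameters are hypothetical.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Discounted sum of the given n rewards plus a discounted bootstrap value.

    Computes r_0 + gamma * r_1 + ... + gamma^(n-1) * r_(n-1) + gamma^n * V,
    where V is `bootstrap_value` (an assumed value estimate of the state
    reached after n steps).
    """
    g = bootstrap_value
    # Fold from the last reward backwards: g <- r + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, with rewards `[1.0, 1.0]`, a bootstrap value of `4.0`, and `gamma = 0.5`, the result is `1 + 0.5 * (1 + 0.5 * 4) = 2.5`.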

AAMAS Conference 2018 Conference Paper

An Optimal Algorithm for the Stochastic Bandits with Knowing Near-optimal Mean Reward

  • Shangdong Yang
  • Hao Wang
  • Yang Gao
  • Xingguo Chen

This paper studies a variation of the stochastic multi-armed bandit (MAB) problem in which the agent has prior knowledge called the Near-optimal Mean Reward (NoMR). We show that the cumulative regret of this bandit variation has a lower bound of Ω(1/∆), where ∆ is the gap between the optimal and the second optimal mean reward. An algorithm called NoMR-Bandit is proposed for this variation, and we demonstrate that the cumulative regret of NoMR-Bandit has a uniform upper bound of O(∆). It is concluded that NoMR-Bandit is optimal in terms of the order of regret bounds.

IS Journal 2018 Journal Article

Autonomous Nuclear Waste Management

  • Jonathan M. Aitken
  • Sandor M. Veres
  • Affan Shaukat
  • Yang Gao
  • Elisa Cucco
  • Louise A. Dennis
  • Michael Fisher
  • Jeffrey A. Kuo

Redundant and nonoperational buildings at nuclear sites are decommissioned over a period of time. The process involves demolition of physical infrastructure resulting in large quantities of residual waste material. The resulting waste materials are packed into import containers to be delivered for postprocessing, containing either sealed canisters or assortments of miscellaneous objects. At present postprocessing does not happen within the United Kingdom. Sellafield Ltd. and National Nuclear Laboratory are developing a process for future operation so that upon an initial inspection, imported waste materials undergo two stages of postprocessing before being packed into export containers, namely sort and segregate or sort and disrupt. The postprocessing facility will remotely treat and export a wide range of wastes before downstream encapsulation. Certain wastes require additional treatment, such as disruption, before export to ensure suitability for long-term disposal. This paper focuses on the design, development, and demonstration of a reconfigurable rational agent-based robotic system that aims to highly automate these processes removing the need for close human supervision. The proposed system is being demonstrated through a downsized, lab-based setup incorporating a small-scale robotic arm, a time-of-flight camera, and high-level rational agent-based decision making and control framework.

AAAI Conference 2018 Conference Paper

Semantic Structure-Based Word Embedding by Incorporating Concept Convergence and Word Divergence

  • Qian Liu
  • Heyan Huang
  • Guangquan Zhang
  • Yang Gao
  • Junyu Xuan
  • Jie Lu

Representing the semantics of words is a fundamental task in text processing. Several research studies have shown that text and knowledge bases (KBs) are complementary sources for word embedding learning. Most existing methods only consider relationships within word-pairs in the usage of KBs. We argue that the structural information of well-organized words within the KBs is able to convey more effective and stable knowledge in capturing semantics of words. In this paper, we propose a semantic structure-based word embedding method, and introduce concept convergence and word divergence to reveal semantic structures in the word embedding learning process. To assess the effectiveness of our method, we use WordNet for training and conduct extensive experiments on word similarity, word analogy, text classification and query expansion. The experimental results show that our method outperforms state-of-the-art methods, including the methods trained solely on the corpus, and others trained on the corpus and the KBs.

AAAI Conference 2017 Conference Paper

Beyond IID: Learning to Combine Non-IID Metrics for Vision Tasks

  • Yinghuan Shi
  • Wenbin Li
  • Yang Gao
  • Longbing Cao
  • Dinggang Shen

Metric learning has been widely employed, especially in various computer vision tasks, with the fundamental assumption that all samples (e.g., regions/superpixels in images/videos) are independent and identically distributed (IID). However, since the samples are usually spatially-connected or temporally-correlated with their physically-connected neighbours, they are not IID (non-IID for short), which cannot be directly handled by existing methods. Thus, we propose to learn and integrate non-IID metrics (NIME). To incorporate the non-IID spatial/temporal relations, instead of directly using non-IID features and metric learning as in previous methods, NIME first builds several non-IID representations on original (non-IID) features by various graph kernel functions, and then automatically learns the metric under the best combination of various non-IID representations. NIME is applied to solve two typical computer vision tasks: interactive image segmentation and histology image identification. The results show that learning and integrating non-IID metrics improves the performance, compared to the IID methods. Moreover, our method achieves results comparable to or better than those of the state of the art.

TIST Journal 2017 Journal Article

Finding Semantically Valid and Relevant Topics by Association-Based Topic Selection Model

  • Yang Gao
  • Yuefeng Li
  • Raymond Y. K. Lau
  • Yue Xu
  • Md Abul Bashar

Topic modelling methods such as Latent Dirichlet Allocation (LDA) have been successfully applied to various fields, since these methods can effectively characterize document collections by using a mixture of semantically rich topics. So far, many models have been proposed. However, the existing models typically perform well in a full analysis of the whole collection to find all topics, but have difficulty capturing coherent and specifically meaningful topic representations. Furthermore, it is very challenging to incorporate user preferences into existing topic modelling methods to extract relevant topics. To address these problems, we develop a novel personalized Association-based Topic Selection (ATS) model, which can identify semantically valid and relevant topics from a set of raw topics based on the semantic relatedness between users’ preferences and the structured patterns captured in topics. The advantage of the proposed ATS model is that it enables an interactive topic modelling process driven by users’ specific interests. Based on three benchmark datasets, namely, RCV1, R8, and WT10G, under the context of information filtering (IF) and information retrieval (IR), our rigorous experiments show that the proposed ATS model can effectively identify relevant topics with respect to users’ specific interests, and hence improve the performance of IF and IR.

AAAI Conference 2017 Conference Paper

Fredholm Multiple Kernel Learning for Semi-Supervised Domain Adaptation

  • Wei Wang
  • Hao Wang
  • Chen Zhang
  • Yang Gao

As a fundamental constituent of machine learning, domain adaptation generalizes a learning model from a source domain to a different (but related) target domain. In this paper, we focus on semi-supervised domain adaptation and explicitly extend the applied range of unlabeled target samples into the combination of distribution alignment and adaptive classifier learning. Specifically, our extension formulates the following aspects in a single optimization: 1) learning a cross-domain predictive model by developing the Fredholm integral based kernel prediction framework; 2) reducing the distribution difference between two domains; 3) exploring multiple kernels to induce an optimal learning space. Correspondingly, such an extension is distinguished by allowing for noise resiliency, facilitating knowledge transfer and analyzing diverse data characteristics. We also prove the differentiability of our formulation and present an effective optimization procedure based on the reduced gradient, guaranteeing rapid convergence. Comprehensive empirical studies verify the effectiveness of the proposed method.

AAMAS Conference 2016 Conference Paper

Argumentation-Based Multi-Agent Decision Making with Privacy Preserved

  • Yang Gao
  • Francesca Toni
  • Hao Wang
  • Fanjiang Xu

We consider multi-agent decision making problems in which agents need to communicate with other agents to make socially optimal decisions but, at the same time, have some private information that they do not want to share. Abstract argumentation has been widely used in both single-agent and multi-agent decision making problems, because of its ability for reasoning with incomplete and conflicting information. In this work, we propose an abstract argumentation-based knowledge representation and communication protocol, such that agents can find socially optimal strategies by only disclosing the ‘necessary’ and ‘disclosable’ information. We prove that our protocol is sound, efficient, of perfect information security and guaranteed to terminate.

AAAI Conference 2016 Conference Paper

Efficient Average Reward Reinforcement Learning Using Constant Shifting Values

  • Shangdong Yang
  • Yang Gao
  • Bo An
  • Hao Wang
  • Xingguo Chen

There are two classes of average reward reinforcement learning (RL) algorithms: model-based ones that explicitly maintain MDP models and model-free ones that do not learn such models. Though model-free algorithms are known to be more efficient, they often cannot converge to optimal policies due to the perturbation of parameters. In this paper, a novel model-free algorithm is proposed, which makes use of constant shifting values (CSVs) estimated from prior knowledge. To encourage exploration during the learning process, the algorithm constantly subtracts the CSV from the rewards. A terminating condition is proposed to handle the unboundedness of Q-values caused by such subtraction. The convergence of the proposed algorithm is proved under very mild assumptions. Furthermore, linear function approximation is investigated to generalize our method to handle large-scale tasks. Extensive experiments on representative MDPs and the popular game Tetris show that the proposed algorithms significantly outperform the state-of-the-art ones.
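
The idea of subtracting a constant shifting value from each reward can be sketched as a tabular update. This is an illustrative assumption, not the paper's algorithm: the abstract gives neither the exact update rule nor the terminating condition, so the undiscounted Q-learning-style target below and the names `csv_q_update`, `csv`, and `alpha` are hypothetical.

```python
from typing import Dict, Hashable

QTable = Dict[Hashable, Dict[Hashable, float]]


def csv_q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
                 next_state: Hashable, csv: float, alpha: float) -> None:
    """One tabular update in which a constant shifting value is subtracted
    from the observed reward before forming the (undiscounted) target.

    Assumption: the target is (reward - csv) + max_a' Q(next_state, a').
    The terminating condition described in the abstract is NOT modelled here.
    """
    shifted_reward = reward - csv
    target = shifted_reward + max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])
```

With `q = {"s": {"a": 0.0}, "t": {"a": 1.0}}`, a reward of `2.0`, `csv = 0.5`, and `alpha = 0.5`, updating `("s", "a")` with next state `"t"` moves `q["s"]["a"]` to `1.25`.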

AAMAS Conference 2016 Conference Paper

Measuring the Distance Between Finite Markov Decision Processes

  • Jinhua Song
  • Yang Gao
  • Hao Wang
  • Bo An

Markov decision processes (MDPs) have been studied for many decades. Recent research in using transfer learning methods to solve MDPs has shown that knowledge learned from one MDP may be used to solve a similar MDP better. In this paper, we propose two metrics for measuring the distance between finite MDPs. Our metrics are based on the Hausdorff metric which measures the distance between two subsets of a metric space and the Kantorovich metric for measuring the distance between probabilistic distributions. Our metrics can be used to compute the distance between reinforcement learning tasks that are modeled as MDPs. The second contribution of this paper is that we apply the metrics to direct transfer learning by finding the similar source tasks. Our third contribution is that we propose two knowledge transfer methods which transfer value functions of the selected source tasks to the target task. Extensive experimental results show that our metrics are effective in finding similar tasks and significantly improve the performance of transfer learning with the transfer methods.
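The Hausdorff metric mentioned in the abstract can be sketched for finite sets as follows. This shows only the generic set-to-set distance under a caller-supplied base metric `d`; the function name `hausdorff_distance` is hypothetical, and the paper's MDP metrics, which also involve the Kantorovich metric, are not reproduced here.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def hausdorff_distance(
    xs: Sequence[T], ys: Sequence[T], d: Callable[[T, T], float]
) -> float:
    """Hausdorff distance between two non-empty finite sets under base metric d."""
    if not xs or not ys:
        raise ValueError("both sets must be non-empty")

    def directed(a: Sequence[T], b: Sequence[T]) -> float:
        # Farthest that any point of a lies from its nearest point in b.
        return max(min(d(p, q) for q in b) for p in a)

    # Symmetrize by taking the larger of the two directed distances.
    return max(directed(xs, ys), directed(ys, xs))
```

With the absolute difference as the base metric, the sets `[0, 1]` and `[0, 3]` are at distance 2: every point of the first set is within 1 of the second, but the point 3 is 2 away from its nearest neighbor 1.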

JMLR Journal 2015 Journal Article

Multi-layered Gesture Recognition with Kinect

  • Feng Jiang
  • Shengping Zhang
  • Shen Wu
  • Yang Gao
  • Debin Zhao

This paper proposes a novel multi-layered gesture recognition method with Kinect. We explore the essential linguistic characters of gestures: the components concurrent character and the sequential organization character, in a multi-layered framework, which extracts features from both the segmented semantic units and the whole gesture sequence and then sequentially classifies the motion, location and shape components. In the first layer, an improved principle motion is applied to model the motion component. In the second layer, a particle-based descriptor and a weighted dynamic time warping are proposed for the location component classification. In the last layer, the spatial path warping is further proposed to classify the shape component represented by unclosed shape context. The proposed method can obtain relatively high performance for one-shot learning gesture recognition on the ChaLearn Gesture Dataset comprising more than 50,000 gesture sequences recorded with Kinect.

TIST Journal 2015 Journal Article

Peacock

  • Yi Wang
  • Xuemin Zhao
  • Zhenlong Sun
  • Hao Yan
  • Lifeng Wang
  • Zhihui Jin
  • Liubin Wang
  • Yang Gao

Latent Dirichlet allocation (LDA) is a popular topic modeling technique in academia but less so in industry, especially in large-scale applications involving search engine and online advertising systems. A main underlying reason is that the topic models used have been too small in scale to be useful; for example, some of the largest LDA models reported in the literature have up to 10^3 topics, which can hardly cover the long-tail semantic word sets. In this article, we show that the number of topics is a key factor that can significantly boost the utility of topic-modeling systems. In particular, we show that a “big” LDA model with at least 10^5 topics inferred from 10^9 search queries can achieve a significant improvement on industrial search engine and online advertising systems, both of which serve hundreds of millions of users. We develop a novel distributed system called Peacock to learn big LDA models from big data. The main features of Peacock include hierarchical distributed architecture, real-time prediction, and topic de-duplication. We empirically demonstrate that the Peacock system is capable of providing significant benefits via highly scalable LDA topic models for several industrial applications.

IJCAI Conference 2015 Conference Paper

Potential Based Reward Shaping for Hierarchical Reinforcement Learning

  • Yang Gao
  • Francesca Toni

Hierarchical Reinforcement Learning (HRL) outperforms many ‘flat’ Reinforcement Learning (RL) algorithms in some application domains. However, HRL may need longer time to obtain the optimal policy because of its large action space. Potential Based Reward Shaping (PBRS) has been widely used to incorporate heuristics into flat RL algorithms so as to reduce their exploration. In this paper, we investigate the integration of PBRS and HRL, and propose a new algorithm: PBRS-MAXQ-0. We prove that under certain conditions, PBRS-MAXQ-0 is guaranteed to converge. Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics.
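For readers unfamiliar with PBRS, a minimal sketch of the standard flat formulation follows: the shaping term is gamma * Phi(s') - Phi(s), added to the environment reward. The function names and the list-of-potentials interface are hypothetical, and the sketch covers only flat PBRS, not the PBRS-MAXQ-0 algorithm proposed in the paper.

```python
from typing import Sequence


def shaping_term(phi_s: float, phi_next: float, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi_next - phi_s


def shaped_reward(reward: float, phi_s: float, phi_next: float, gamma: float) -> float:
    """Environment reward plus the potential-based shaping term."""
    return reward + shaping_term(phi_s, phi_next, gamma)


def discounted_shaping_sum(potentials: Sequence[float], gamma: float) -> float:
    """Discounted sum of shaping terms along a trajectory of state potentials."""
    total = 0.0
    for t in range(len(potentials) - 1):
        total += (gamma ** t) * shaping_term(potentials[t], potentials[t + 1], gamma)
    return total
```

The discounted sum telescopes to gamma^T * Phi(s_T) - Phi(s_0), so it depends only on the first and last states of the trajectory. For potentials `[0.0, 1.0, 3.0]` and `gamma = 0.5`, both sides evaluate to 0.75.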

IS Journal 2014 Journal Article

WaaS: Wisdom as a Service

  • Jianhui Chen
  • Jianhua Ma
  • Ning Zhong
  • Yiyu Yao
  • Jiming Liu
  • Runhe Huang
  • Wenbin Li
  • Zhisheng Huang

An emerging hyper-world encompasses all human activities in a social-cyber-physical space. Its power derives from the Wisdom Web of Things (W2T) cycle, namely, "from things to data, information, knowledge, wisdom, services, humans, and then back to things." The W2T cycle leads to a harmonious symbiosis among humans, computers, and things, which can be constructed by large-scale converging of intelligent information technology applications with an open and interoperable architecture. The recent advances in cloud computing, the Internet of Things, Web of Things, Big Data, and other research fields have provided just such an open system architecture with resource sharing and services. The next step is to develop an open and interoperable content architecture with intelligent sharing and services for the organization and transformation in the data, information, knowledge, and wisdom (DIKW) hierarchy. This article introduces wisdom as a service (WaaS), a content architecture based on the pay-as-you-go IT trend. The WaaS infrastructure and the main challenges in WaaS research and applications are discussed. A case study is also described. Relying on cloud computing and big data, WaaS provides a practical approach to realize the W2T cycle in the hyper-world for the coming age of ubiquitous intelligent IT applications.