Arrow Research search

Author name cluster

Jun Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

34 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

ActiShade: Activating Overshadowed Knowledge to Guide Multi-Hop Reasoning in Large Language Models

  • Huipeng Ma
  • Luan Zhang
  • Dandan Song
  • Linmei Hu
  • Yuhang Tian
  • Jun Yang
  • Changzhi Zhou
  • Chenhao Li

In multi-hop reasoning, multi-round retrieval-augmented generation (RAG) methods typically rely on LLM-generated content as the retrieval query. However, these approaches are inherently vulnerable to knowledge overshadowing, a phenomenon where critical information is overshadowed during generation. As a result, the LLM-generated content may be incomplete or inaccurate, leading to irrelevant retrieval and causing error accumulation during the iteration process. To address this challenge, we propose ActiShade, which detects and activates overshadowed knowledge to guide large language models (LLMs) in multi-hop reasoning. Specifically, ActiShade iteratively detects the overshadowed keyphrase in the given query, retrieves documents relevant to both the query and the overshadowed keyphrase, and generates a new query based on the retrieved documents to guide the next-round iteration. By supplementing the overshadowed knowledge during the formulation of next-round queries while minimizing the introduction of irrelevant noise, ActiShade reduces the error accumulation caused by knowledge overshadowing. Extensive experiments show that ActiShade outperforms existing methods across multiple datasets and LLMs.

AAAI Conference 2026 Conference Paper

ChipMind: Retrieval-Augmented Reasoning for Long-Context Circuit Design Specifications

  • Changwen Xing
  • SamZaak Wong
  • Xinlai Wan
  • Yanfeng Lu
  • Mengli Zhang
  • Zebin Ma
  • Lei Qi
  • Zhengxiong Li

While Large Language Models (LLMs) demonstrate immense potential for automating integrated circuit (IC) development, their practical deployment is fundamentally limited by restricted context windows. Existing context-extension methods struggle to achieve effective semantic modeling and thorough multi-hop reasoning over extensive, intricate circuit specifications. To address this, we introduce ChipMind, a novel knowledge graph-augmented reasoning framework specifically designed for lengthy IC specifications. ChipMind first transforms circuit specifications into a domain-specific knowledge graph (ChipKG) through the Circuit Semantic-Aware Knowledge Graph Construction methodology. It then leverages the ChipKG-Augmented Reasoning mechanism, combining information-theoretic adaptive retrieval to dynamically trace logical dependencies with intent-aware semantic filtering to prune irrelevant noise, effectively balancing retrieval completeness and precision. Evaluated on an industrial-scale specification reasoning benchmark, ChipMind significantly outperforms state-of-the-art baselines, achieving an average improvement of 34.59% (up to 72.73%). Our framework bridges a critical gap between academic research and practical industrial deployment of LLM-aided Hardware Design (LAD).

AAAI Conference 2026 Conference Paper

FIXME: Towards End-to-End Benchmarking of LLM-Aided Design Verification

  • Gwok-Waa Wan
  • SamZaak Wong
  • Shengchu Su
  • Chenxu Niu
  • Ning Wang
  • Xinlai Wan
  • Qixiang Chen
  • Mengnv Xing

We introduce FIXME, the first end-to-end and large-scale benchmark for evaluating Large Language Models (LLMs) in hardware design functional verification (FV). Comprising 747 tasks derived from real-world hardware designs, FIXME spans five core FV sub-sets: specification comprehension, reference model generation, testbench generation, assertion design, and RTL debugging. To ensure high data quality, we developed an AI-human collaborative framework for agile data curation and annotation. This process resulted in 25,000 lines of verified RTL, 35,000 lines of enhanced testbenches, and over 1,200 SystemVerilog Assertions. Furthermore, through expert-guided optimization within the multi-agent aided flow, we achieved a remarkable 45.57% improvement in average functional coverage, underscoring the benchmark's robustness. Through evaluation of state-of-the-art LLMs like GPT-4.1, FIXME identifies key limitations and provides actionable insights, advancing the potential of LLM-driven automation in hardware design functional verification.

TMLR Journal 2026 Journal Article

Interference-Aware K-Step Reachable Communication in Multi-Agent Reinforcement Learning

  • Ziyu Cheng
  • Jinsheng Ren
  • Jun Yang
  • Zhouxian Jiang
  • Chenzhihang Li
  • Rongye Shi
  • Bin Liang

Effective communication is pivotal for addressing complex collaborative tasks in multi-agent reinforcement learning (MARL). Yet, limited communication bandwidth and dynamic, intricate environmental topologies present significant challenges in identifying high-value communication partners. Agents must consequently select collaborators under uncertainty, lacking a priori knowledge of which partners can deliver task-critical information. To this end, we propose Interference-Aware $K$-Step Reachable Communication (IA-KRC), a novel framework that enhances cooperation via two core components: (1) a $K$-Step reachability protocol that confines message passing to physically accessible neighbors, and (2) an interference-prediction module that optimizes partner choice by minimizing interference while maximizing utility. Compared to existing methods, IA-KRC enables substantially more persistent and efficient cooperation despite environmental interference. Comprehensive evaluations confirm that IA-KRC achieves superior performance compared to state-of-the-art baselines, while demonstrating enhanced robustness and scalability in complex topological and highly dynamic multi-agent scenarios.

JBHI Journal 2026 Journal Article

Robust Remote Heart Rate Estimation Network Based on Spatial-Temporal-Channel Learning From Facial Videos

  • Jun Yang
  • Chen Zhu
  • Renbiao Wu

Non-contact heart rate detection technology leverages changes in skin color to estimate heart rate, enhancing the convenience of health monitoring, particularly in situations requiring real-time, contact-free observation. However, current video-based methods face various limitations, including restricted feature extraction capabilities, redundant spatial information, and ineffective motion artifact processing. To address these problems, a novel end-to-end heart rate estimation network, Spatial-Temporal-Channel Network (STCNet), is proposed. First, to solve the problem of redundant spatial information in current video-based heart rate estimation methods, a spatial attention learning (SAL) unit is designed to highlight the effective information of the facial region. Next, an improved temporal shift module (TSMP) with long-range temporal information perception is proposed. On this basis, a temporal-channel learning (TCL) unit is designed to achieve the interaction of information across different frames’ channels, aiming to address the insufficient capability of existing models in extracting periodic features of heartbeat. Finally, combining the SAL and TCL units, a feature extraction block (FEB) is designed. A feature extraction network is constructed by stacking multiple layers of FEBs to achieve accurate heart rate estimation. Numerous experiments are conducted on the UBFC-rPPG dataset and the PURE dataset to verify the effectiveness and generalization ability of our model. Notably, compared to the state-of-the-art CIN-rPPG, our model achieves a 0.27 bpm reduction in mean absolute error (MAE) and a 0.19 bpm reduction in root mean square error (RMSE) in intra-dataset testing on the PURE dataset. Experimental results demonstrate that our proposed model outperforms other mainstream models.

IJCAI Conference 2025 Conference Paper

A Dynamic Knowledge Update-Driven Model with Large Language Models for Fake News Detection

  • Di Jin
  • Jun Yang
  • Xiaobao Wang
  • Junwei Zhang
  • Shuqi Li
  • Dongxiao He

As the Internet and social media evolve rapidly, distinguishing credible news from a vast amount of complex information poses a significant challenge. Due to the suddenness and instability of news events, the authenticity labels of news can potentially shift as events develop, making it crucial for fake news detection to obtain the latest event updates. Existing methods employ retrieval-augmented generation to fill knowledge gaps, but they suffer from issues such as insufficient credibility of retrieved content and interference from noisy information. We propose a dynamic knowledge update-driven model for fake news detection (DYNAMO), which leverages knowledge graphs to achieve continuous updating of new knowledge and integrates with large language models to fulfill dual functions: news authenticity detection and verification of new knowledge correctness, solving the two key problems of ensuring the authenticity of new knowledge and deeply mining news semantics. Specifically, we first construct a news-domain-specific knowledge graph. Then, we use Monte Carlo Tree Search to decompose complex news and verify them step by step. Finally, we extract and update new knowledge from verified real news texts and reasoning paths. Experimental results demonstrate that DYNAMO achieves the best performance on two real-world datasets.

ICRA Conference 2025 Conference Paper

A New Variable-Gain Sliding Mode Filter and Its Application to Velocity Filtering

  • Myo Thant Sin Aung
  • Ryo Kikuuwe
  • Soe Lin Paing
  • Jun Yang
  • Haoyong Yu

This paper proposes a new variable-gain sliding mode filter augmented by variable windowing for achieving smooth and reactive response over a broad range of input frequencies. The proposed filter can be seen as a synergistic combination of Kikuuwe et al.'s [1] sliding mode filter with varying gain and sliding surfaces and a novel varying-length moving-window algorithm. In all schemes, the estimated input speed is employed for rendering the filter parameters between low and high settings. The discrete-time algorithm of the proposed filter does not suffer from chattering due to the implicit (backward) Euler method. The effectiveness of the proposed filter in achieving a better trade-off between noise attenuation and signal preservation is validated in both simulation and experimental scenarios by using the velocity signal obtained by differentiation of quantized position data.

IROS Conference 2025 Conference Paper

DA-MPPI: Disturbance-Aware Model Predictive Path Integral via active disturbance estimation and compensation

  • Haodi Zhang
  • Jinya Su
  • Jun Yang
  • Shihua Li

Model Predictive Path Integral (MPPI) controllers are drawing increasing attention for their ability to efficiently handle complex systems by leveraging GPU acceleration while accommodating flexible prediction models and cost functions. However, their performance generally degrades with low-quality prediction models and unknown external disturbances. Existing methods that rely solely on feedforward disturbance compensation are limited by the assumption of matched disturbances, which rarely holds in practice due to the complex lumped disturbances. To this end, we propose a novel Disturbance-Aware (DA-) MPPI framework, which seamlessly integrates an Extended high-order Sliding Mode Observer (ESMO) into MPPI. The ESMO provides accurate estimates of uncertainties and external disturbances, which are directly incorporated into the MPPI rolling dynamics to improve prediction and therefore tracking control performance. The proposed algorithm is verified against the baseline MPPI in the AirSim simulation environment by stochastic simulation. Comparative statistical experiments show that incorporating ESMO within the MPPI framework significantly enhances tracking performance, with the RMSE reduction in terms of mean by 8.0%, 17.7%, 6.17%, and 12.9% and in terms of standard variance by 11.5%, 26.0%, 10.4%, and 9.2% in four representative scenarios. The effects of target velocity and prediction horizon on control performance are also systematically evaluated. These results validate the robustness and accuracy of the DA-MPPI controller in complex and uncertain environments.

JAIR Journal 2025 Journal Article

DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning

  • Xiaoteng Ma
  • Junyao Chen
  • Li Xia
  • Jun Yang
  • Qianchuan Zhao
  • Zhengyuan Zhou

We present Distributional Soft Actor-Critic (DSAC), a distributional reinforcement learning (RL) algorithm that combines the strengths of distributional information of accumulated rewards and entropy-driven exploration from the Soft Actor-Critic (SAC) algorithm. DSAC models the randomness in both actions and rewards, surpassing baseline performance on various continuous control tasks. Unlike standard approaches that solely maximize expected rewards, we propose a unified framework for risk-sensitive learning, one that optimizes the risk-related objective while balancing entropy to encourage exploration. Extensive experiments demonstrate DSAC’s effectiveness in enhancing agent performance for both risk-neutral and risk-sensitive control tasks.

IJCAI Conference 2025 Conference Paper

External Memory Matters: Generalizable Object-Action Memory for Retrieval-Augmented Long-Term Video Understanding

  • Jisheng Dang
  • Huicheng Zheng
  • Xudong Wu
  • Jingmei Jiao
  • Bimei Wang
  • Jun Yang
  • Bin Hu
  • Jianhuang Lai

Long video understanding with Large Language Models (LLMs) enables the description of objects that are not explicitly present in the training data. However, continuous changes in known objects and the emergence of new ones require up-to-date knowledge of objects and their dynamics for effective understanding of the open world. To alleviate this, we propose an efficient Retrieval-Enhanced Video Understanding method, dubbed REVU, which leverages external knowledge to enhance the performance of open-world learning. First, REVU introduces an extensible external text-object memory with minimal text-visual mapping, involving static and dynamic multimodal information to help LLM-based models align text and vision features. Second, REVU retrieves object information from external databases and dynamically integrates frame-specific data from videos, enabling effective knowledge aggregation to comprehend the open world. We conducted experiments on multiple benchmark datasets, and our model demonstrates strong adaptability to out-of-domain data without requiring additional fine-tuning or re-training. Experiments on benchmark video understanding datasets reveal that our model achieves state-of-the-art performance and robust generalization.

JBHI Journal 2025 Journal Article

Miniformer: A Minimalist Transformer for Brain Functional Networks Analysis

  • Yaru Li
  • Jun Yang
  • Mengxue Pang
  • Shuai Zhang
  • Chris Nugent
  • Mingxia Liu
  • Lishan Qiao

Learning to estimate and classify brain functional networks (BFNs) has become an increasingly important way of predicting neurological or mental disorders at their early stages. The traditional methods conduct BFN estimation and classification in two separate steps, thus preventing the interaction and joint optimization. In contrast, Transformer provides a natural architecture to learn BFNs with downstream tasks in an end-to-end manner. Despite their great potential, Transformer-based methods involve a large number of parameters that need to be learnt from Big Data and often lead to poor model interpretability. Considering the challenge in acquiring data and the high demand for model interpretability in medical scenarios, in this paper, we propose a minimalist Transformer architecture, referred to as Miniformer, by simplifying the projection matrices in the self-attention module into a single diagonal matrix, which greatly reduces the number of parameters, alleviates the risk of overfitting, and improves the interpretability. Additionally, the clear physical meaning of parameters in Miniformer makes the integration of domain knowledge or prior easier and more natural. Therefore, we further develop two variants of Miniformer by incorporating sparsity for removing potentially noisy time points from fMRI signals, and smoothness for capturing the temporal correlations in fMRI signals, respectively. To evaluate the effectiveness of the proposed methods, we perform brain disease diagnosis experiments on three public datasets. The results show that Miniformer and its variants tend to achieve higher classification performance than comparison methods with good interpretability.
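
As an illustration only, the idea of replacing the self-attention projection matrices with a single diagonal matrix might be sketched as follows. This is one possible reading of the abstract, not the authors' implementation: the function name `diagonal_self_attention`, the choice to place the diagonal weights `d` over time points, and the scaling by the square root of the series length are all assumptions.

```python
import numpy as np

def diagonal_self_attention(x, d):
    """Attention weights between brain regions using one diagonal matrix.

    x: (n_regions, n_timepoints) array of fMRI signals (hypothetical layout).
    d: (n_timepoints,) diagonal entries standing in for the usual
       query/key projection matrices (an assumption of this sketch).
    """
    # Scores are X diag(d) X^T, scaled as in standard dot-product attention.
    scores = (x * d) @ x.T / np.sqrt(x.shape[1])
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights

# Synthetic data for illustration: 5 regions, 20 time points.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 20))
d = np.ones(20)  # setting an entry to 0 would ignore that time point
attn = diagonal_self_attention(x, d)
```

Under this reading, each entry of `d` weights one time point, which is why sparsity or smoothness constraints on `d` have a direct interpretation; the matrix `attn` plays the role of an estimated region-to-region network.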

NeurIPS Conference 2025 Conference Paper

WEDGE: Synthesizing Performance Constraints for Evaluating and Improving Code Efficiency

  • Jun Yang
  • Cheng-Chi Wang
  • Bogdan "Bo" Stoica
  • Kexin Pei

Large Language Models (LLMs) have been increasingly used to optimize code efficiency. Evaluating their effectiveness and further suggesting optimization opportunities often rely on high-quality tests to demonstrate the performance bottlenecks presented in the program. However, existing approaches rely on a limited set of hand-curated inputs or LLM-generated uninteresting length-stressing tests, failing to reveal more nuanced optimization opportunities. We present WEDGE, a framework for generating performance-stressing input given the program under test. WEDGE synthesizes explicit performance-characterizing constraints in the form of branch conditions to partition the programs’ execution space into performance-specific regions. When integrated with the coverage-guided fuzzer, reaching different regions introduces explicit rewards for test generation to explore inefficient implementations. Our evaluation shows that WEDGE introduces a significant slowdown compared to the tests in CodeContests and those claimed to be optimized by existing approaches. From the utility perspective, integrating our tests substantially improves the existing code optimization approaches that rely on test-driven execution feedback. We release PERFFORGE, the performance tests generated by WEDGE, to benchmark future approaches for efficient code generation at https://github.com/UChiSeclab/perfforge.

AAAI Conference 2024 Conference Paper

Learning Diverse Risk Preferences in Population-Based Self-Play

  • Yuhua Jiang
  • Qihan Liu
  • Xiaoteng Ma
  • Chenghao Li
  • Yiqin Yang
  • Jun Yang
  • Bin Liang
  • Qianchuan Zhao

Among the remarkable successes of Reinforcement Learning (RL), self-play algorithms have played a crucial role in solving competitive games. However, current self-play RL methods commonly optimize the agent to maximize the expected win-rates against its current or historical copies, resulting in a limited strategy style and a tendency to get stuck in local optima. To address this limitation, it is important to improve the diversity of policies, allowing the agent to break stalemates and enhance its robustness when facing different opponents. In this paper, we present a novel perspective to promote diversity by considering that agents could have diverse risk preferences in the face of uncertainty. To achieve this, we introduce a novel reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning, enabling policy learning with desired risk preferences. Furthermore, by seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives using experiences gained from playing against diverse opponents. Our empirical results demonstrate that our method achieves comparable or superior performance in competitive games and, importantly, leads to the emergence of diverse behavioral modes. Code is available at https://github.com/Jackory/RPBT.

NeurIPS Conference 2024 Conference Paper

NeuralPlane: An Efficiently Parallelizable Platform for Fixed-wing Aircraft Control with Reinforcement Learning

  • Chuanyi Xue
  • Qihan Liu
  • Xiaoteng Ma
  • Yang Qi
  • Xinyao Qin
  • Yuhua Jiang
  • Ning Gui
  • Jinsheng Ren

Reinforcement learning (RL) demonstrates superior potential over traditional flight control methods for fixed-wing aircraft, particularly under extreme operational conditions. However, the high demand for training samples and the lack of efficient computation in existing simulators hinder its further application. In this paper, we introduce NeuralPlane, the first benchmark platform for large-scale parallel simulations of fixed-wing aircraft. NeuralPlane significantly boosts high-fidelity simulation via GPU-accelerated Flight Dynamics Model (FDM) computation, achieving a single-step simulation time of just 0.2 seconds at a parallel scale of $10^{6}$, far exceeding current platforms. We also provide clear code templates, comprehensive evaluation/visualization tools and hierarchical frameworks for integrating RL and traditional control methods. We believe that NeuralPlane can accelerate the development of RL-based fixed-wing flight control and serve as a new challenging benchmark for the RL community. Our NeuralPlane is open-source and accessible at https://github.com/xuecy22/NeuralPlane.

AAAI Conference 2024 Conference Paper

S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention

  • Chiyu Zhang
  • Xiaogang Xu
  • Lei Wang
  • Zaiyan Dai
  • Jun Yang

Transformer's recent integration into style transfer leverages its proficiency in establishing long-range dependencies, albeit at the expense of attenuated local modeling. This paper introduces Strips Window Attention Transformer (S2WAT), a novel hierarchical vision transformer designed for style transfer. S2WAT employs attention computation in diverse window shapes to capture both short- and long-range dependencies. The merged dependencies utilize the "Attn Merge" strategy, which adaptively determines spatial weights based on their relevance to the target. Extensive experiments on representative datasets show the proposed method's effectiveness compared to state-of-the-art (SOTA) transformer-based and other approaches. The code and pre-trained models are available at https://github.com/AlienZhang1996/S2WAT.

NeurIPS Conference 2023 Conference Paper

Conservative Offline Policy Adaptation in Multi-Agent Games

  • Chengjie Wu
  • Pingzhong Tang
  • Jun Yang
  • Yujing Hu
  • Tangjie Lv
  • Changjie Fan
  • Chongjie Zhang

Prior research on policy adaptation in multi-agent games has often relied on online interaction with the target agent in training, which can be expensive and impractical in real-world scenarios. Inspired by recent progress in offline reinforcement learning, this paper studies offline policy adaptation, which aims to utilize the target agent’s behavior data to exploit its weakness or enable effective cooperation. We investigate its distinct challenges of distributional shift and risk-free deviation, and propose a novel learning objective, conservative offline adaptation, that optimizes the worst-case performance against any dataset-consistent proxy models. We propose an efficient algorithm called Constrained Self-Play (CSP) that incorporates dataset information into regularized policy learning. We prove that CSP learns a near-optimal risk-free offline adaptation policy upon convergence. Empirical results demonstrate that CSP outperforms non-conservative baselines in various environments, including Maze, predator-prey, MuJoCo, and Google Football.

AAAI Conference 2023 Conference Paper

Flow to Control: Offline Reinforcement Learning with Lossless Primitive Discovery

  • Yiqin Yang
  • Hao Hu
  • Wenzhe Li
  • Siyuan Li
  • Jun Yang
  • Qianchuan Zhao
  • Chongjie Zhang

Offline reinforcement learning (RL) enables the agent to effectively learn from logged data, which significantly extends the applicability of RL algorithms in real-world scenarios where exploration can be expensive or unsafe. Previous works have shown that extracting primitive skills from the recurring and temporally extended structures in the logged data yields better learning. However, these methods suffer greatly when the primitives have limited representation ability to recover the original policy space, especially in offline settings. In this paper, we give a quantitative characterization of the performance of offline hierarchical learning and highlight the importance of learning lossless primitives. To this end, we propose to use a flow-based structure as the representation for low-level policies. This allows us to represent the behaviors in the dataset faithfully while keeping the expression ability to recover the whole policy space. We show that such lossless primitives can drastically improve the performance of hierarchical policies. The experimental results and extensive ablation studies on the standard D4RL benchmark show that our method has a good representation ability for policies and achieves superior performance in most tasks.

NeurIPS Conference 2022 Conference Paper

Safe Opponent-Exploitation Subgame Refinement

  • Mingyang Liu
  • Chengjie Wu
  • Qihan Liu
  • Yansen Jing
  • Jun Yang
  • Pingzhong Tang
  • Chongjie Zhang

In zero-sum games, an NE strategy tends to be overly conservative when confronted with opponents of limited rationality, because it does not actively exploit their weaknesses. From another perspective, best responding to an estimated opponent model is vulnerable to estimation errors and lacks safety guarantees. Inspired by the recent success of real-time search algorithms in developing superhuman AI, we investigate the dilemma of safety and opponent exploitation and present a novel real-time search framework, called Safe Exploitation Search (SES), which continuously interpolates between the two extremes of online strategy refinement. We provide SES with a theoretically upper-bounded exploitability and a lower-bounded evaluation performance. Additionally, SES enables computationally efficient online adaptation to a possibly updating opponent model, while previous safe exploitation methods have to recompute for the whole game. Empirical results show that SES significantly outperforms NE baselines and previous algorithms while keeping exploitability low at the same time.

IJCAI Conference 2021 Conference Paper

Average-Reward Reinforcement Learning with Trust Region Methods

  • Xiaoteng Ma
  • Xiaohang Tang
  • Li Xia
  • Jun Yang
  • Qianchuan Zhao

Most reinforcement learning algorithms optimize the discounted criterion, which helps accelerate convergence and reduce the variance of estimates. Although the discounted criterion is appropriate for certain tasks such as finance-related problems, many engineering problems treat future rewards equally and prefer a long-run average criterion. In this paper, we study the reinforcement learning problem with the long-run average criterion. Firstly, we develop a unified trust region theory with discounted and average criteria. With the average criterion, a novel performance bound within the trust region is derived with the Perturbation Analysis (PA) theory. Secondly, we propose a practical algorithm named Average Policy Optimization (APO), which improves the value estimation with a novel technique named Average Value Constraint. To the best of our knowledge, our work is the first one to study the trust region approach with the average criterion and it complements the framework of reinforcement learning beyond the discounted criterion. Finally, experiments are conducted in the continuous control environment MuJoCo. In most tasks, APO performs better than the discounted PPO, which demonstrates the effectiveness of our approach.

NeurIPS Conference 2021 Conference Paper

Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

  • Yiqin Yang
  • Xiaoteng Ma
  • Chenghao Li
  • Zewu Zheng
  • Qiyuan Zhang
  • Gao Huang
  • Jun Yang
  • Qianchuan Zhao

Learning from datasets without interaction with environments (Offline Learning) is an essential step to apply Reinforcement Learning (RL) algorithms in real-world scenarios. However, compared with the single-agent counterpart, offline multi-agent RL introduces more agents with the larger state and action space, which is more challenging but attracts little attention. We demonstrate current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error. In this paper, we propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error by only trusting the state-action pairs given in the dataset for value estimation. Moreover, we extend ICQ to multi-agent tasks by decomposing the joint-policy under the implicit constraint. Experimental results demonstrate that the extrapolation error is successfully controlled within a reasonable range and insensitive to the number of agents. We further show that ICQ achieves the state-of-the-art performance in the challenging multi-agent offline tasks (StarCraft II). Our code is public online at https://github.com/YiqinYang/ICQ.

NeurIPS Conference 2021 Conference Paper

Celebrating Diversity in Shared Multi-Agent Reinforcement Learning

  • Chenghao Li
  • Tonghan Wang
  • Chengjie Wu
  • Qianchuan Zhao
  • Jun Yang
  • Chongjie Zhang

Recently, deep multi-agent reinforcement learning (MARL) has shown the promise to solve complex cooperative tasks. Its success is partly because of parameter sharing among agents. However, such sharing may lead agents to behave similarly and limit their coordination capacity. In this paper, we aim to introduce diversity in both optimization and representation of shared multi-agent reinforcement learning. Specifically, we propose an information-theoretical regularization to maximize the mutual information between agents' identities and their trajectories, encouraging extensive exploration and diverse individualized behaviors. In representation, we incorporate agent-specific modules in the shared neural network architecture, which are regularized by L1-norm to promote learning sharing among agents while keeping necessary diversity. Empirical results show that our method achieves state-of-the-art performance on Google Research Football and super hard StarCraft II micromanagement tasks.

AAMAS Conference 2021 Conference Paper

Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning

  • Xiaoteng Ma
  • Yiqin Yang
  • Chenghao Li
  • Yiwen Lu
  • Qianchuan Zhao
  • Jun Yang

Value-based methods of multi-agent reinforcement learning (MARL), especially the value decomposition methods, have been demonstrated on a range of challenging cooperative tasks. However, current methods pay little attention to the interaction between agents, which is essential to teamwork in games or real life. This limits the efficiency of value-based MARL algorithms in two aspects: collaborative exploration and value function estimation. In this paper, we propose a novel cooperative MARL algorithm named interactive actor-critic (IAC), which models the interaction of agents from the perspectives of policy and value function. On the policy side, a multi-agent joint stochastic policy is introduced by adopting a collaborative exploration module, which is trained by maximizing the entropy-regularized expected return. On the value side, we use the shared attention mechanism to estimate the value function of each agent, which takes the impact of the teammates into consideration. At the implementation level, we extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments. Experimental results indicate that our method outperforms the state-of-the-art approaches and achieves better performance in terms of cooperation.

JBHI Journal 2019 Journal Article

An ARIMA Model With Adaptive Orders for Predicting Blood Glucose Concentrations and Hypoglycemia

  • Jun Yang
  • Lei Li
  • Yimeng Shi
  • Xiaolei Xie

The continuous glucose monitoring system is an effective tool that enables users to monitor their blood glucose (BG) levels. Based on continuous glucose monitoring (CGM) data, we aim to predict future BG levels so that appropriate actions can be taken in advance to prevent hyperglycemia or hypoglycemia. Due to the time-varying nonstationarity of CGM data, verified by the augmented Dickey–Fuller test and analysis of variance, an autoregressive integrated moving average (ARIMA) model with an adaptive identification algorithm of model orders is proposed in the prediction framework. This identification algorithm adaptively determines the model orders and simultaneously estimates the corresponding parameters using the Akaike Information Criterion and least squares estimation. A case study is conducted with the CGM data of diabetics under daily living conditions to analyze the prediction performance of the proposed model together with the early hypoglycemic alarms. Results show that the proposed model outperforms the adaptive univariate model and the ARIMA model.
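The idea of choosing model orders by the Akaike Information Criterion while estimating parameters by least squares can be sketched as follows. This is only an illustration under simplifying assumptions, not the paper's algorithm: it covers the autoregressive part alone, expects a series that has already been differenced to stationarity, and uses the Gaussian form of the AIC up to an additive constant. The function names `solve_linear`, `fit_ar`, and `select_ar_order` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(series, p, max_p):
    """Least-squares AR(p) fit; returns (coefficients, residual sum of squares, n).

    Every order is fitted on the same targets series[max_p:], so that the
    AIC values of different orders are comparable.
    """
    rows = [[series[t - k] for k in range(1, p + 1)]
            for t in range(max_p, len(series))]
    targets = series[max_p:]
    # Normal equations (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    coeffs = solve_linear(xtx, xty)
    rss = sum((y - sum(c * v for c, v in zip(coeffs, r))) ** 2
              for r, y in zip(rows, targets))
    return coeffs, rss, len(targets)


def select_ar_order(series, max_p):
    """Return (order, coefficients, aic) for the order with the lowest AIC."""
    best = None
    for p in range(1, max_p + 1):
        coeffs, rss, n = fit_ar(series, p, max_p)
        aic = n * math.log(rss / n) + 2 * p  # Gaussian AIC up to a constant
        if best is None or aic < best[2]:
            best = (p, coeffs, aic)
    return best
```

In an adaptive setting, `select_ar_order` could be called again on a sliding window of recent readings so that the order follows changes in the data; the window length would be a further assumption.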

NeurIPS Conference 2019 Conference Paper

Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes

  • Jun Yang
  • Shengyang Sun
  • Daniel Roy

The developments of Rademacher complexity and PAC-Bayesian theory have been largely independent. One exception is the PAC-Bayes theorem of Kakade, Sridharan, and Tewari (2008), which is established via Rademacher complexity theory by viewing Gibbs classifiers as linear operators. The goal of this paper is to extend this bridge between Rademacher complexity and state-of-the-art PAC-Bayesian theory. We first demonstrate that one can match the fast rate of Catoni's PAC-Bayes bounds (Catoni, 2007) using shifted Rademacher processes (Wegkamp, 2003; Lecué and Mitchell, 2012; Zhivotovskiy and Hanneke, 2018). We then derive a new fast-rate PAC-Bayes bound in terms of the "flatness" of the empirical risk surface on which the posterior concentrates. Our analysis establishes a new framework for deriving fast-rate PAC-Bayes bounds and yields new insights on PAC-Bayesian theory.

AAAI Conference 2018 Conference Paper

Multi-Entity Aspect-Based Sentiment Analysis With Context, Entity and Aspect Memory

  • Jun Yang
  • Runqi Yang
  • Chongjun Wang
  • Junyuan Xie

Inspired by recent work in Aspect-Based Sentiment Analysis (ABSA) on product reviews and faced with more complex posts on social media platforms mentioning multiple entities as well as multiple aspects, we define a novel task called Multi-Entity Aspect-Based Sentiment Analysis (ME-ABSA). This task aims at fine-grained sentiment analysis of (entity, aspect) combinations, making the well-studied ABSA task a special case of it. To address the task, we propose an innovative method that models Context memory, Entity memory and Aspect memory, called the CEA method. Our experimental results show that the CEA method achieves a significant gain over several baselines, including the state-of-the-art method for the ABSA task, and their enhanced versions, on datasets for ME-ABSA and ABSA tasks. The in-depth analysis illustrates the significant advantage of the CEA method over baseline methods for several hard-to-predict post types. Furthermore, we show that the CEA method is capable of generalizing to new (entity, aspect) combinations with little loss of accuracy. This observation indicates that data annotation in real applications can be largely simplified.

IROS Conference 2010 Conference Paper

A navigation system for family indoor monitor mobile robot

  • Fusheng Tan
  • Jun Yang
  • Jianming Huang
  • Tinggang Jia
  • Weidong Chen 0001
  • Jingchuan Wang

The navigation system of a family indoor mobile robot includes localization, path planning, and collision avoidance. A hybrid localization method combining straight-line matching, corner matching, and odometry is proposed. The hardware and software configuration is introduced. The robot senses its environment using a 2D laser range finder. The line feature extraction process, which consists of area division, iterative end point fit (IEPF), and a least squares technique, is introduced. From the line features, straight lines and corners are obtained as geometric features. The odometry, straight-line, and corner localization algorithms are discussed. An Artificial Potential Field (APF) based path planning algorithm is implemented. As a result, stable localization is achieved with a position resolution of 50 mm and an orientation resolution of 5 degrees. The method also performs well, with a cycle time of 120 ms. Experiments show the effectiveness of the hybrid localization method.
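The iterative end point fit step named in the abstract can be sketched as a recursive split of an ordered laser scan. This is a generic illustration of IEPF, not the authors' implementation: the function names, the `threshold` parameter, and the sample scan are assumptions, and the area division and least squares refinement steps are omitted.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        # Degenerate chord: fall back to the distance between two points.
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def iepf(points, threshold):
    """Return breakpoint indices that split ordered points into line segments.

    The chord between the first and last point of a span is tested; if the
    farthest intermediate point lies more than `threshold` away, the span is
    split at that point and both halves are processed recursively.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")

    def split(lo, hi):
        if hi - lo < 2:
            return []
        dists = [point_line_distance(points[i], points[lo], points[hi])
                 for i in range(lo + 1, hi)]
        worst = max(range(len(dists)), key=dists.__getitem__)
        if dists[worst] <= threshold:
            return []
        k = lo + 1 + worst
        return split(lo, k) + [k] + split(k, hi)

    return [0] + split(0, len(points) - 1) + [len(points) - 1]


# Hypothetical scan of two walls meeting at a right angle (units arbitrary).
wall = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(iepf(wall, 0.05))  # [0, 3, 6]: the interior breakpoint is the corner
```

Each pair of consecutive breakpoints delimits one candidate line segment, and the interior breakpoints are candidate corners; a least squares line fit over the points of each segment would then refine the line parameters.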