Author name cluster

Junge Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers

2 author rows

AAAI Conference 2026 Conference Paper

Persistent Backdoor Attacks Under Continual Fine-Tuning of LLMs

Jing Cui
Yufei Han
Jianbin Jiao
Junge Zhang

Backdoor attacks embed malicious behaviors into Large Language Models (LLMs), enabling adversaries to trigger harmful outputs or bypass safety controls. However, the persistence of implanted backdoors under user-driven post-deployment continual fine-tuning has rarely been examined. Most prior works evaluate the effectiveness and generalization of implanted backdoors only at release time, and empirical evidence shows that the persistence of naively injected backdoors degrades after updates. In this work, we study whether and how implanted backdoors persist through multi-stage post-deployment fine-tuning. We propose P-Trojan, a trigger-based attack algorithm that explicitly optimizes for backdoor persistence across repeated updates. By aligning poisoned gradients with those of clean tasks on token embeddings, the implanted backdoor mapping is less likely to be suppressed or forgotten during subsequent updates. Theoretical analysis shows the feasibility of such persistent backdoor attacks after continual fine-tuning. Experiments conducted on the Qwen2.5 and LLaMA3 families of LLMs, as well as diverse task sequences, demonstrate that P-Trojan achieves over 99% persistence while preserving clean-task accuracy. Our findings highlight the need for persistence-aware evaluation and stronger defenses in realistic model adaptation pipelines.

PDF Details DOI

AAMAS Conference 2025 Conference Paper

RainbowArena: A Multi-Agent Toolkit for Reinforcement Learning and Large Language Models in Competitive Tabletop Games

Yingzhuo Liu
Shuodi Liu
Hongsong Tang
Yubing Ma
Zikang Li
Junge Zhang
Liuyu Xiang
Zhaofeng He

Tabletop games have gained little to no attention, despite offering a range of unique challenges compared to card or board games. We introduce RainbowArena, an open-source toolkit for reinforcement learning and large language models in competitive tabletop games. The goal of RainbowArena is to provide a unified, scalable platform that supports both Reinforcement Learning (RL) and Large Language Models (LLM), and push forward the research in tabletop games. RainbowArena consists of three modules: game, agent and evaluation. We design unified components and interfaces for various tabletop games. To better integrate with game environments, we devise an efficient self-play framework for RL agents, and a standardized prompt structure for LLM agents. Additionally, agents of all types can be evaluated within the evaluation framework. Finally, we evaluate various types of agents across different games and analyze the runtime efficiency for each game.

PDF

AAAI Conference 2025 Conference Paper

Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling

Xingzhou Lou
Junge Zhang
Jian Xie
Lifeng Liu
Dong Yan
Kaiqi Huang

Human preference alignment is critical in building powerful and reliable large language models (LLMs). However, current methods either ignore the multi-dimensionality of human preferences (e.g. helpfulness and harmlessness) or struggle with the complexity of managing multiple reward models. To address these issues, we propose Sequential Preference Optimization (SPO), a method that sequentially fine-tunes LLMs to align with multiple dimensions of human preferences. SPO avoids explicit reward modeling, directly optimizing the models to align with nuanced human preferences. We theoretically derive the closed-form optimal SPO policy and loss function. Gradient analysis is conducted to show how SPO manages to fine-tune the LLMs while maintaining alignment on previously optimized dimensions. Empirical results on LLMs of different sizes and multiple evaluation datasets demonstrate that SPO successfully aligns LLMs across multiple dimensions of human preferences and significantly outperforms the baselines.

PDF Details DOI

IJCAI Conference 2024 Conference Paper

ADMN: Agent-Driven Modular Network for Dynamic Parameter Sharing in Cooperative Multi-Agent Reinforcement Learning

Yang Yu
Qiyue Yin
Junge Zhang
Pei Xu
Kaiqi Huang

Parameter sharing is a common strategy in multi-agent reinforcement learning (MARL) to make the training more efficient and scalable. However, applying parameter sharing among agents indiscriminately hinders the emergence of agent diversity and degrades the final cooperative performance. To better balance parameter sharing and agent diversity, we propose a novel Agent-Driven Modular Network (ADMN), where agents share a base network consisting of multiple specialized modules, and each agent has its own routing to connect these modules. In ADMN, modules are shared among agents to improve the training efficiency, while the combination of different modules brings rich diversity. The agent routing at different time steps is learned end-to-end to achieve a dynamic and adaptive balance. Specifically, we also propose an information-theoretical regularization between the routing of agents and their behavior to further guarantee the identifiability of different routing. We evaluated ADMN in challenging StarCraft micromanagement games and Google Research Football games, and results demonstrate the superior performance of ADMN, particularly in larger or heterogeneous cooperative tasks.

PDF Details DOI

AAAI Conference 2024 Conference Paper

BadRL: Sparse Targeted Backdoor Attack against Reinforcement Learning

Jing Cui
Yufei Han
Yuzhe Ma
Jianbin Jiao
Junge Zhang

Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success. However, these methods suffer from high attack costs and increased detectability. In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing while maintaining successful attacks. Our algorithm, BadRL, strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection. In contrast to the previous methods that utilize sample-agnostic trigger patterns, BadRL dynamically generates distinct trigger patterns based on targeted state observations, thereby enhancing its effectiveness. Theoretical analysis shows that the targeted backdoor attack is always viable and remains stealthy under specific assumptions. Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent with minimal poisoning efforts (0.003% of total training steps) during training and infrequent attacks during testing. Code is available at: https://github.com/7777777cc/code.

PDF Details DOI

EAAI Journal 2024 Journal Article

Cross-modal misalignment-robust feature fusion for crowd counting

Weihang Kong
Zepeng Yu
He Li
Junge Zhang

Details DOI

AAAI Conference 2024 Conference Paper

NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields

Junge Zhang
Feihu Zhang
Shaochen Kuang
Li Zhang

Labelling LiDAR point clouds for training autonomous driving is extremely expensive and difficult. LiDAR simulation aims at generating realistic LiDAR data with labels for training and verifying self-driving algorithms more efficiently. Recently, Neural Radiance Fields (NeRF) have been proposed for novel view synthesis using implicit reconstruction of 3D scenes. Inspired by this, we present NeRF-LiDAR, a novel LiDAR simulation method that leverages real-world information to generate realistic LiDAR point clouds. Different from existing LiDAR simulators, we use real images and point cloud data collected by self-driving cars to learn the 3D scene representation, point cloud generation and label rendering. We verify the effectiveness of our NeRF-LiDAR by training different 3D segmentation models on the generated LiDAR point clouds. It reveals that the trained models are able to achieve similar accuracy when compared with the same model trained on the real LiDAR data. Besides, the generated data is capable of boosting the accuracy through pre-training which helps reduce the requirements of the real labeled data. Code is available at https://github.com/fudan-zvg/NeRF-LiDAR

PDF Details DOI

AAMAS Conference 2024 Conference Paper

PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning

Hangyu Mao
Rui Zhao
Ziyue Li
Zhiwei Xu
Hao Chen
Yiqun Chen
Bin Zhang
Zhen Xiao

Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which cascades two Transformers in a very natural way: the perceiving one focuses on the environmental perception by processing the observation at the patch level, whereas the deciding one pays attention to the decision-making by conditioning on the history of the desired returns, the perceiver’s outputs, and the actions. Such a network design is generally applicable to a lot of deep RL settings, e.g., both the online and offline RL algorithms under environments with either image observations, proprioception observations, or hybrid image-language observations. Extensive experiments show that PDiT can not only achieve performance superior to strong baselines in different settings but also extract explainable feature representations. Our code is available at https://github.com/maohangyu/PDiT.

PDF

IJCAI Conference 2024 Conference Paper

Population-Based Diverse Exploration for Sparse-Reward Multi-Agent Tasks

Pei Xu
Junge Zhang
Kaiqi Huang

Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. Although population-based learning shows its potential in producing diverse behaviors, most previous works still focus on improving the exploration of a single joint policy. In this paper, we show that with a suitable exploration method, maintaining a population of joint policies rather than one joint policy can significantly improve exploration. Our key idea is to guide each member of the population to explore different regions of the environment. To this end, we propose a member-aware exploration objective which explicitly guides each member to maximize deviation from the explored regions of other members, thus forcing them to explore different regions. In addition, we further propose an exploration-enhanced policy constraint to guide each member to learn a joint policy that is both different from other members and promotes exploration, thus increasing the probability of exploring different regions. Under reward-free setting, our method achieves 72% average improvement in the number of explored states compared to classical exploration methods in the multiple-particle environment. Moreover, under sparse-reward setting, we show that the proposed method significantly outperforms the state-of-the-art methods in the multiple-particle environment, the Google Research Football, and StarCraft II micromanagement tasks.

PDF Details DOI

ICML Conference 2024 Conference Paper

Position: Foundation Agents as the Paradigm Shift for Decision Making

Xiaoqian Liu
Xingzhou Lou
Jianbin Jiao
Junge Zhang

Decision making demands intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making face challenges related to low sample efficiency and poor generalization. In contrast, foundation models in language and vision have showcased rapid adaptation to diverse new tasks. Therefore, we advocate for the construction of foundation agents as a transformative shift in the learning paradigm of agents. This proposal is underpinned by the formulation of foundation agents with their fundamental characteristics and challenges motivated by the success of large language models (LLMs). Moreover, we specify the roadmap of foundation agents from large interactive data collection or generation, to self-supervised pretraining and adaptation, and knowledge and value alignment with LLMs. Lastly, we pinpoint critical research questions derived from the formulation and delineate trends for foundation agents supported by real-world use cases, addressing both technical and theoretical aspects to propel the field towards a more comprehensive and impactful future.

Details

AAAI Conference 2024 Conference Paper

ProAgent: Building Proactive Cooperative Agents with Large Language Models

Ceyao Zhang
Kaijie Yang
Siyi Hu
Zihao Wang
Guanghe Li
Yihang Sun
Cheng Zhang
Zhaowei Zhang

Building agents with adaptive behavior in cooperative tasks stands as a paramount goal in the realm of multi-agent systems. Current approaches to developing cooperative agents rely primarily on learning-based methods, whose policy generalization depends heavily on the diversity of teammates they interact with during the training phase. Such reliance, however, constrains the agents' capacity for strategic adaptation when cooperating with unfamiliar teammates, which becomes a significant challenge in zero-shot coordination scenarios. To address this challenge, we propose ProAgent, a novel framework that harnesses large language models (LLMs) to create proactive agents capable of dynamically adapting their behavior to enhance cooperation with teammates. ProAgent can analyze the present state, and infer the intentions of teammates from observations. It then updates its beliefs in alignment with the teammates' subsequent actual behaviors. Moreover, ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios. Experimental evaluations conducted within the Overcooked-AI environment unveil the remarkable performance superiority of ProAgent, outperforming five methods based on self-play and population-based training when cooperating with AI agents. Furthermore, when partnered with human proxy models, its performance exhibits an average improvement exceeding 10% compared to the current state-of-the-art method. For more information about our project, please visit https://pku-proagent.github.io.

PDF Details DOI

AAMAS Conference 2024 Conference Paper

Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models

Xingzhou Lou
Junge Zhang
Ziyan Wang
Kaiqi Huang
Yali Du

Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints. Employing constraints expressed via easily-understandable human language offers considerable potential for real-world applications due to its accessibility and non-reliance on domain expertise. Previous safe RL methods with natural language constraints typically adopt a recurrent neural network, which leads to limited capabilities when dealing with various forms of human language input. Furthermore, these methods often require a ground-truth cost function, necessitating domain expertise for the conversion of language constraints into a well-defined cost function that determines constraint violation. To address these issues, we propose to use pre-trained language models (LM) to facilitate RL agents’ comprehension of natural language constraints and allow them to infer costs for safe policy learning. Through the use of pre-trained LMs and the elimination of the need for a ground-truth cost, our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints. Experiments on grid-world navigation and robot control show that the proposed method can achieve strong performance while adhering to given constraints. The usage of pre-trained LMs allows our method to comprehend complicated constraints and learn safe policies without the need for ground-truth cost at any stage of training or evaluation. Extensive ablation studies are conducted to demonstrate the efficacy of each part of our method.

PDF

AAAI Conference 2024 Conference Paper

TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient

Xingzhou Lou
Junge Zhang
Timothy J. Norman
Kaiqi Huang
Yali Du

Multi-Agent Policy Gradient (MAPG) has made significant progress in recent years. However, centralized critics in state-of-the-art MAPG methods still face the centralized-decentralized mismatch (CDM) issue, which means sub-optimal actions by some agents will affect other agents' policy learning. While using individual critics for policy updates can avoid this issue, they severely limit cooperation among agents. To address this issue, we propose an agent topology framework, which decides whether other agents should be considered in policy gradient and achieves a compromise between facilitating cooperation and alleviating the CDM issue. The agent topology allows agents to use coalition utility as the learning objective instead of global utility by centralized critics or local utility by individual critics. To constitute the agent topology, various models are studied. We propose Topology-based multi-Agent Policy gradiEnt (TAPE) for both stochastic and deterministic MAPG methods. We prove the policy improvement theorem for stochastic TAPE and give a theoretical explanation for the improved cooperation among agents. Experiment results on several benchmarks show the agent topology is able to facilitate agent cooperation and alleviate the CDM issue, respectively, to improve the performance of TAPE. Finally, multiple ablation studies and a heuristic graph search algorithm are devised to show the efficacy of the agent topology.

PDF Details DOI

IJCAI Conference 2023 Conference Paper

Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks

Pei Xu
Junge Zhang
Kaiqi Huang

Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. Previous works argue that complex dynamics between agents and the huge exploration space in MARL scenarios amplify the vulnerability of classical count-based exploration methods when combined with agents parameterized by neural networks, resulting in inefficient exploration. In this paper, we show that introducing constrained joint policy diversity into a classical count-based method can significantly improve exploration when agents are parameterized by neural networks. Specifically, we propose a joint policy diversity to measure the difference between current joint policy and previous joint policies, and then use a filtering-based exploration constraint to further refine the joint policy diversity. Under the sparse-reward setting, we show that the proposed method significantly outperforms the state-of-the-art methods in the multiple-particle environment, the Google Research Football, and StarCraft II micromanagement tasks. To the best of our knowledge, on the hard 3s_vs_5z task which needs non-trivial strategies to defeat enemies, our method is the first to learn winning strategies without domain knowledge under the sparse-reward setting.

PDF Details DOI

AAMAS Conference 2023 Conference Paper

Learning Individual Difference Rewards in Multi-Agent Reinforcement Learning

Chen Yang
Guangkai Yang
Junge Zhang

We investigate explicit solutions to the multi-agent credit assignment problem. Specifically, we assign each agent individual difference rewards in addition to the team reward so as to distinguish the contribution of different agents to the team. We present a novel reward decomposition network to estimate the influence of each agent’s action on the team reward, and distribute difference rewards accordingly. Furthermore, we combine difference rewards with the actor-critic framework and propose a new approach called learning individual difference rewards (LIDR). We evaluate LIDR on a set of StarCraft II micromanagement problems. Results show that LIDR significantly outperforms previous state-of-the-art methods.

PDF

AAMAS Conference 2023 Conference Paper

PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination

Xingzhou Lou
Jiaxian Guo
Junge Zhang
Jun Wang
Kaiqi Huang
Yali Du

Zero-shot human-AI coordination holds the promise of collaborating with humans without human data. Prevailing methods try to train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) The diversity of a population with finite partners is limited, thereby limiting the capacity of the trained ego agent to collaborate with a novel human; 2) Current methods only provide a common best response for every partner in the population, which may result in poor zero-shot coordination performance with a novel partner or humans. To address these issues, we first propose the policy ensemble method to increase the diversity of partners in the population, and then develop a context-aware method enabling the ego agent to analyze and identify the partner’s potential policy primitives so that it can take different actions accordingly. In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners. We conduct experiments on the Overcooked environment, and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans. The results demonstrate that our method significantly increases the diversity of partners and enables ego agents to learn more diverse behaviors than baselines, thus achieving state-of-the-art performance in all scenarios. We also open-source a human-AI coordination study framework on the Overcooked for the convenience of future studies. Codes and demo videos are available at https://sites.google.com/view/pecan-overcooked.

PDF

AAMAS Conference 2023 Conference Paper

Prioritized Tasks Mining for Multi-Task Cooperative Multi-Agent Reinforcement Learning

Yang Yu
Qiyue Yin
Junge Zhang
Kaiqi Huang

Multi-task learning improves data efficiency in cooperative multi-agent reinforcement learning, since agents can learn multiple related tasks simultaneously and the cooperation knowledge in a task can be utilized by others. However, existing methods mainly learn multiple cooperation tasks uniformly, regardless of their complexity and significance. In this paper, we propose a new framework called Prioritized Tasks Mining (PTM) for multi-task cooperation problems, which helps agents to identify and mine higher priority cooperation tasks, so as to learn more effective coordinated strategies for multiple cooperation tasks. Specifically, agents use hindsight during training to identify the priority of different tasks, and perform exploration and exploitation in higher priority cooperative tasks to mine more sophisticated coordinated strategies. We evaluate PTM in challenging multi-task StarCraft micromanagement games with different scales, and results demonstrate that our method consistently outperforms all strong baselines.

PDF

ICLR Conference 2023 Conference Paper

S-NeRF: Neural Radiance Fields for Street Views

Ziyang Xie
Junge Zhang
Wenye Li 0002
Feihu Zhang
Li Zhang 0001

Neural Radiance Fields (NeRFs) aim to synthesize novel views of objects and scenes, given the object-centric camera views with large overlaps. However, we conjecture that this paradigm does not fit the nature of the street views that are collected by many self-driving cars from the large-scale unbounded scenes. Also, the onboard cameras perceive scenes without much overlapping. Thus, existing NeRFs often produce blurs, "floaters" and other artifacts on street-view synthesis. In this paper, we propose a new street-view NeRF (S-NeRF) that considers novel view synthesis of both the large-scale background scenes and the foreground moving vehicles jointly. Specifically, we improve the scene parameterization function and the camera poses for learning better neural representations from street views. We also use the noisy and sparse LiDAR points to boost the training and learn a robust geometry and reprojection based confidence to address the depth outliers. Moreover, we extend our S-NeRF for reconstructing moving vehicles, which is impracticable for conventional NeRFs. Thorough experiments on the large-scale driving datasets (e.g., nuScenes and Waymo) demonstrate that our method beats the state-of-the-art rivals by reducing 7-40% of the mean-squared error in the street-view synthesis and a 45% PSNR gain for the moving vehicles rendering.

Details

AAAI Conference 2023 Conference Paper

Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks

Pei Xu
Junge Zhang
Qiyue Yin
Chao Yu
Yaodong Yang
Kaiqi Huang

Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. One possible solution to this issue is to exploit inherent task structures for an acceleration of exploration. In this paper, we present a novel exploration approach, which encodes a special structural prior on the reward function into exploration, for sparse-reward multi-agent tasks. Specifically, a novel entropic exploration objective which encodes the structural prior is proposed to accelerate the discovery of rewards. By maximizing the lower bound of this objective, we then propose an algorithm with moderate computational cost, which can be applied to practical tasks. Under the sparse-reward setting, we show that the proposed algorithm significantly outperforms the state-of-the-art algorithms in the multiple-particle environment, the Google Research Football and StarCraft II micromanagement tasks. To the best of our knowledge, on some hard tasks (such as 27m_vs_30m) which have a relatively larger number of agents and need non-trivial strategies to defeat enemies, our method is the first to learn winning strategies under the sparse-reward setting.

PDF Details DOI

NeurIPS Conference 2021 Conference Paper

Coordinated Proximal Policy Optimization

Zifan Wu
Chao Yu
Deheng Ye
Junge Zhang
Haiyin Piao
Hankz Hankui Zhuo

We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting. The key idea lies in the coordinated adaptation of step size during the policy update process among multiple agents. We prove the monotonicity of policy improvement when optimizing a theoretically-grounded joint objective, and derive a simplified optimization objective based on a set of approximations. We then interpret that such an objective in CoPPO can achieve dynamic credit assignment among agents, thereby alleviating the high variance issue during the concurrent update of agent policies. Finally, we demonstrate that CoPPO outperforms several strong baselines and is competitive with the latest multi-agent PPO method (i.e., MAPPO) under typical multi-agent settings, including cooperative matrix games and the StarCraft II micromanagement tasks.

PDF Details

AAAI Conference 2021 Conference Paper

Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning

Wenzhen Huang
Qiyue Yin
Junge Zhang
Kaiqi Huang

Model-based reinforcement learning (RL) is more sample efficient than model-free RL by using imaginary trajectories generated by the learned dynamics model. When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions. To alleviate this problem, this paper proposes to adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories. More specifically, we evaluate the effect of an imaginary transition by calculating the change of the loss computed on the real samples when we use the transition to train the action-value and policy functions. Based on this evaluation criterion, we realize the idea of reweighting each imaginary transition with a well-designed meta-gradient algorithm. Extensive experimental results demonstrate that our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks. Visualization of our changing weights further validates the necessity of utilizing the reweighting scheme.

PDF Details

NeurIPS Conference 2021 Conference Paper

SOFT: Softmax-free Transformer with Linear Complexity

Jiachen Lu
Jinghan Yao
Junge Zhang
Xiatian Zhu
Hang Xu
Weiguo Gao
Chunjing Xu
Tao Xiang

Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention. However, the employment of self-attention modules results in a quadratic complexity in both computation and memory usage. Various attempts on approximating the self-attention computation with linear complexity have been made in Natural Language Processing. However, an in-depth analysis in this work shows that they are either theoretically flawed or empirically ineffective for visual recognition. We further identify that their limitations are rooted in keeping the softmax self-attention during approximations. Specifically, conventional self-attention is computed by normalizing the scaled dot-product between token feature vectors. Keeping this softmax operation challenges any subsequent linearization efforts. Based on this insight, for the first time, a softmax-free transformer or SOFT is proposed. To remove softmax in self-attention, Gaussian kernel function is used to replace the dot-product similarity without further normalization. This enables a full self-attention matrix to be approximated via a low-rank matrix decomposition. The robustness of the approximation is achieved by calculating its Moore-Penrose inverse using a Newton-Raphson method. Extensive experiments on ImageNet show that our SOFT significantly improves the computational efficiency of existing ViT variants. Crucially, with a linear complexity, much longer token sequences are permitted in SOFT, resulting in superior trade-off between accuracy and complexity.

PDF Details

AAAI Conference 2019 Conference Paper

Bootstrap Estimated Uncertainty of the Environment Model for Model-Based Reinforcement Learning

Wenzhen Huang
Junge Zhang
Kaiqi Huang

Model-based reinforcement learning (RL) methods attempt to learn a dynamics model to simulate the real environment and utilize the model to make better decisions. However, the learned environment simulator often has some model error, which disturbs decision making and reduces performance. We propose a bootstrapped model-based RL method which bootstraps the modules in each depth of the planning tree. This method can quantify the uncertainty of the environment model on different state-action pairs and lead the agent to explore the pairs with higher uncertainty to reduce the potential model errors. Moreover, we sample target values from their bootstrap distribution to connect the uncertainties at current and subsequent time-steps and introduce the prior mechanism to improve the exploration efficiency. Experiment results demonstrate that our method efficiently decreases model error and outperforms TreeQN and other state-of-the-art methods on multiple Atari games.

PDF Details

AAAI Conference 2019 Short Paper

Transductive Zero-Shot Learning via Visual Center Adaptation

Ziyu Wan
Yan Li
Min Yang
Junge Zhang

In this paper, we propose a Visual Center Adaptation Method (VCAM) to address the domain shift problem in zero-shot learning. For the seen classes in the training data, VCAM builds an embedding space by learning the mapping from semantic space to some visual centers. For unseen classes in the test data, the construction of the embedding space is constrained by a symmetric Chamfer-distance term, aiming to adapt the distribution of the synthetic visual centers to that of the real cluster centers. Therefore, the learned embedding space can generalize well to the unseen classes. Experiments on two widely used datasets demonstrate that our model significantly outperforms state-of-the-art methods.

PDF Details

NeurIPS Conference 2019 Conference Paper

Transductive Zero-Shot Learning with Visual Structure Constraint

Ziyu Wan
DongDong Chen
Yan Li
Xingguang Yan
Junge Zhang
Yizhou Yu
Jing Liao

To recognize objects of the unseen classes, most existing Zero-Shot Learning (ZSL) methods first learn a compatible projection function between the common semantic space and the visual space based on the data of source seen classes, then directly apply it to the target unseen classes. However, in real scenarios, the data distributions of the source and target domains might not match well, thus causing the well-known domain shift problem. Based on the observation that visual features of test instances can be separated into different clusters, we propose a new visual structure constraint on class centers for transductive ZSL, to improve the generality of the projection function (i.e., alleviate the above domain shift problem). Specifically, three different strategies (symmetric Chamfer distance, bipartite matching distance, and Wasserstein distance) are adopted to align the projected unseen semantic centers and visual cluster centers of test instances. We also propose a new training strategy to handle the real cases where many unrelated images exist in the test dataset, which is not considered in previous methods. Experiments on many widely used datasets demonstrate that the proposed visual structure constraint can bring substantial performance gain consistently and achieve state-of-the-art results.
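
For readers unfamiliar with the first of the three alignment strategies, the following is a minimal sketch of one common definition of the symmetric Chamfer distance between two point sets, such as projected semantic centers and visual cluster centers. The paper's exact form (for example squared distances or a different normalization) may differ; the function name `symmetric_chamfer` and the toy points are assumptions.

```python
import math


def symmetric_chamfer(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    a, b: lists of equal-dimension coordinate tuples.
    Each point is matched to its nearest neighbour in the other set,
    and the two directed sums are added together.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src)

    return one_way(a, b) + one_way(b, a)


# Toy example: hypothetical "projected" centers vs. "cluster" centers.
projected = [(0.0, 0.0), (1.0, 0.0)]
clusters = [(0.0, 0.0), (1.0, 1.0)]
distance = symmetric_chamfer(projected, clusters)
```

Unlike bipartite matching, this nearest-neighbour form allows several points in one set to match the same point in the other, so it does not enforce a one-to-one correspondence between centers.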

PDF Details

AAAI Conference 2018 Conference Paper

Deep Semantic Structural Constraints for Zero-Shot Learning

Yan Li
Zhen Jia
Junge Zhang
Kaiqi Huang
Tieniu Tan

Zero-shot learning aims to classify unseen image categories by learning a visual-semantic embedding space. In most cases, the traditional methods adopt a separated two-step pipeline that first extracts image features from pre-trained CNN models and then uses the fixed image features to learn the embedding space. This leads to a lack of specific structural semantic information in the image features for the zero-shot learning task. In this paper, we propose an end-to-end trainable Deep Semantic Structural Constraints model to address this issue. The proposed model contains the Image Feature Structure constraint and the Semantic Embedding Structure constraint, which aim to learn structure-preserving image features and to endow the learned embedding space with stronger generalization ability, respectively. With the assistance of semantic structural information, the model gains more auxiliary clues for zero-shot learning. The state-of-the-art performance confirms the effectiveness of our proposed method.

PDF Details

AAAI Conference 2018 Conference Paper

DF²Net: Discriminative Feature Learning and Fusion Network for RGB-D Indoor Scene Classification

Yabei Li
Junge Zhang
Yanhua Cheng
Kaiqi Huang
Tieniu Tan

This paper focuses on the task of RGB-D indoor scene classification. It is a very challenging task for two reasons. 1) Learning a robust representation for indoor scenes is difficult because of the variety of objects and layouts. 2) Fusing the complementary cues in RGB and depth is nontrivial since there are large semantic gaps between the two modalities. Most existing works learn a representation for classification by training a deep network with softmax loss and fuse the two modalities by simply concatenating their features. However, these pipelines do not explicitly consider intra-class and inter-class similarity or inter-modal intrinsic relationships. To address these problems, this paper proposes a Discriminative Feature Learning and Fusion Network (DF²Net) with two-stage training. In the first stage, to better represent the scene in each modality, a deep multi-task network is constructed to simultaneously minimize the structured loss and the softmax loss. In the second stage, we design a novel discriminative fusion network which is able to learn correlative features of multiple modalities and distinctive features of each modality. Extensive analysis and experiments on the SUN RGB-D Dataset and the NYU Depth Dataset V2 show the superiority of DF²Net over other state-of-the-art methods in the RGB-D indoor scene classification task.

PDF Details

IJCAI Conference 2016 Conference Paper

FastLCD: Fast Label Coordinate Descent for the Efficient Optimization of 2D Label MRFs

Kangwei Liu
Junge Zhang
Peipei Yang
Kaiqi Huang

Recently, MRFs with two-dimensional (2D) labels have proved useful in many applications, such as image matching and optical flow estimation. Due to the huge 2D label set in these problems, existing optimization algorithms tend to be slow for the inference of 2D label MRFs, and this greatly limits the practical use of 2D label MRFs. To solve the problem, this paper presents an efficient algorithm, named FastLCD. Unlike previous popular move-making algorithms (e.g., α-expansion) that visit all the labels exhaustively in each step, FastLCD optimizes the 2D label MRFs by performing label coordinate descents alternately in horizontal, vertical and diagonal directions, and in this way it does not need to visit all the labels exhaustively. FastLCD greatly reduces the search space of the label set and benefits from a lower time complexity. Experimental results show that FastLCD is much faster while still yielding high-quality results.

PDF Details