Author name cluster

Haipeng Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers

2 author rows

AAAI Conference 2026 Conference Paper

Attentive Keypoint Identification: Progressive Spatiotemporal Refinement for Video-based Human Pose Estimation

Sifan Wu
Haipeng Chen
Yingda Lyu
Shaojing Fan
Zhigang Wang
Zhenguang Liu
Yingying Jiao

Video-based human pose estimation has vast applications such as action recognition, sports analytics, and crime detection. However, this task is challenging as it involves interpreting both spatial context and temporal dynamics to accurately localize human anatomical keypoints in video sequences. Current approaches, often based on attention mechanisms, perform well but struggle in challenging scenarios like rapid motion and pose occlusion. We attribute these failures to two fundamental limitations: spatial uniformity, where models indiscriminately assign attention to both joint-relevant features and background clutter, thereby introducing spatial noise; and temporal rigidity, an inability to adapt to large joint displacements, resulting in severe feature misalignment during rapid motion. To overcome these challenges, we introduce PSTPose, a novel progressive spatiotemporal refinement framework. Specifically, to address the spatial uniformity problem, we propose a Discriminative Feature Enhancement (DFE) module that emphasizes joint-relevant features and a Feature Cluster Grouping (FCG) module that forms compact, semantically meaningful regions. For the temporal rigidity problem, we introduce a Deformable Spatiotemporal Fusion (DSF) module that adaptively aligns features across consecutive frames via deformation-aware sampling. This design ensures robust keypoint localization, particularly in cluttered and dynamic scenes. Extensive experiments on three large-scale benchmarks, PoseTrack2017, PoseTrack2018, PoseTrack21, demonstrate that PSTPose establishes a new state-of-the-art.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Causality-Aligned Semantic Recovery for Incomplete Cross-Modal Retrieval

Haipeng Chen
Yu Liu
Xun Yang
Yuheng Liang
Yingda Lyu

Incomplete cross-modal retrieval (ICMR) requires models to recover missing modalities and robustly align heterogeneous ones for effective retrieval. Existing methods, however, fall short in both aspects. They often rely on limited semantic cues, such as single samples or coarse category prototypes, which compromises reconstruction quality. Moreover, these approaches are vulnerable to learning spurious cross-modal correlations, thereby impairing accurate alignment and hindering retrieval performance. To address these challenges, we propose Causality-Aligned Semantic Recovery (CASR), a novel method designed to both comprehensively restore missing modalities and mitigate spurious associations between vision and language. Our CASR involves two essential components: i) the Missing Modality Imagination (MMI) module, which combines category semantic priors with relevant contextual information to achieve high-quality semantic reconstruction; ii) the Explicit Causal Alignment (ECA) module, which explicitly learns environment-invariant attention, effectively eliminating the interference of spurious correlations and improving retrieval performance. Furthermore, we extend CASR to the challenging task of Partially Aligned Cross-Modal Retrieval, where we treat unlabeled unpaired data as a form of incomplete data. By leveraging MMI and ECA modules, we are able to learn robust representations in this setting. Extensive experiments on benchmark datasets under various missing rates demonstrate that CASR achieves superior robustness and retrieval performance.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Dual Coding Theory in Action: Language-Assisted Human Pose Estimation in Videos

Sifan Wu
Haipeng Chen
Yingda Lyu
Shaojing Fan
Zhigang Wang
Zhenguang Liu
Yingying Jiao

Video-based human pose estimation aims to localize keypoints across frames, enabling robust analysis of human motion in applications such as sports, surveillance, and healthcare. However, existing methods rely solely on visual cues, limiting their robustness in complex scenes involving occlusion, motion blur, or poor lighting. In contrast, dual coding theory from psychology suggests that human cognition is inherently multimodal: we learn by integrating visual perception with linguistic context to form structured, semantic understandings of the world. Visual input provides concrete spatiotemporal grounding, while language offers symbolic abstraction that enhances reasoning and generalization. Motivated by this cognitive principle, we present the first framework that explicitly incorporates language as an auxiliary modality to enhance video-based pose estimation. To address the lack of paired video-text datasets, we first employ a Multimodal Large Language Model (MLLM) to generate textual descriptions of human interactions from videos. We then propose a novel coarse-to-fine multimodal alignment pipeline: a cross-modal semantic interaction module establishes initial grounding between spatiotemporal visual features and textual embeddings, while an optimal transport-based feature matching mechanism enforces fine-grained, geometry-aware alignment. This cognitively inspired design enables more accurate and robust pose estimation, especially in visually challenging scenes like occlusion and motion blur. Extensive experiments on three benchmarks confirm that our method consistently outperforms state-of-the-art approaches.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Re-SpS: A Reinforcement Learning Approach to Speculative Sampling

Chenan Wang
Daniel H. Shi
Haipeng Chen

Inference time latency has remained an open challenge for real world applications of large language models (LLMs). State-of-the-art (SOTA) speculative sampling (SpS) methods for LLMs, like EAGLE-3, use tree-based drafting to explore multiple candidate continuations in parallel. However, the hyperparameters controlling the tree structure are static, which limits flexibility and efficiency across diverse contexts and domains. We introduce Reinforcement learning for Speculative Sampling (Re-SpS), the first reinforcement learning (RL)-based framework for draft tree hyperparameter optimization. Re-SpS dynamically adjusts draft tree hyperparameters in real-time, learning context-aware policies that maximize generation speed by balancing speculative aggression with computational overhead. It leverages efficient state representations from target model hidden states and introduces multi-step action persistence for better context modeling. Evaluation results across five diverse benchmarks demonstrate consistent improvements over the SOTA method EAGLE-3, achieving up to 5.45x speedup over the backbone LLM and up to 1.12x speedup compared to EAGLE-3 across five diverse benchmarks, with no loss in output fidelity.

PDF Details DOI

AAAI Conference 2026 Conference Paper

VGD: Value-Guided Diffusion Toward High-Utility Medical Image Segmentation

Hongyu Zhang
Haipeng Chen
Chengxin Yang
Yingda Lyu

Progress in medical image segmentation is fundamentally constrained by the scarcity of annotated data. While diffusion models offer a promising solution by generating high-fidelity image–mask pairs, their utility for downstream tasks remains underexplored. A key bottleneck lies in the misalignment between generation outputs and task-specific needs—samples are produced independently of their utility for downstream training. To this end, we propose Value-Guided Diffusion (VGD), a lightweight sampling framework that integrates downstream model feedback into the generative inference process. VGD estimates a value score for each sample based on its utility to downstream training, and leverages this signal to iteratively guide the denoising trajectory toward high-reward regions of the data manifold. Crucially, VGD can be seamlessly integrated into existing medical diffusion models without any additional training or architectural modifications. Extensive experiments across multiple diffusion backbones and segmentation benchmarks demonstrate that VGD significantly boosts downstream segmentation performance while maintaining visual fidelity. Our findings highlight a task-aware sampling principle with potential to underpin future synthetic segmentation pipelines.

PDF Details DOI

ICLR Conference 2025 Conference Paper

Can Reinforcement Learning Solve Asymmetric Combinatorial-Continuous Zero-Sum Games?

Yuheng Li
Panpan Wang
Haipeng Chen

There have been extensive studies on learning in zero-sum games, focusing on the analysis of the existence and algorithmic convergence of Nash equilibrium (NE). Existing studies mainly focus on symmetric games where the strategy spaces of the players are of the same type and size. For the few studies that do consider asymmetric games, they are mostly restricted to matrix games. In this paper, we define and study a new practical class of asymmetric games called two-player Asymmetric Combinatorial-Continuous zEro-Sum (ACCES) games, featuring a combinatorial action space for one player and an infinite compact space for the other. Such ACCES games have broad implications in the real world, particularly in combinatorial optimization problems (COPs) where one player optimizes a solution in a combinatorial space, and the opponent plays against it in an infinite (continuous) compact space (e.g., a nature player deciding epistemic parameters of the environmental model). Our first key contribution is to prove the existence of NE for two-player ACCES games, using the idea of essentially finite game approximation. Building on the theoretical insights and double oracle (DO)-based solutions to complex zero-sum games, our second contribution is to design the novel algorithm, Combinatorial Continuous DO (CCDO), to solve ACCES games, and prove the convergence of the proposed algorithm. Considering the NP-hardness of most COPs and recent advancements in reinforcement learning (RL)-based solutions to COPs, our third contribution is to propose a practical algorithm to solve NE in the real world, CCDORL (based on CCDO) and provide the novel convergence analysis in the ACCES game. Experimental results across diverse instances of COPs demonstrate the empirical effectiveness of our algorithms.

AAAI Conference 2025 Conference Paper

Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation

Haipeng Chen
Sifan Wu
Zhigang Wang
Yifang Yin
Yingying Jiao
Yingda Lyu
Zhenguang Liu

Video-based human pose estimation has long been a fundamental yet challenging problem in computer vision. Previous studies focus on spatio-temporal modeling through the enhancement of architecture design and optimization strategies. However, they overlook the causal relationships in the joints, leading to models that may be overly tailored and thus estimate poorly to challenging scenes. Therefore, adequate causal reasoning capability, coupled with good interpretability of model, are both indispensable and prerequisite for achieving reliable results. In this paper, we pioneer a causal perspective on pose estimation and introduce a causal-inspired multitask learning framework, consisting of two stages. In the first stage, we try to endow the model with causal spatio-temporal modeling ability by introducing two self-supervision auxiliary tasks. Specifically, these auxiliary tasks enable the network to infer challenging keypoints based on observed keypoint information, thereby imbuing causal reasoning capabilities into the model and making it robust to challenging scenes. In the second stage, we argue that not all feature tokens contribute equally to pose estimation. Prioritizing causal (keypoint-relevant) tokens is crucial to achieve reliable results, which could improve the interpretability of the model. To this end, we propose a Token Causal Importance Selection module to identify the causal tokens and non-causal tokens (e.g., background and objects). Additionally, non-causal tokens could provide potentially beneficial cues but may be redundant. We further introduce a non-causal tokens clustering module to merge the similar non-causal tokens. Extensive experiments show that our method outperforms state-of-the-art methods on three large-scale benchmark datasets.

PDF Details DOI

IJCAI Conference 2025 Conference Paper

Enhancing Semantic Clarity: Discriminative and Fine-grained Information Mining for Remote Sensing Image-Text Retrieval

Yu Liu
Haipeng Chen
Yuheng Liang
Yuheng Yang
Xun Yang
Yingda Lyu

Remote sensing image-text retrieval is a fundamental task in remote sensing multimodal analysis, promoting the alignment of visual and language representations. The mainstream approaches commonly focus on capturing shared semantic representations between visual and textual modalities. However, the inherent characteristics of remote sensing image-text pairs lead to a semantic confusion problem, stemming from redundant visual representations and high inter-class similarity. To tackle this problem, we propose a novel Discriminative and Fine-grained Information Mining (DFIM) model, which aims to enhance semantic clarity by reducing visual redundancy and increasing the semantic gap between different classes. Specifically, the Dynamic Visual Enhancement (DVE) module adaptively enhances the visual discriminative features under the guidance of multimodal fusion information. Meanwhile, the Fine-grained Semantic Matching (FSM) module cleverly models the matching relationship between image regions and text words as an optimal transport problem, thereby refining intra-instance matching. Extensive experiments on two benchmark datasets justify the superiority of DFIM in terms of retrieval accuracy and visual interpretability over the leading methods.

PDF Details DOI

AAAI Conference 2025 Conference Paper

HVIS: A Human-like Vision and Inference System for Human Motion Prediction

Kedi Lyu
Haipeng Chen
Zhenguang Liu
Yifang Yin
Yukang Lin
Yingying Jiao

Grasping the intricacies of human motion, which involve perceiving spatio-temporal dependence and multi-scale effects, is essential for predicting human motion. While humans inherently possess the requisite skills to navigate this issue, it proves to be markedly more challenging for machines to emulate. To bridge the gap, we propose the Human-like Vision and Inference System (HVIS) for human motion prediction, which is designed to emulate human observation and forecast future movements. HVIS comprises two components: the human-like vision encode (HVE) module and the human-like motion inference (HMI) module. The HVE module mimics and refines the human visual process, incorporating a retina-analog component that captures spatiotemporal information separately to avoid unnecessary crosstalk. Additionally, a visual cortex-analogy component is designed to hierarchically extract and treat complex motion features, focusing on both global and local features of human poses. The HMI is employed to simulate the multi-stage learning model of the human brain. The spontaneous learning network simulates the neuronal fracture generation process for the adversarial generation of future motions. Subsequently, the deliberate learning network is optimized for hard-to-train joints to prevent misleading learning. Experimental results demonstrate that our method achieves new state-of-the-art performance, significantly outperforming existing methods by 19.8 % on Human3.6M, 15.7 % on CMU Mocap, and 11.1 % on G3D.

PDF Details DOI

AAAI Conference 2025 Conference Paper

Population Aware Diffusion for Time Series Generation

Yang Li
Han Meng
Zhenyu Bi
Ingolv T. Urnes
Haipeng Chen

Diffusion models have shown promising ability in generating high-quality time series (TS) data. Despite the initial success, existing works mostly focus on the authenticity of data at the individual level, but pay less attention to preserving the population-level properties on the entire dataset. Such population-level properties include value distributions for each dimension and distributions of certain functional dependencies (e.g., cross-correlation, CC) between different dimensions. For instance, when generating house energy consumption TS data, the value distributions of the outside temperature and the kitchen temperature should be preserved, as well as the distribution of CC between them. Preserving such TS population-level properties is critical in maintaining the statistical insights of the datasets, mitigating model bias, and augmenting downstream tasks like TS prediction. Yet, it is often overlooked by existing models. Hence, data generated by existing models often bear distribution shifts from the original data. We propose Population-aware Diffusion for Time Series (PaD-TS), a new TS generation model that better preserves the population-level properties. The key novelties of PaD-TS include 1) a new training method explicitly incorporating TS population-level property preservation, and 2) a new dual-channel encoder model architecture that better captures the TS data structure. Empirical results in major benchmark datasets show that PaD-TS can improve the average CC distribution shift score between real and synthetic data by 5.9x while maintaining a performance comparable to state-of-the-art models on individual-level authenticity.

PDF Details DOI

AAAI Conference 2025 Conference Paper

Skeleton-based Action Recognition with Non-linear Dependency Modeling and Hilbert-Schmidt Independence Criterion

Haipeng Chen
Yuheng Yang
Yingda Lyu

Human skeleton-based action recognition has long been an indispensable aspect of artificial intelligence. Current state-of-the-art methods tend to consider only the dependencies between connected skeletal joints, limiting their ability to capture non-linear dependencies between physically distant joints. Moreover, most existing approaches distinguish action classes by estimating the probability density of motion representations, yet the high-dimensional nature of human motions invokes inherent difficulties in accomplishing such measurements. In this paper, we seek to tackle these challenges from two directions: (1) We propose a novel dependency refinement approach that explicitly models dependencies between any pair of joints, effectively transcending the limitations imposed by joint distance. (2) We further propose a framework that utilizes the Hilbert-Schmidt Independence Criterion to differentiate action classes without being affected by data dimensionality, and mathematically derive learning objectives guaranteeing precise recognition. Empirically, our approach sets the state-of-the-art performance on NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets.

PDF Details DOI

AAAI Conference 2024 Conference Paper

Causality-Inspired Invariant Representation Learning for Text-Based Person Retrieval

Yu Liu
Guihe Qin
Haipeng Chen
Zhiyong Cheng
Xun Yang

Text-based Person Retrieval (TPR) aims to retrieve relevant images of specific pedestrians based on the given textual query. The mainstream approaches primarily leverage pretrained deep neural networks to learn the mapping of visual and textual modalities into a common latent space for cross-modality matching. Despite their remarkable achievements, existing efforts mainly focus on learning the statistical cross-modality correlation found in training data, other than the intrinsic causal correlation. As a result, they often struggle to retrieve accurately in the face of environmental changes such as illumination, pose, and occlusion, or when encountering images with similar attributes. In this regard, we pioneer the observation of TPR from a causal view. Specifically, we assume that each image is composed of a mixture of causal factors (which are semantically consistent with text descriptions) and non-causal factors (retrieval-irrelevant, e.g., background), and only the former can lead to reliable retrieval judgments. Our goal is to extract text-critical robust visual representation (i.e., causal factors) and establish domain invariant cross-modality correlations for accurate and reliable retrieval. However, causal/non-causal factors are unobserved, so we emphasize that ideal causal factors that can simulate causal scenes should satisfy two basic principles:1） Independence: being independent of non-causal factors, and 2）Sufficiency: being causally sufficient for TPR across different environments. Building on that, we propose an Invariant Representation Learning method for TPR (IRLT), that enforces the visual representations to satisfy the two aforementioned critical properties. Extensive experiments on three datasets clearly demonstrate the advantages of IRLT over leading baselines in terms of accuracy and generalization.

PDF Details DOI

AAMAS Conference 2023 Conference Paper

A Learning Approach to Complex Contagion Influence Maximization

Haipeng Chen
Bryan Wilder
Wei Qiu
Bo An
Eric Rice
Milind Tambe

Influence maximization (IM) aims to find a set of seed nodes in a social network that maximizes the influence spread. While most IM problems focus on classical influence cascades (e. g. , Independent Cascade and Linear Threshold) which assume individual influence cascade probability is independent of the number of neighbors, recent studies by sociologists show that many influence cascades follow a pattern called complex contagion (CC), where influence cascade probability is much higher when more neighbors are influenced. Nonetheless, there are very limited studies on complex contagion influence maximization (CCIM) problems. This is partly because CC is non-submodular, the solution of which has been an open challenge. In this study, we propose the first reinforcement learning (RL) approach to CCIM. We find that a key obstacle in applying existing RL approaches to CCIM is the reward sparseness issue, which comes from two distinct sources. We then design a new RL algorithm that uses the CCIM problem structure to address the issue. Empirical results show that our approach achieves the state-of-the-art performance on four real-world networks.

IJCAI Conference 2023 Conference Paper

Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization

Yuheng Yang
Haipeng Chen
Zhenguang Liu
Yingda Lyu
Beibei Zhang
Shuang Wu
Zhibo Wang
Kui Ren

Action recognition has long been a fundamental and intriguing problem in artificial intelligence. The task is challenging due to the high dimensionality nature of an action, as well as the subtle motion details to be considered. Current state-of-the-art approaches typically learn from articulated motion sequences in the straightforward 3D Euclidean space. However, the vanilla Euclidean space is not efficient for modeling important motion characteristics such as the joint-wise angular acceleration, which reveals the driving force behind the motion. Moreover, current methods typically attend to each channel equally and lack theoretical constrains on extracting task-relevant features from the input. In this paper, we seek to tackle these challenges from three aspects: (1) We propose to incorporate an acceleration representation, explicitly modeling the higher-order variations in motion. (2) We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention, where different representations (i. e. , streams) supplement each other towards a more precise action recognition while attention capitalizes on those important channels. (3) We explore feature-level supervision for maximizing the extraction of task-relevant information and formulate this into a mutual information loss. Empirically, our approach sets the new state-of-the-art performance on three benchmark datasets, NTU RGB+D, NTU RGB+D 120, and NW-UCLA.

PDF Details DOI

AAMAS Conference 2023 Conference Paper

AI-driven Prices for Externalities and Sustainability in Production Markets

Panayiotis Danassis
Aris Filos-Ratsikas
Haipeng Chen
Milind Tambe
Boi Faltings

Markets do not account for negative externalities; indirect costs that some participants impose on others, such as the cost of overappropriating a common-pool resource (which diminishes future stock, and thus harvest, for everyone). Quantifying appropriate interventions to market prices has proven to be quite challenging. We propose a practical approach to computing market prices and allocations via a deep reinforcement learning policymaker agent, operating in an environment of other learning agents. Our policymaker allows us to tune the prices with regard to diverse objectives such as sustainability and resource wastefulness, fairness, buyers’ and sellers’ welfare, etc. As a highlight of our findings, our policymaker is significantly more successful in maintaining resource sustainability, compared to the market equilibrium outcome, in scarce resource environments.

IJCAI Conference 2023 Conference Paper

Complex Contagion Influence Maximization: A Reinforcement Learning Approach

Haipeng Chen
Bryan Wilder
Wei Qiu
Bo An
Eric Rice
Milind Tambe

In influence maximization (IM), the goal is to find a set of seed nodes in a social network that maximizes the influence spread. While most IM problems focus on classical influence cascades (e. g. , Independent Cascade and Linear Threshold) which assume individual influence cascade probability is independent of the number of neighbors, recent studies by sociologists show that many influence cascades follow a pattern called complex contagion (CC), where influence cascade probability is much higher when more neighbors are influenced. Nonetheless, there are very limited studies for complex contagion influence maximization (CCIM) problems. This is partly because CC is non-submodular, the solution of which has been an open challenge. In this study, we propose the first reinforcement learning (RL) approach to CCIM. We find that a key obstacle in applying existing RL approaches to CCIM is the reward sparseness issue, which comes from two distinct sources. We then design a new RL algorithm that uses the CCIM problem structure to address the issue. Empirical results show that our approach achieves the state-of-the-art performance on 9 real-world networks.

PDF Details DOI

IJCAI Conference 2023 Conference Paper

Discrepancy-Guided Reconstruction Learning for Image Forgery Detection

Zenan Shi
Haipeng Chen
Long Chen
Dong Zhang

In this paper, we propose a novel image forgery detection paradigm for boosting the model learning capacity on both forgery-sensitive and genuine compact visual patterns. Compared to the existing methods that only focus on the discrepant-specific patterns (\eg, noises, textures, and frequencies), our method has a greater generalization. Specifically, we first propose a Discrepancy-Guided Encoder (DisGE) to extract forgery-sensitive visual patterns. DisGE consists of two branches, where the mainstream backbone branch is used to extract general semantic features, and the accessorial discrepant external attention branch is used to extract explicit forgery cues. Besides, a Double-Head Reconstruction (DouHR) module is proposed to enhance genuine compact visual patterns in different granular spaces. Under DouHR, we further introduce a Discrepancy-Aggregation Detector (DisAD) to aggregate these genuine compact visual patterns, such that the forgery detection capability on unknown patterns can be improved. Extensive experimental results on four challenging datasets validate the effectiveness of our proposed method against state-of-the-art competitors.

PDF Details DOI

IJCAI Conference 2022 Conference Paper

Sequential Vaccine Allocation with Delayed Feedback

Yichen Xiao
Han-Ching Ou
Haipeng Chen
Van Thieu Nguyen
Long Tran-Thanh

In this work we consider the problem of how to best allocate a limited supply of vaccines in the aftermath of an infectious disease outbreak by viewing the problem as a sequential game between a learner and an environment (specifically, a bandit problem). The difficulty of this problem lies in the fact that the payoff of vaccination cannot be directly observed, making it difficult to compare the relative effectiveness of vaccination on different population groups. Currently used vaccination policies make recommendations based on mathematical modelling and ethical considerations. These policies are static, and do not adapt as conditions change. Our aim is to design and evaluate an algorithm which can make use of routine surveillance data to dynamically adjust its recommendation. We evaluate the performance of our approach by applying it to a simulated epidemic of a disease based on real-world COVID-19 data, and show that our vaccination policy was able to perform better than existing vaccine allocation policies. In particular, we show that with our allocation method, we can reduce the number of required vaccination by at least 50% in order to keep the peak number of hospitalised patients below a certain threshold. Also, when the same batch sizes are used, our method can reduce the peak number of hospitalisation by up to 20%. We also demonstrate that our vaccine allocation does not vary the number of batches per group much, making it socially more acceptable (as it reduces uncertainty, hence results in better and more interpretable communication).

PDF Details DOI

AAMAS Conference 2021 Conference Paper

Active Screening for Recurrent Diseases: A Reinforcement Learning Approach

Han-Ching Ou
Haipeng Chen
Shahin Jabbari
Milind Tambe

Active screening is a common approach in controlling the spread of recurring infectious diseases such as tuberculosis and influenza. In this approach, health workers periodically select a subset of population for screening. However, given the limited number of health workers, only a small subset of the population can be visited in any given time period. Given the recurrent nature of the disease and rapid spreading, the goal is to minimize the number of infections over a long time horizon. Active screening can be formalized as a sequential combinatorial optimization over the network of people and their connections. The main computational challenges in this formalization arise from i) the combinatorial nature of the problem, ii) the need of sequential planning and iii) the uncertainties in the infectiousness states of the population. Previous works on active screening fail to scale to large time horizon while fully considering the future effect of current interventions. In this paper, we propose a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQN), with several innovative adaptations that are designed to address the above challenges. First, we use graph convolutional networks (GCNs) to represent the Q-function that exploit the node correlations of the underlying contact network. Second, to avoid solving a combinatorial optimization problem in each time period, we decompose the node set selection as a sub-sequence of decisions, and further design a two-level RL framework that solves the problem in a hierarchical way. Finally, to speed-up the slow convergence of RL which arises from reward sparseness, we incorporate ideas from curriculum learning into our hierarchical RL approach. We evaluate our RL algorithm on several real-world networks. Results show that our RL algorithm can scale up to 10 times the problem size of state-of-the-art (the variant that considers the effect of future interventions but un-scalable) in terms of planning time horizon. Meanwhile, it outperforms state-of-theart (the variant that scales up but does not consider the effect of future interventions) by up to 33% in solution quality.

AAAI Conference 2021 Conference Paper

Aggregated Multi-GANs for Controlled 3D Human Motion Prediction

Zhenguang Liu
Kedi Lyu
Shuang Wu
Haipeng Chen
Yanbin Hao
Shouling Ji

Human motion prediction from historical pose sequence is at the core of many applications in machine intelligence. However, in current state-of-the-art methods, the predicted future motion is confined within the same activity. One can neither generate predictions that differ from the current activity, nor manipulate the body parts to explore various future possibilities. Undoubtedly, this greatly limits the usefulness and applicability of motion prediction. In this paper, we propose a generalization of the human motion prediction task in which control parameters can be readily incorporated to adjust the forecasted motion. Our method is compelling in that it enables manipulable motion prediction across activity types and allows customization of the human movement in a variety of fine-grained ways. To this aim, a simple yet effective composite GAN structure, consisting of local GANs for different body parts and aggregated via a global GAN is presented. The local GANs game in lower dimensions, while the global GAN adjusts in high dimensional space to avoid mode collapse. Extensive experiments show that our method outperforms state-of-the-art. The codes are available at https: //github. com/herolvkd/AM-GAN.

AAAI Conference 2021 Conference Paper

EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation

Qi Zhou
Haipeng Chen
Yitao Zheng
Zhen Wang

As one of the most powerful topic models, Latent Dirichlet Allocation (LDA) has been used in a vast range of tasks, including document understanding, information retrieval and peer-reviewer assignment. Despite its tremendous popularity, the security of LDA has rarely been studied. This poses severe risks to security-critical tasks such as sentiment analysis and peer-reviewer assignment that are based on LDA. In this paper, we are interested in knowing whether LDA models are vulnerable to adversarial perturbations of benign document examples during inference time. We formalize the evasion attack to LDA models as an optimization problem and prove it to be NP-hard. We then propose a novel and efficient algorithm, EvaLDA to solve it. We show the effectiveness of EvaLDA via extensive empirical evaluations. For instance, in the NIPS dataset, EvaLDA can averagely promote the rank of a target topic from 10 to around 7 by only replacing 1% of the words with similar words in a victim document. Our work provides significant insights into the power and limitations of evasion attacks to LDA models.

NeurIPS Conference 2021 Conference Paper

Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning

Kai Wang
Sanket Shah
Haipeng Chen
Andrew Perrault
Finale Doshi-Velez
Milind Tambe

In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. We implement both Bellman-based and policy gradient-based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks.

IJCAI Conference 2019 Conference Paper

Dynamic Electronic Toll Collection via Multi-Agent Deep Reinforcement Learning with Edge-Based Graph Convolutional Networks

Wei Qiu
Haipeng Chen
Bo An

Over the past decades, Electronic Toll Collection (ETC) systems have been proved the capability of alleviating traffic congestion in urban areas. Dynamic Electronic Toll Collection (DETC) was recently proposed to further improve the efficiency of ETC, where tolls are dynamically set based on traffic dynamics. However, computing the optimal DETC scheme is computationally difficult and existing approaches are limited to small scale or partial road networks, which significantly restricts the adoption of DETC. To this end, we propose a novel multi-agent reinforcement learning (RL) approach for DETC. We make several key contributions: i) an enhancement over the state-of-the-art RL-based method with a deep neural network representation of the policy and value functions and a temporal difference learning framework to accelerate the update of target values, ii) a novel edge-based graph convolutional neural network (eGCN) to extract the spatio-temporal correlations of the road network state features, iii) a novel cooperative multi-agent reinforcement learning (MARL) which divides the whole road network into partitions according to their geographic and economic characteristics and trains a tolling agent for each partition. Experimental results show that our approach can scale up to realistic-sized problems with robust performance and significantly outperform the state-of-the-art method.

IJCAI Conference 2019 Conference Paper

FakeTables: Using GANs to Generate Functional Dependency Preserving Tables with Bounded Real Data

Haipeng Chen
Sushil Jajodia
Jing Liu
Noseong Park
Vadim Sokolov
V. S. Subrahmanian

In many cases, an organization wishes to release some data, but is restricted in the amount of data to be released due to legal, privacy and other concerns. For instance, the US Census Bureau releases only 1% of its table of records every year, along with statistics about the entire table. However, the machine learning (ML) models trained on the released sub-table are usually sub-optimal. In this paper, our goal is to find a way to augment the sub-table by generating a synthetic table from the released sub-table, under the constraints that the generated synthetic table (i) has similar statistics as the entire table, and (ii) preserves the functional dependencies of the released sub-table. We propose a novel generative adversarial network framework called ITS-GAN, where both the generator and the discriminator are specifically designed to satisfy these two constraints. By evaluating the augmentation performance of ITS-GAN on two representative datasets, the US Census Bureau data and US Bureau of Transportation Statistics (BTS) data, we show that ITS-GAN yields high quality classification results, and significantly outperforms various state-of-the-art data augmentation approaches.

IJCAI Conference 2019 Conference Paper

VEST: A System for Vulnerability Exploit Scoring & Timing

Haipeng Chen
Jing Liu
Rui Liu
Noseong Park
V. S. Subrahmanian

Knowing if/when a cyber-vulnerability will be exploited and how severe the vulnerability is can help enterprise security officers (ESOs) come up with appropriate patching schedules. Today, this ability is severely compromised: our study of data from Mitre and NIST shows that on average there is a 132 day gap between the announcement of a vulnerability by Mitre and the time NIST provides an analysis with severity score estimates and 8 important severity attributes. Many attacks happen during this very 132-day window. We present Vulnerability Exploit Scoring \& Timing (VEST), a system for (early) prediction and visualization of if/when a vulnerability will be exploited, and its estimated severity attributes and score.

AAAI Conference 2018 Conference Paper

DyETC: Dynamic Electronic Toll Collection for Traffic Congestion Alleviation

Haipeng Chen
Bo An
Guni Sharon
Josiah Hanna
Peter Stone
Chunyan Miao
Yeng Soh

To alleviate trafﬁc congestion in urban areas, electronic toll collection (ETC) systems are deployed all over the world. Despite the merits, tolls are usually pre-determined and ﬁxed from day to day, which fail to consider trafﬁc dynamics and thus have limited regulation effect when trafﬁc conditions are abnormal. In this paper, we propose a novel dynamic ETC (DyETC) scheme which adjusts tolls to trafﬁc conditions in realtime. The DyETC problem is formulated as a Markov decision process (MDP), the solution of which is very challenging due to its 1) multi-dimensional state space, 2) multidimensional, continuous and bounded action space, and 3) time-dependent state and action values. Due to the complexity of the formulated MDP, existing methods cannot be applied to our problem. Therefore, we develop a novel algorithm, PG-β, which makes three improvements to traditional policy gradient method by proposing 1) time-dependent value and policy functions, 2) Beta distribution policy function and 3) state abstraction. Experimental results show that, compared with existing ETC schemes, DyETC increases trafﬁc volume by around 8%, and reduces travel time by around 14. 6% during rush hour. Considering the total trafﬁc volume in a trafﬁc network, this contributes to a substantial increase to social welfare.

AAAI Conference 2018 Conference Paper

HogRider: Champion Agent of Microsoft Malmo Collaborative AI Challenge

Yanhai Xiong
Haipeng Chen
Mengchen Zhao
Bo An

It has been an open challenge for self-interested agents to make optimal sequential decisions in complex multiagent systems, where agents might achieve higher utility via collaboration. The Microsoft Malmo Collaborative AI Challenge (MCAC), which is designed to encourage research relating to various problems in Collaborative AI, takes the form of a Minecraft mini-game where players might work together to catch a pig or deviate from cooperation, for pursuing high scores to win the challenge. Various characteristics, such as complex interactions among agents, uncertainties, sequential decision making and limited learning trials all make it extremely challenging to ﬁnd effective strategies. We present HogRider - the champion agent of MCAC in 2017 out of 81 teams from 26 countries. One key innovation of HogRider is a generalized agent type hypothesis framework to identify the behavior model of the other agents, which is demonstrated to be robust to observation uncertainty. On top of that, a second key innovation is a novel Q-learning approach to learn effective policies against each type of the collaborating agents. Various ideas are proposed to adapt traditional Q-learning to handle complexities in the challenge, including state-action abstraction to reduce problem scale, a warm start approach using human reasoning for addressing limited learning trials, and an active greedy strategy to balance exploitationexploration. Challenge results show that HogRider outperforms all the other teams by a signiﬁcant edge, in terms of both optimality and stability.

TIST Journal 2017 Journal Article

Data-Driven Frequency-Based Airline Profit Maximization

Bo An
Haipeng Chen
Noseong Park
V. S. Subrahmanian

Although numerous traditional models predict market share and demand along airline routes, the prediction of existing models is not precise enough, and to the best of our knowledge, there is no use of data mining--based forecasting techniques for improving airline profitability. We propose the maximizing airline profits (MAP) architecture designed to help airlines and make two key contributions in airline market share and route demand prediction and prediction-based airline profit optimization. Compared to past methods used to forecast market share and demand along airline routes, we introduce a novel ensemble forecasting (MAP-EF) approach considering two new classes of features: (i) features derived from clusters of similar routes and (ii) features based on equilibrium pricing. We show that MAP-EF achieves much better Pearson correlation coefficients (greater than 0.95 vs. 0.82 for market share, 0.98 vs. 0.77 for demand) and R 2 -values compared to three state-of-the-art works for forecasting market share and demand while showing much lower variance. Using the results of MAP-EF, we develop MAP--bilevel branch and bound (MAP-BBB) and MAP-greedy (MAP-G) algorithms to optimally allocate flight frequencies over multiple routes to maximize an airline’s profit. We also study two extensions of the profit maximization problem considering frequency constraints and long-term profits. Furthermore, we develop algorithms for computing Nash equilibrium frequencies when there are multiple strategic airlines. Experimental results show that airlines can increase profits by a significant margin. All experiments were conducted with data aggregated from four sources: the U.S. Bureau of Transportation Statistics (BTS), the U.S. Bureau of Economic Analysis (BEA), the National Transportation Safety Board (NTSB), and the U.S. Census Bureau (CB).